Re: Doubts related to Apache Blur

Garrett Barton Tue, 26 Nov 2013 08:30:38 -0800

Yes, you do need to compile Blur against Hadoop 2.x. To do that, switch to
the Blur 0.2 branch and run this Maven command from the root Blur directory:


mvn clean package install -P\!hadoop-1x,cdh4-mr1 \
    -Dhadoop.version=2.0.0-mr1-cdh4.3.0 -DskipTests=true

Should compile fine.

The other changes are in the shell scripts, which you would have to recreate
in the bat files.  Since Hadoop 2.x is split into multiple directories
(hadoop, hdfs, mr, and conf), I supplemented the existing required
HADOOP_HOME env var with three optional ones: HDFS_HOME, MAPRED_HOME, and
HADOOP_CONF.  I also manually copied ant-1.6.5.jar into Blur's lib folder.
That is all that is required to make it work.  See this JIRA:
https://issues.apache.org/jira/browse/BLUR-313
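
As a rough illustration of the env var change described above, here is a
minimal sketch. The variable names are the ones from this email; the function
name resolve_hadoop_dirs and the fallback values (in particular
$HADOOP_HOME/conf) are my own assumptions for illustration, not what Blur's
scripts necessarily do, so check the JIRA for the actual changes.

```shell
#!/bin/sh
# Sketch only: the variable names come from the email above; the fallback
# values are assumptions and are not copied from Blur's scripts.
resolve_hadoop_dirs() {
  # HADOOP_HOME remains the one required variable.
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME must be set" >&2
    return 1
  fi
  # Each optional variable falls back to HADOOP_HOME when unset or empty.
  HDFS_HOME="${HDFS_HOME:-$HADOOP_HOME}"
  MAPRED_HOME="${MAPRED_HOME:-$HADOOP_HOME}"
  # Assumed default location of the config dir; adjust for your layout.
  HADOOP_CONF="${HADOOP_CONF:-$HADOOP_HOME/conf}"
  export HADOOP_HOME HDFS_HOME MAPRED_HOME HADOOP_CONF
}
```

In a bat file the same fallback idea would look something like
"if not defined HDFS_HOME set HDFS_HOME=%HADOOP_HOME%", repeated for each
optional variable.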

Good luck with Windows,
~Garrett


On Tue, Nov 26, 2013 at 9:02 AM, Naresh Yadav <[email protected]> wrote:

> *Garrett,*
> I was able to start the Blur servers with Hadoop 1.2.1, but I am having
> trouble building with Maven against the Hadoop 2.2.0 dependency.
>
> Please help me with the Blur and Hadoop 2.0 problems. My Hadoop 2.0 is up
> and running. This is what I did:
>
> git clone https://git-wip-us.apache.org/repos/asf/incubator-blur.git
>
> Then in pom.xml I changed <hadoop.version>1.2.1</hadoop.version> to
> <hadoop.version>2.2.0</hadoop.version>
>
> Then I ran
>
> mvn install -DskipTests -P distribution
>
> It gives this error:
>
> [ERROR] Failed to execute goal on project blur-util: Could not resolve
> dependencies for project
> org.apache.blur:blur-util:jar:0.3.0-incubating-SNAPSHOT: Could not find
> artifact org.apache.hadoop:hadoop-core:jar:2.2.0 in libdir
> (file://D:\blursrc\incubator-blur-hadoop2..2.0\blur-util/../lib) -> [Help
> 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
>
> http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the
> command
> [ERROR]   mvn <goals> -rf :blur-util
>
> Please send me all the changes required for this to succeed. I am assuming
> that to use Hadoop 2.0 I would also need to compile the Blur code against
> the Hadoop 2.2.0 jars.
>
> NARESH
>
> On Thu, Nov 21, 2013 at 8:42 PM, Garrett Barton <[email protected]>
> wrote:
>
> > Welcome aboard!
> >
> > I can answer a few:
> >
> > 1. Yes, with some build flags and script tweaking, which I can help
> > with. I am running it now.
> >
> > 2. You will have to make startup scripts for Windows, and honestly I
> > could not tell you if Blur would even run in a Windows environment.
> > Have you considered doing dev in a VM? Or running a VM on your Windows
> > machine at least for hosting the Hadoop stack?
> >
> > 3. Are you familiar with Lucene itself?  You must query against a column
> > (ok, not 100% true with Blur, but it seems like you have specified
> > field1=x field2=y requirements). I am slightly confused by your queries,
> > as they mix column names with values that are in different columns in
> > your example.
> > Assuming your first query is cost:50 AND period:Nov13 AND pool1:Tag1, then
> > sure. If you meant any kind of cost, then you simply omit that from the
> > query in the first place.
> > Assuming your second query is (cost:50 OR cost:150) AND period:Dec13 AND
> > pool1:Tag1 AND pool2:Tag2, then sure, that works too.
> >
> > For the most part, if you can write a pretty standard SQL statement to
> > query for your data as if it was in a database, that can be duplicated
> > inside Blur.
> >
> >
> > Millions of rows will be fine.  A single table with the column names you
> > have described is fine; you will have to come up with some kind of unique
> > identifier for each row to load into Blur (like a primary key in a
> > database).
> >
> > Let me know if you have any more questions. :)
> >
> > ~Garrett
> >
> >
> > On Thu, Nov 21, 2013 at 5:38 AM, Naresh Yadav <[email protected]>
> > wrote:
> >
> > > hi,
> > >
> > > I have been reading about Apache Blur for the last day, and I found it
> > > a perfect fit for my project. But I have some doubts:
> > >
> > > 1. Will I be able to use an existing Hadoop 2.0 cluster with the latest
> > > version of Apache Blur?
> > >
> > > 2. My development environment is Windows, and Hadoop 2.0 supports
> > > Windows, so will the latest version of Apache Blur work smoothly on
> > > Windows? Will I get startup scripts for Windows?
> > >
> > > 3. Here are 4 rows of my data, which I need to store in one table:
> > >        Cost=50, Period=Nov13, Pool1=Tag1, Pool2=Tag2
> > >        Cost=50, Period=Nov13, Pool1=Tag1, Pool2=Tag3
> > >        Cost=150, Period=Dec13, Pool1=Tag1, Pool2=Tag2, Pool3=Tag3
> > >        Cost=150, Period=Dec13, Pool1=Tag1, Pool2=Tag2, Pool3=Tag4
> > >
> > >    Query 1: get all rows with Cost, Nov13, Tag1
> > >    Query 2: get all rows with Cost, Dec13, Tag1, Tag2
> > >      Will I be able to perform such queries? If yes, how should I design
> > > the Blur table for this use case? Note: this table can hold millions
> > > of rows of historic data.
> > >
> > > Please help me; I am new to big data technologies. Your guidance will
> > > give me direction to proceed.
> > >
> > > Thanks
> > > Naresh
> > >
> >
>
