Re: Doubts related to Apache Blur

Garrett Barton Thu, 21 Nov 2013 10:56:24 -0800

Naresh,

I understand your problem set better now.


As for the data structure, I would define the following fields within a
family called data:

data.measure (Type String, not fieldLessIndexed)
data.period (Type String, not fieldLessIndexed)
data.pool1..n (Type String, not fieldLessIndexed)
data.tags (Type String)
data.cost (Type Long, not fieldLessIndexed)

If you were in the Blur shell you would do something like:
create -t myTable -c 4
definecolumn myTable data measure String
definecolumn myTable data period String
definecolumn myTable data tags String
definecolumn myTable data cost Long

This will let you create as many pool columns as you want, and when you
retrieve the row you will get the pool titles back by virtue of the column
names. When you query for a tag, you would query against the tags field,
into which you have also loaded your pool values.
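
To make the "load your pool values into tags as well" step concrete, here
is a rough sketch in plain Python. It does not use any Blur client API; the
function name to_columns and the dict layout are made up for illustration,
and it assumes tags is stored as one repeated column value per pool:

```python
def to_columns(row):
    """Turn one spreadsheet row (a dict) into (column, value) pairs
    for the 'data' family.

    Each PoolN value is emitted twice: once under its own pool column
    (so the title comes back on retrieval) and once under 'tags'
    (so it can be queried without knowing the pool name).
    """
    columns = []
    for name, value in row.items():
        key = name.lower()
        columns.append((key, str(value)))
        if key.startswith("pool"):
            # Hypothetical layout: one 'tags' entry per pool value.
            columns.append(("tags", str(value)))
    return columns


# Row4 from the example data in this thread.
row4 = {"Measure": "Cost", "Period": "Dec13", "Pool1": "Tag1",
        "Pool2": "TagA", "Pool3": "TagP", "Cost": 150}

for column, value in to_columns(row4):
    print(f"data.{column} = {value}")
```

How you actually hand these pairs to Blur depends on your loader, so treat
this only as a picture of the shape of the data.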
So your queries, rewritten as working Blur queries (assuming your family
is called 'data'; I don't know exactly what you're working on, so I'm sure
you could come up with a better name), would look like:

   Query 1: get all rows with
            data.measure:Cost AND data.period:Nov13 AND data.tags:Tag1
            O/P = Row1, Row3
   Query 2: get all rows with
            data.measure:Cost AND data.period:Dec13 AND data.tags:Tag1 AND data.tags:TagA
            O/P = Row4, Row5
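
If you want to sanity-check the expected output before loading anything,
here is a small stand-in in plain Python. It is not Blur or Lucene, just an
in-memory imitation of the AND logic over your six example rows, with the
pool values already folded into a tags set; ROWS and query are names I made
up for this sketch:

```python
# The six example rows, with every pool value collected into 'tags'.
ROWS = {
    "Row1": {"measure": "Cost", "period": "Nov13", "tags": {"Tag1", "TagA"}},
    "Row2": {"measure": "Cost", "period": "Nov13", "tags": {"Tag2", "TagB"}},
    "Row3": {"measure": "Cost", "period": "Nov13", "tags": {"Tag1"}},
    "Row4": {"measure": "Cost", "period": "Dec13", "tags": {"Tag1", "TagA", "TagP"}},
    "Row5": {"measure": "Cost", "period": "Dec13", "tags": {"Tag1", "TagA", "TagQ"}},
    "Row6": {"measure": "Cost", "period": "Dec13", "tags": {"Tag1"}},
}


def query(measure, period, *tags):
    """Return ids of rows matching measure AND period AND every given tag."""
    wanted = set(tags)
    return [
        row_id
        for row_id, rec in ROWS.items()
        if rec["measure"] == measure
        and rec["period"] == period
        and wanted <= rec["tags"]  # all requested tags must be present
    ]


print(query("Cost", "Nov13", "Tag1"))          # ['Row1', 'Row3']
print(query("Cost", "Dec13", "Tag1", "TagA"))  # ['Row4', 'Row5']
```

Note that neither query needs to know which pool column a tag came from,
which is the point of the tags field.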


As for getting Blur to work on Windows, I wouldn't expect that to happen
any time soon.  If you install VirtualBox, download a live install of your
favorite Linux distro, and download the latest release of Blur, you could be
running in under an hour (depending on bandwidth).

I will reply to the JIRA ticket about Hadoop 2.x with my mods soon,
hopefully with a patch to make things work.

Take it easy,
~Garrett



On Thu, Nov 21, 2013 at 12:48 PM, Naresh Yadav <[email protected]> wrote:

> Hi,
> Thanks much Garrett for guiding me, that was really helpful.
>
> For Doubt *1* I will definitely need your help once I start trying the
> installation. Please share a document on this if possible.
>
> For Doubt *2* I think I will be able to manage with a VM and will explore
> that. It would have been better for me if somebody had already installed
> Blur on Windows using bat files, so that I could reuse them.
>
> For Doubt *3* my actual case is like this (treat these as rows in an Excel
> sheet; that is how my data will look):
>
> Row1 : Measure=Cost, Period=Nov13, Pool1=Tag1, Pool2=TagA, Cost=50
> Row2 : Measure=Cost, Period=Nov13, Pool1=Tag2, Pool2=TagB, Cost=20
> Row3 : Measure=Cost, Period=Nov13, Pool1=Tag1, Cost=20
> Row4 : Measure=Cost, Period=Dec13, Pool1=Tag1, Pool2=TagA, Pool3=TagP, Cost=150
> Row5 : Measure=Cost, Period=Dec13, Pool1=Tag1, Pool2=TagA, Pool4=TagQ, Cost=170
> Row6 : Measure=Cost, Period=Dec13, Pool5=Tag1, Cost=120
>
>    Query 1: get all rows with
>             Measure:Cost, Period:Nov13, Tag1           O/P = Row1, Row3
>    Query 2: get all rows with
>             Measure:Cost, Period:Dec13, Tag1, TagA     O/P = Row4, Row5
>
> So the challenge for me is the tag part, as tags vary from row to row, and
> while querying on them I will not know their column/pool names; I can have
> just N tags in any row.
>
> Will such querying be supported? Or can you suggest a better data model for
> storing this case?
>
> Naresh
>
> On Thu, Nov 21, 2013 at 8:42 PM, Garrett Barton <[email protected]> wrote:
>
> > Welcome aboard!
> >
> > I can answer a few:
> >
> > 1. Yes, with some build flags and script tweaking, which I can help with.
> > I am running it now.
> >
> > 2. You will have to make startup scripts for Windows, and honestly I
> > could not tell you if Blur would even run in a Windows environment.  Have
> > you considered doing dev in a VM? Or running a VM on your Windows machine
> > at least for hosting the Hadoop stack?
> >
> > 3. Are you familiar with Lucene itself?  You must query against a column
> > (ok, not 100% true with Blur, but it seems like you have specified
> > field1=x field2=y requirements). I am slightly confused by your queries,
> > as they have a mix of column names and values that are in different
> > columns in your example.
> > Assuming your first query is cost:50 AND period:Nov13 AND pool1:Tag1 then
> > sure. If you meant any kind of cost, then you simply omit that from the
> > query in the first place.
> > Assuming your second query is (cost:50 OR cost:150) AND period:Dec13 AND
> > pool1:Tag1 AND pool2:Tag2 then sure, that works too.
> >
> > For the most part, if you can write a pretty standard SQL statement to
> > query for your data as if it was in a database, that can be duplicated
> > inside Blur.
> >
> >
> > Millions of rows will be fine.  A single table with the column names you
> > have described is fine; you will have to come up with some kind of unique
> > identifier for each row to load into Blur (like a primary key in a
> > database).
> >
> > Let me know if you have any more questions. :)
> >
> > ~Garrett
> >
> >
> > On Thu, Nov 21, 2013 at 5:38 AM, Naresh Yadav <[email protected]> wrote:
> >
> > > hi,
> > >
> > > I have been reading about Apache Blur for the last day, and I found it
> > > a perfect fit for my project. But I have some doubts:
> > >
> > > 1. Will I be able to use an existing Hadoop 2.0 cluster with the latest
> > > version of Apache Blur?
> > >
> > > 2. My development environment is Windows, and Hadoop 2.0 supports
> > > Windows, so will the latest version of Apache Blur work smoothly on
> > > Windows? Will I get startup scripts for Windows?
> > >
> > > 3. Here are 4 rows of my data which I need to store in one table:
> > >        Cost=50, Period=Nov13, Pool1=Tag1, Pool2=Tag2
> > >        Cost=50, Period=Nov13, Pool1=Tag1, Pool2=Tag3
> > >        Cost=150, Period=Dec13, Pool1=Tag1, Pool2=Tag2, Pool3=Tag3
> > >        Cost=150, Period=Dec13, Pool1=Tag1, Pool2=Tag2, Pool3=Tag4
> > >
> > >    Query 1: get all rows with Cost, Nov13, Tag1
> > >    Query 2: get all rows with Cost, Dec13, Tag1, Tag2
> > >      Will I be able to perform such queries? If yes, how should I design
> > > the Blur table for this use case? Note: this table can hold millions of
> > > rows with all historic data.
> > >
> > > Please help me, I am new to big data technologies. Your guidance will
> > > give me direction to proceed.
> > >
> > > Thanks
> > > Naresh
> > >
> >
>
