The biggest issue with this is that the shards (the indexes) inside of Blur
actually move from one server to another.  So to support this behavior all
the indexes are stored in HDFS.  Due to the differences between HDFS and
a normal POSIX file system, I highly doubt that the BDB file format in
TokyoCabinet can ever be supported.
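To make the difference concrete, here is a minimal sketch (plain java.io, not the actual Blur or Hadoop code) of the access pattern a BDB-style store depends on: seeking back into an existing file and overwriting bytes in place.  A POSIX file system supports this directly; HDFS streams are write-once/append-only, so this pattern has no equivalent there.

```java
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class InPlaceUpdateDemo {
    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("bdb-style", ".dat");
        Files.write(p, "hello world".getBytes("UTF-8"));

        // A BDB-style store updates pages in place: seek to an offset
        // inside an already-written file and overwrite the bytes there.
        try (RandomAccessFile raf = new RandomAccessFile(p.toFile(), "rw")) {
            raf.seek(6);                        // jump into the middle
            raf.write("lucen".getBytes("UTF-8")); // overwrite "world"
        }

        // The file now reads "hello lucen" -- an in-place mutation.
        // HDFS offers create and append only; there is no seek-for-write,
        // which is why this kind of file cannot live in HDFS as-is.
        System.out.println(new String(Files.readAllBytes(p), "UTF-8"));
        Files.delete(p);
    }
}
```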

If you really need partial document updates, there would need to be changes
throughout the entire stack.  I am curious why you need this feature.  Do
you have that many updates to the index?  What is the update frequency?
I'm also curious what kind of performance you get out of a setup like
that; since I haven't ever run such a setup I have no idea how to compare
that kind of system to a base Lucene setup.

Could you point me to some code or documentation?  I would like to go and
take a look.

Thanks,
Aaron



On Thu, Oct 3, 2013 at 7:00 AM, Ravikumar Govindarajan <
[email protected]> wrote:

> One more help.
>
> We also maintain a file by name "BDB", just like the "Sample" file for
> tracing used by Blur.
>
> This "BDB" file pertains to TokyoCabinet and is used purely for supporting
> partial updates to a document.
> All operations on this file rely on local file-paths only, through the use
> of native code.
> Currently, all update requests are local to the index files and it becomes
> trivial to support.
>
> Any pointers on how to take this forward in Blur set-up of shard-servers &
> controllers?
>
> --
> Ravi
>
>
> On Tue, Oct 1, 2013 at 10:15 PM, Aaron McCurry <[email protected]> wrote:
>
> > You can control the fields to warmup via:
> >
> >
> >
> http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Struct_TableDescriptor
> >
> > The preCacheCols field.  The comment is wrong however, so I will create a
> > task to correct.  The use of the field is: "family.column" just like you
> > would search.
> >
> > Aaron
> >
> >
> > On Tue, Oct 1, 2013 at 12:41 PM, Ravikumar Govindarajan <
> > [email protected]> wrote:
> >
> > > Thanks Aaron
> > >
> > > General sampling and warming is fine and the code is really concise and
> > > clear.
> > >
> > >  The act of reading
> > > brings the data into the block cache and the result is that the index
> is
> > > "hot".
> > >
> > > Will all the terms of a field be read and brought into the cache? If
> so,
> > > then it has an obvious implication to avoid fields like, say
> > > attachment-data from warming up, provided queries don't often include
> > such
> > > fields
> > >
> > >
> > > On Tue, Oct 1, 2013 at 7:58 PM, Aaron McCurry <[email protected]>
> > wrote:
> > >
> > > > Take a look at this package.
> > > >
> > > >
> > > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=tree;f=blur-store/src/main/java/org/apache/blur/lucene/warmup;h=f4239b1947965dc7fe8218eaa16e3f39ecffdda0;hb=apache-blur-0.2
> > > >
> > > > Basically when the warmup process starts (which is asynchronous to
> the
> > > rest
> > > > of the application) it flips a thread local switch to allow for
> tracing
> > > of
> > > > the file accesses.  The sampler will sample each of the fields in
> each
> > > > segment and create a sample file that attempts to detect the
> boundaries
> > > of
> > > > each field within each file within each segment.  Then it stores the
> > > sample
> > > > info into the directory beside each segment (so that way it doesn't
> > have
> > > to
> > > > re-sample the segment).  After the sampling is complete or loaded,
> the
> > > > warmup just reads the binary data from each file.  The act of reading
> > > > brings the data into the block cache and the result is that the index
> > is
> > > > "hot".
> > > >
> > > > Hope this helps.
> > > >
> > > > Aaron
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Oct 1, 2013 at 10:09 AM, Ravikumar Govindarajan <
> > > > [email protected]> wrote:
> > > >
> > > > > As I understand,
> > > > >
> > > > > Lucene will store the files in following way per-segment
> > > > >
> > > > > TIM file
> > > > >      Field1 ---> Some byte[]
> > > > >      Field2 ---> Some byte[]
> > > > >
> > > > > TIP file
> > > > >      Field1 ---> Some byte[]
> > > > >      Field2 ---> Some byte[]
> > > > >
> > > > >
> > > > > Blur will "sample" this lucene-file in the following way
> > > > >
> > > > > Field1 --> <TIM, start-offset>, <TIP, start-offset>, ...
> > > > >
> > > > > Field 2 --> <TIM, start-offset>, <TIP, start-offset>, ...
> > > > >
> > > > > Is my understanding correct?
> > > > >
> > > > > How does Blur warm-up the fields, when it does not know the
> > > "end-offset"
> > > > or
> > > > > the "length" for each field to warm.
> > > > >
> > > > > Will it by default read all Terms of a field?
> > > > >
> > > > > --
> > > > > Ravi
> > > > >
> > > >
> > >
> >
>
