You can control which fields are warmed up via the preCacheCols field of
TableDescriptor:

http://incubator.apache.org/blur/docs/0.2.0/Blur.html#Struct_TableDescriptor

The comment on that field is wrong, however, so I will create a task to
correct it. The field takes entries of the form "family.column", just like
you would use in a search.

Aaron


On Tue, Oct 1, 2013 at 12:41 PM, Ravikumar Govindarajan <
[email protected]> wrote:

> Thanks Aaron
>
> General sampling and warming is fine and the code is really concise and
> clear.
>
> The act of reading
> brings the data into the block cache and the result is that the index is
> "hot".
>
> Will all the terms of a field be read and brought into the cache? If so,
> then it has an obvious implication to avoid fields like, say
> attachment-data from warming up, provided queries don't often include such
> fields
>
>
> On Tue, Oct 1, 2013 at 7:58 PM, Aaron McCurry <[email protected]> wrote:
>
> > Take a look at this package.
> >
> > https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=tree;f=blur-store/src/main/java/org/apache/blur/lucene/warmup;h=f4239b1947965dc7fe8218eaa16e3f39ecffdda0;hb=apache-blur-0.2
> >
> > Basically when the warmup process starts (which is asynchronous to the
> > rest of the application) it flips a thread local switch to allow for
> > tracing of the file accesses. The sampler will sample each of the fields
> > in each segment and create a sample file that attempts to detect the
> > boundaries of each field within each file within each segment. Then it
> > stores the sample info into the directory beside each segment (so that
> > way it doesn't have to re-sample the segment). After the sampling is
> > complete or loaded, the warmup just reads the binary data from each file.
> > The act of reading brings the data into the block cache and the result is
> > that the index is "hot".
> >
> > Hope this helps.
> >
> > Aaron
> >
> >
> > On Tue, Oct 1, 2013 at 10:09 AM, Ravikumar Govindarajan <
> > [email protected]> wrote:
> >
> > > As I understand,
> > >
> > > Lucene will store the files in following way per-segment
> > >
> > > TIM file
> > >      Field1 ---> Some byte[]
> > >      Field2 ---> Some byte[]
> > >
> > > TIP file
> > >      Field1 ---> Some byte[]
> > >      Field2 ---> Some byte[]
> > >
> > >
> > > Blur will "sample" this lucene-file in the following way
> > >
> > > Field1 --> <TIM, start-offset>, <TIP, start-offset>, ...
> > >
> > > Field 2 --> <TIM, start-offset>, <TIP, start-offset>, ...
> > >
> > > Is my understanding correct?
> > >
> > > How does Blur warm-up the fields, when it does not know the
> > > "end-offset" or the "length" for each field to warm.
> > >
> > > Will it by default read all Terms of a field?
> > >
> > > --
> > > Ravi
> > >
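
For anyone following the thread, here is a rough sketch of the idea discussed
above: sample a start offset per field per file, then warm selected fields by
simply reading their bytes. It is not Blur's code (that lives in the Java
package linked above) and is written in Python only for brevity. The names
field_ranges and warm, the sample layout, and the file names are all
hypothetical. In particular, it assumes a field's data in a file ends where
the next sampled field starts (or at end of file), which is only one possible
answer to the "end-offset" question. Plain file reads like these would at
most touch the OS page cache; in Blur the reads go through its block cache.

```python
import os
import tempfile


def field_ranges(samples, file_sizes):
    """Turn per-field start offsets into (file, start, end) byte ranges.

    Assumption (not confirmed by the thread): a field's data in a file ends
    at the next sampled start offset in that file, or at end of file.
    """
    starts_by_file = {}
    for entries in samples.values():
        for fname, start in entries:
            starts_by_file.setdefault(fname, []).append(start)
    for starts in starts_by_file.values():
        starts.sort()

    ranges = {}
    for field, entries in samples.items():
        for fname, start in entries:
            starts = starts_by_file[fname]
            idx = starts.index(start)
            if idx + 1 < len(starts):
                end = starts[idx + 1]
            else:
                end = file_sizes[fname]
            ranges.setdefault(field, []).append((fname, start, end))
    return ranges


def warm(directory, ranges, fields_to_warm, chunk_size=8192):
    """Read and discard the bytes of the selected fields.

    Returns the number of bytes read per field. Fields not listed (for
    example a large attachment field) are never touched.
    """
    bytes_read = {}
    for field in fields_to_warm:
        total = 0
        for fname, start, end in ranges.get(field, []):
            with open(os.path.join(directory, fname), "rb") as f:
                f.seek(start)
                remaining = end - start
                while remaining > 0:
                    chunk = f.read(min(chunk_size, remaining))
                    if not chunk:
                        break  # file shorter than the sampled range
                    total += len(chunk)
                    remaining -= len(chunk)
        bytes_read[field] = total
    return bytes_read


# Usage with two fake segment files; names, sizes and offsets are made up.
with tempfile.TemporaryDirectory() as d:
    for name, size in (("seg.tim", 100), ("seg.tip", 40)):
        with open(os.path.join(d, name), "wb") as out:
            out.write(b"\0" * size)
    sizes = {n: os.path.getsize(os.path.join(d, n)) for n in os.listdir(d)}
    samples = {
        "fam.title": [("seg.tim", 0), ("seg.tip", 0)],
        "fam.attachment-data": [("seg.tim", 30), ("seg.tip", 10)],
    }
    ranges = field_ranges(samples, sizes)
    # Warm only the field that queries actually use, named "family.column".
    result = warm(d, ranges, ["fam.title"])
print(result)
```

Here only "fam.title" is read, so the larger "fam.attachment-data" ranges are
left cold, which is the effect Ravi is after when excluding rarely queried
fields from warm-up.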
