Re: Lucene 4.2 DocValues

Arun Kumar K Tue, 28 May 2013 20:06:02 -0700

Adrien,

Thanks for taking the time to explain things to me so clearly. I understand
them correctly now.


Thanks,
Arun


On 29-May-2013, at 2:13 AM, Adrien Grand <jpou...@gmail.com> wrote:

> On Tue, May 28, 2013 at 8:55 PM, Arun Kumar K <arunk...@gmail.com> wrote:
>> Thanks for clarifying things.
>> I have some doubts regarding sorting:
>>> 
>>> While you can do that, I don't recommend it. For example, if you have
>>> 5 fields, loading all fields from stored fields requires at most 1
>>> disk seek while loading all fields from doc values requires at least 5
>>> disk seeks for disk-based doc values.
>> 
>> 
>> 1> I am assuming the 5 fields you mention are sortable fields on which
>> sorting is done.
>> In my understanding, loading stored fields takes 1 disk seek to find the
>> file pointer and 1 disk seek to read all those fields.
> 
> This was correct until Lucene 4.0, but since 4.1, Lucene stores the
> doc ID -> file pointer mapping in memory, ensuring at most 1 disk
> seek.
> 
>> Since a separate file is maintained for each doc values field, we get
>> 5 disk seeks + 1 disk seek for the file pointer.
> 
> There is no general rule since this depends on the doc values type and
> the codec implementation, but you got the idea.
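A toy cost model of the seek counts discussed in this exchange, in Java. It is not Lucene code: `SeekModel` and its methods are hypothetical, and the model assumes that one document's stored fields form a single contiguous record and that each doc values field is read from its own column, which, as noted in the reply, really depends on the doc values type and the codec.

```java
/**
 * Toy disk-seek cost model for reading several fields of one document.
 * Hypothetical: real costs depend on the codec, the doc values type and the OS cache.
 */
final class SeekModel {
    private SeekModel() {}

    /**
     * Upper bound for stored fields: all fields of a document sit in one record.
     * Before Lucene 4.1 the doc ID -> file pointer lookup could add a seek;
     * from 4.1 that mapping is kept in memory.
     */
    static int storedFieldSeeksAtMost(int numFields, boolean pointerIndexInMemory) {
        if (numFields <= 0) {
            return 0;
        }
        int pointerLookup = pointerIndexInMemory ? 0 : 1;
        return pointerLookup + 1; // one seek reads the whole record
    }

    /** Lower bound for disk-based doc values, assuming each field is its own column. */
    static int docValuesSeeksAtLeast(int numFields) {
        return Math.max(0, numFields);
    }

    public static void main(String[] args) {
        System.out.println(storedFieldSeeksAtMost(5, false)); // 2 (pre-4.1 model)
        System.out.println(storedFieldSeeksAtMost(5, true));  // 1 (4.1+ model)
        System.out.println(docValuesSeeksAtLeast(5));         // 5
    }
}
```

The point of the model is the asymmetry: the stored-fields cost does not grow with the number of fields, while the column-per-field cost does.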
> 
>> If we have only one sortable field, which would be better? I guess there is no difference.
> 
> Just to make things clear, before Lucene had doc values, sorting was
> performed based on the inverted index (which was uninverted and stored
> in memory using FieldCache), not stored fields. Stored fields are bad
> for sorting because they are usually large and don't play nice with
> the file system cache.
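To illustrate what "uninverted" means here, a minimal plain-Java sketch (not Lucene's FieldCache API; `Uninverter` is a hypothetical name). The inverted index maps each term to the documents containing it; FieldCache-style sorting turns that around at search time into one value per document. The sketch assumes a single-valued field with at most one term per document.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class Uninverter {
    private Uninverter() {}

    /**
     * Builds a doc ID -> term array from a term -> postings (doc IDs) map.
     * Documents without a term for the field get null.
     * Assumes a single-valued field: a second term for the same doc would overwrite the first.
     */
    static String[] uninvert(Map<String, List<Integer>> postings, int maxDoc) {
        String[] perDoc = new String[maxDoc];
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            for (int doc : e.getValue()) {
                perDoc[doc] = e.getKey();
            }
        }
        return perDoc;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        postings.put("apple", List.of(2));
        postings.put("pear", List.of(0, 3));
        System.out.println(Arrays.toString(uninvert(postings, 4))); // [pear, null, apple, pear]
    }
}
```

This walk over every term's postings is the "hard work" that, without doc values, happens when the index is first used for sorting.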
> 
> Doc values are very similar to FieldCache except that the hard work is
> done at indexing time instead of search time. This is a good
> trade-off because it allows for faster loading of indexes and for
> off-loading data to disk. It is never a bad idea to use doc values
> for sorting.
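A sketch of that trade-off: instead of uninverting at search time, doc values record the per-document value while documents are being indexed, so sorting only has to read the column back. `NumericColumn` is a hypothetical in-memory stand-in, not a Lucene class; it assumes every document has a value and keeps everything in a heap array, whereas real doc values may live on disk depending on the codec.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical column of per-document long values, filled at indexing time. */
final class NumericColumn {
    private long[] values = new long[8];
    private int size = 0;

    /** Called once per indexed document; the returned doc ID is the insertion order. */
    int addDocument(long value) {
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2); // grow the column as docs arrive
        }
        values[size] = value;
        return size++;
    }

    long get(int docId) {
        return values[docId];
    }

    /** Doc IDs ordered by ascending value; the object sort is stable, so ties keep doc ID order. */
    int[] sortedDocIds() {
        Integer[] ids = new Integer[size];
        for (int i = 0; i < size; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, Comparator.comparingLong(id -> get(id)));
        return Arrays.stream(ids).mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        NumericColumn price = new NumericColumn();
        price.addDocument(30);
        price.addDocument(10);
        price.addDocument(20);
        System.out.println(Arrays.toString(price.sortedDocIds())); // [1, 2, 0]
    }
}
```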
> 
>> Also, I vaguely remember that there is some performance loss when sorting
>> on String fields in Lucene 4.0.
>> Would the decision then change for a String field, or depending on the field type?
> 
> I don't see why String sorting would be slower. However, it is true
> that String sorting requires a lot of memory. If your field is a
> number, you should definitely use a numeric field cache.
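One way to see why String sorting is memory-hungry: an ordinal-based representation (similar in spirit to Lucene's sorted doc values and the FieldCache terms index) keeps all unique terms in sorted order plus one int ordinal per document. Sort comparisons are then cheap int comparisons, but the term bytes themselves must also be held, while a numeric field only needs the per-document numbers. `SortedStringColumn` is hypothetical; it assumes every document has a non-null value and uses Java's `String.compareTo` order, which can differ from Lucene's UTF-8 byte order for supplementary characters.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical ordinal-based string column: sorted unique terms + one ord per doc. */
final class SortedStringColumn {
    final String[] terms; // unique terms, sorted with String.compareTo (assumption, see above)
    final int[] ords;     // ords[doc] = position of that doc's term in terms

    /** Assumes every document has a non-null value; TreeSet rejects nulls. */
    SortedStringColumn(String[] perDocValues) {
        terms = new TreeSet<>(Arrays.asList(perDocValues)).toArray(new String[0]);
        ords = new int[perDocValues.length];
        for (int doc = 0; doc < perDocValues.length; doc++) {
            ords[doc] = Arrays.binarySearch(terms, perDocValues[doc]);
        }
    }

    /** Comparing two documents only compares ints, never the strings themselves. */
    int compare(int docA, int docB) {
        return Integer.compare(ords[docA], ords[docB]);
    }

    public static void main(String[] args) {
        SortedStringColumn col = new SortedStringColumn(new String[] {"pear", "apple", "fig", "apple"});
        System.out.println(Arrays.toString(col.terms)); // [apple, fig, pear]
        System.out.println(Arrays.toString(col.ords));  // [2, 0, 1, 0]
    }
}
```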
> 
>> 2> Also, in my understanding, if we need to use parser-based queries on
>> doc values, each document needs a stored field with the same name and
>> value as its doc values field.
>> Even term queries won't work. Am I right here?
> 
> QueryParser is completely unaware of your schema. If you want
> QueryParser to use doc-values-based queries, you can override
> QueryParser.newRangeQuery and/or QueryParser.newFieldQuery to return a
> new ConstantScoreQuery that wraps a FieldCacheRangeFilter.
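Conceptually, a FieldCacheRangeFilter-style filter scans the per-document values and keeps the documents whose value falls in the requested range; wrapping it in a ConstantScoreQuery gives every match the same score. The plain-Java sketch below models only that scan over a `long[]` column with a hypothetical `RangeScan.matchRange` helper. It does not call Lucene, so check the QueryParser, ConstantScoreQuery and FieldCacheRangeFilter Javadocs for your Lucene version before wiring up the real override.

```java
import java.util.BitSet;

final class RangeScan {
    private RangeScan() {}

    /**
     * Hypothetical model of a doc-values range filter: returns the doc IDs whose
     * value lies between lower and upper, honoring the inclusive flags.
     * Matches carry no score, mirroring a constant-score query.
     */
    static BitSet matchRange(long[] column, long lower, long upper,
                             boolean includeLower, boolean includeUpper) {
        BitSet matches = new BitSet(column.length);
        for (int doc = 0; doc < column.length; doc++) {
            long v = column[doc];
            boolean aboveLower = includeLower ? v >= lower : v > lower;
            boolean belowUpper = includeUpper ? v <= upper : v < upper;
            if (aboveLower && belowUpper) {
                matches.set(doc);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        long[] year = {2009, 2011, 2013, 2012};
        System.out.println(matchRange(year, 2011, 2013, true, false)); // {1, 3}
    }
}
```

Because the scan visits every document, this kind of filter costs time proportional to the number of documents in the index rather than to the number of matches.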
> 
> --
> Adrien
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 