Adrien,

Thanks for taking the time to explain things so clearly. I understand it correctly now.

Thanks,
Arun

On 29-May-2013, at 2:13 AM, Adrien Grand <jpou...@gmail.com> wrote:

> On Tue, May 28, 2013 at 8:55 PM, Arun Kumar K <arunk...@gmail.com> wrote:
>> Thanks for clarifying things.
>> I have some doubts regarding sorting:
>>>
>>> While you can do that, I don't recommend it. For example, if you have
>>> 5 fields, loading all fields from stored fields requires at most 1
>>> disk seek while loading all fields from doc values requires at least 5
>>> disk seeks for disk-based doc values.
>>
>>
>> 1> I am assuming those 5 fields are the sortable fields on which
>> sorting is done.
>> In my understanding, loading stored fields takes 1 disk seek for finding
>> the file pointer & 1 disk seek for getting all those fields.
>
> This was correct until Lucene 4.0, but since 4.1, Lucene stores the
> doc ID -> file pointer mapping in memory, ensuring at most 1 disk
> seek.
>
>> Since a different file is maintained for each doc values field, we get
>> 5 disk seeks + 1 disk seek for the file pointer.
>
> There is no general rule since this depends on the doc values type and
> the codec implementation, but you got the idea.
>
>> If we have only one sortable field, which would be better? I guess no diff.
>
> Just to make things clear, before Lucene had doc values, sorting was
> performed based on the inverted index (which was uninverted and stored
> in memory using FieldCache), not stored fields. Stored fields are bad
> for sorting because they are usually large and don't play nice with
> the file system cache.
>
> Doc values are very similar to FieldCache except that the hard work is
> done at indexing time instead of searching time. This is a good
> trade-off because it allows for faster loading of indexes and for
> off-loading data to disk. It is never a bad idea to use doc values
> for sorting.
>
>> Also, I vaguely remember that there is some performance loss for sorting
>> based on strings in Lucene 4.0.
>> Then, will the decision change for a String field, or based on the type of field?
>
> I don't see why String sorting would be slower. However, it is true
> that String sorting requires a lot of memory. If your field is a
> number, you should definitely use a numeric field cache.
>
>> 2> Also, in my understanding, if we need to use parser-based queries for
>> doc values, we need to have a stored field for a doc with the same name & value as
>> the doc's doc value.
>> Even term queries won't work. Am I right here?
>
> QueryParser is completely unaware of your schema. If you want
> QueryParser to use doc-values-based queries, you can override
> QueryParser.newRangeQuery and/or QueryParser.newFieldQuery to return a
> new ConstantScoreQuery that wraps a FieldCacheRangeFilter.
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
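
For readers of the archive, the seek-count reasoning quoted in this thread can be sketched as a toy model. This is only an illustration under stated assumptions: `SeekModel` and its methods are hypothetical names, not Lucene APIs, and the "one seek per doc values field" figure is the rough lower bound from the thread, which Adrien notes depends on the doc values type and the codec.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Toy model only: none of these names are Lucene APIs. */
final class SeekModel {

    /**
     * Stored fields since Lucene 4.1, as described in the thread: the
     * doc ID -> file pointer mapping is in memory, so fetching all
     * fields of one document costs at most one seek.
     */
    static int storedFieldSeeks(int fieldCount) {
        return fieldCount > 0 ? 1 : 0;
    }

    /**
     * Disk-based doc values, assuming each field lives in its own
     * region on disk: at least one seek per field.
     */
    static int docValuesSeeks(int fieldCount) {
        return fieldCount;
    }

    /**
     * Sorting only needs the single column of the sort field, which is
     * the column-per-field layout that doc values (and FieldCache) provide.
     * Returns doc IDs ordered by ascending value.
     */
    static Integer[] sortDocIdsByColumn(final long[] column) {
        Integer[] docIds = new Integer[column.length];
        for (int i = 0; i < docIds.length; i++) {
            docIds[i] = i;
        }
        Arrays.sort(docIds, Comparator.comparingLong((Integer id) -> column[id]));
        return docIds;
    }

    public static void main(String[] args) {
        System.out.println("stored fields, 5 fields: " + storedFieldSeeks(5) + " seek");
        System.out.println("doc values, 5 fields: " + docValuesSeeks(5) + " seeks");
        long[] price = {30L, 10L, 20L}; // value of the sort field for doc 0, 1, 2
        System.out.println(Arrays.toString(sortDocIdsByColumn(price))); // prints [1, 2, 0]
    }
}
```

The model shows why the two structures suit different jobs: fetching every field of a few hits favours stored fields, while sorting many documents by one field only ever touches that field's column.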