Are there any diagrams that illustrate this distinction in how
stored fields and doc values are organized, including both the heap and
non-heap aspects like file caching? Sometimes a picture is worth 1K words.
Even if somebody could just draw it on a piece of paper and scan it.
-- Jack Krupansky
-----Original Message-----
From: Adrien Grand
Sent: Friday, April 4, 2014 4:50 PM
To: java-user@lucene.apache.org
Subject: Re: Stored fields and OS file caching
Hi Vitaly,
Doc values are indeed well-suited for grouping and sorting. However,
stored fields remain better at returning field values to users, since
they guarantee a worst case of one disk seek per document.
The filesystem cache typically caches data in blocks of 4KB. This
plays more nicely with doc values: given that they are stored in a
column-stride fashion, you load only the values of the fields you
actually read into the filesystem cache. With stored fields, on the
other hand, data is stored sequentially in a very large file, so
whenever you read a single field value, the filesystem cache loads a
4KB block of data that likely contains other fields' values that you
are not interested in.
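
To make the 4KB block argument concrete, here is a rough sketch that counts how many distinct 4KB pages get pulled into the filesystem cache when one field is read for every document, under a row-oriented layout (stored-fields-like) and a column-stride layout (doc-values-like). Everything in it is an assumption made for illustration: the class and method names are made up, values are fixed-width, and compression and the real codec file formats are ignored, so the numbers only show the trend and say nothing about actual Lucene files.

```java
import java.util.HashSet;
import java.util.Set;

public class PageCacheSketch {
    // 4KB filesystem cache block, as described in the message above.
    static final int PAGE_SIZE = 4096;

    // Row layout: all fields of a document are stored next to each other.
    // Returns the number of distinct pages touched when reading `field` for every doc.
    static int pagesTouchedRowLayout(int numDocs, int numFields, int bytesPerValue, int field) {
        Set<Long> pages = new HashSet<>();
        long docSize = (long) numFields * bytesPerValue;
        for (int doc = 0; doc < numDocs; doc++) {
            long start = doc * docSize + (long) field * bytesPerValue;
            long end = start + bytesPerValue - 1;
            for (long p = start / PAGE_SIZE; p <= end / PAGE_SIZE; p++) {
                pages.add(p);
            }
        }
        return pages.size();
    }

    // Column layout: all values of one field are stored next to each other.
    static int pagesTouchedColumnLayout(int numDocs, int numFields, int bytesPerValue, int field) {
        Set<Long> pages = new HashSet<>();
        long columnSize = (long) numDocs * bytesPerValue;
        for (int doc = 0; doc < numDocs; doc++) {
            long start = field * columnSize + (long) doc * bytesPerValue;
            long end = start + bytesPerValue - 1;
            for (long p = start / PAGE_SIZE; p <= end / PAGE_SIZE; p++) {
                pages.add(p);
            }
        }
        return pages.size();
    }

    public static void main(String[] args) {
        // Hypothetical index: 1000 docs, 16 fields, 64 bytes per value.
        System.out.println("row layout pages:    " + pagesTouchedRowLayout(1000, 16, 64, 0));
        System.out.println("column layout pages: " + pagesTouchedColumnLayout(1000, 16, 64, 0));
    }
}
```

With these made-up sizes, reading one field across all documents touches 250 pages in the row layout but only 16 in the column layout, which is the effect described above. The flip side is that reading all 16 fields of a single document touches one page in the row layout and up to 16 separate regions in the column layout.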
On Sat, Apr 5, 2014 at 12:23 AM, Vitaly Funstein <vfunst...@gmail.com>
wrote:
I use stored fields to load values for the following use cases:
- to return per-document values as is, requested by the user - similar to
listing DB columns you are interested in, in a "select ..." clause.
- to perform aggregate function calculations while forming the result set
(if requested).
- for group-by type queries (would like to switch to the native grouping
API, but don't think it supports grouping on multiple fields, or aggregate
functions).
- and finally, as I mentioned - to sort search results, also when
requested.
Evidently, even for simple queries that don't require any of the
post-processing above but ask for a set of values from each document,
there's still a non-trivial amount of disk activity... hence, I started
second-guessing the implementation.
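
For reference, the post-processing sort on stored field values could look roughly like the sketch below. It is only an illustration under stated assumptions: a missing value is represented as null after loading the document (an assumption, not something Lucene does for you), and the class and method names are hypothetical. Keeping null distinct from the empty string gives a deterministic order between the two.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MissingVsEmptySort {
    // Sorts field values with missing values (null) first, then the empty
    // string, then all other values in natural String order.
    static List<String> sortValues(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.nullsFirst(Comparator.<String>naturalOrder()));
        return sorted;
    }

    public static void main(String[] args) {
        // null stands for "field not present in the document" (assumption).
        System.out.println(sortValues(Arrays.asList("b", "", null, "a")));
    }
}
```

Whether missing values should sort first or last is a design choice; `Comparator.nullsLast` gives the other ordering.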
On Fri, Apr 4, 2014 at 3:00 PM, Uwe Schindler <u...@thetaphi.de> wrote:
Hi,
What are you doing with the stored fields? They are not deprecated and
also not really slow, unless you scan over millions of documents in
random access order. To display search results, DocValues are of no use.
Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -----Original Message-----
> From: Vitaly Funstein [mailto:vfunst...@gmail.com]
> Sent: Friday, April 04, 2014 9:44 PM
> To: java-user@lucene.apache.org
> Subject: Stored fields and OS file caching
>
> I have heard here that stored fields don't work well with OS file caching.
> Could someone elaborate on why that is? I am using Lucene 4.6 and we do
> use stored fields but not doc values; it appears most of the benefit from
> the latter comes as improvement in sorting performance, and I don't
> actually use Lucene for sorting at all; rather, it's done on a
> post-processing basis, based on stored field values (in a nutshell, the
> reason for this is Lucene's inability to tell apart terms that are empty
> strings vs. a missing value, resulting in unstable sort order on such
> fields).
>
> I am not sure if switching to using doc values fields from stored fields
> entirely would help leverage OS file cache better... what worries me is
> that when processing queries requesting multiple values from the document,
> doc value fields could cause multiple disk seeks to fetch values for each
> field, as opposed to just one with stored fields.
>
> Am I way off in my understanding of how this works? Any guidelines, as
> general as they may be, are appreciated.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
--
Adrien