I don’t know if it’s worth it in terms of the trade-offs, but there’s
something to be said for having *both* indexed=true & docValues=true on
the _version_ field in particular.  docValues is not an “index”; any
operation other than looking up the value for a specific document is
O(docs) with docValues.  VersionInfo.getMaxVersionFromIndex() falls back to
a slow O(docs) algorithm when it has to use docValues, whereas with
indexed=true it uses an O(log(versionCount)) lookup, where versionCount <=
docs.  It’s actually sometimes constant-time if the index postings format
supports ordinals (the default BlockTree one does not).  Maybe we should
use an ord-supported postings format.  What I don’t know is how frequent
these operations are on a version field; knowing that would let me better
judge the trade-offs.
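
To make the complexity difference concrete, here is a toy sketch in
Python. It is not the Lucene or Solr code: the list doc_values stands in
for a per-document column, the sorted list terms stands in for a term
dictionary with ordinal access, and all three function names are
hypothetical.

```python
from bisect import bisect_left

# Hypothetical stand-ins, not Lucene data structures:
#   doc_values: one version per document, addressed by doc id (a column).
#   terms: the sorted, de-duplicated versions (like a term dictionary).

def max_version_from_doc_values(doc_values):
    """Column-only case: every document's value must be visited, O(docs)."""
    best = 0
    for version in doc_values:
        if version > best:
            best = version
    return best

def max_version_from_terms(terms):
    """With ordinal access, the largest term is simply the last one, O(1)."""
    return terms[-1] if terms else 0

def version_exists(terms, version):
    """Seeking a term in sorted order is a binary search,
    O(log(versionCount))."""
    i = bisect_left(terms, version)
    return i < len(terms) and terms[i] == version

doc_values = [5, 42, 17, 42, 8]    # version per doc id 0..4
terms = sorted(set(doc_values))    # versionCount <= docs
print(max_version_from_doc_values(doc_values), max_version_from_terms(terms))
```

Both functions return the same maximum; only the amount of work differs,
which is the trade-off in question.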

~ David

On Mon, Jun 22, 2015 at 1:01 PM Chris Hostetter <hossman_luc...@fucit.org>
wrote:

>
> This thread kind of got off into a tangent about Solr specifics -- if you
> skip down it's really a question about underlying performance concerns of
> using docvalues vs using stored fields.
>
> : 1.      _version_ never needs to be searchable, thus, indexed=false
> : makes sense.
>
> Unless I'm wrong, the version field is involved in "search" contexts
> because of optimistic concurrency - in order to do an "update doc=1 if
> version=42", a search is done under the covers against the version
> field --- but since this is a fairly constrained filter, indexed=false
> might still be fine as long as docValues=true because the search can be
> done via a DocValues based filter.
>
> : 4.      Given the above, is using docValues=true for _version_ a good
> : idea?
>
> : My take is a simple “no”.  Since docValues is, in essence, column
> : oriented storage (and can be seen, I think, as an alternate index
> : format), what benefit is to be gained for the _version_ field.  The
>
> To be clear -- Solr already has code that depends on having "Doc Values"
> on the version field to deal with max version value in segments (see
> VersionInfo.getVersionFromIndex and VersionInfo.getMaxVersionFromIndex) --
> but as with any field, that doesn't mean you must have 'docValues="true"'
> in your schema, instead the UninvertingReader can be used as long as the
> field is indexed.
>
> But that's really not what Ishan is asking about.
>
> We know it's possible to use docValues=true && indexed=false on the
> version field -- SOLR-6337 is open to decide if that makes sense in the
> sample configs.  Ishan's question is really about stored=false.
>
> The key bit of context of Ishan's question is updateable docValues
> (SOLR-5944) and if/how it might be usable in Solr for the version field --
> but one key aspect of doing that would be in ensuring that we can *return*
> the correct version value to user (for optimistic concurrency).  Currently
> that's done with stored fields, but that wouldn't be feasible if we go
> down the route of updateable docValues, which means we would have to
> "return" the version field from the docValues.
>
> That's where Ishan's question about docvalues and performance and disk
> seeks comes from...
>
> What are the downsides in saying "instead of using docvalues and stored
> fields for this single valued int per doc, we're only going to use
> docvalues & when doing pagination we will return the current value of the
> field to the user from the docvalues"? What kind of performance impacts
> come up in that case when you have 100 docs per page(ination)?
>
>
> -Hoss
> http://www.lucidworks.com/