I don’t know if it’s worth the trade-offs, but there’s something to be said for having *both* indexed=true & docValues=true on the _version_ field in particular. docValues is not an “index”; with docValues, any operation other than looking up the value for a specific document is O(docs). VersionInfo.getMaxVersionFromIndex() falls back to a slow O(docs) algorithm when it has to use docValues, whereas with indexed=true it uses an O(log(versionCount)) lookup, where versionCount <= docs. That lookup can even be constant-time if the postings format supports ordinals (the default BlockTree one does not). Maybe we should use an ord-supporting postings format. What I don’t know is how frequent these operations are on a version field, so I can’t really judge the trade-offs.
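To make the complexity argument concrete, here is a toy model in Python. It is only a sketch of the two access patterns, not Lucene or Solr code: the function names are made up for illustration, a plain list stands in for the per-document docValues column, and a sorted list of distinct values stands in for the term dictionary.

```python
import bisect


def max_version_scan(versions_by_doc):
    """docValues-style access: one value per document, in doc-id order.

    Nothing is sorted, so finding the maximum has to visit every
    document: O(docs).
    """
    max_version = None
    for version in versions_by_doc:
        if max_version is None or version > max_version:
            max_version = version
    return max_version


def max_version_sorted(sorted_versions):
    """Index-style access: distinct values kept in sorted order.

    With positional (ordinal-like) access the maximum is simply the
    last entry, which is constant-time in this toy model.
    """
    return sorted_versions[-1] if sorted_versions else None


def contains_version(sorted_versions, version):
    """Binary search over the sorted values: O(log(versionCount))."""
    i = bisect.bisect_left(sorted_versions, version)
    return i < len(sorted_versions) and sorted_versions[i] == version


# Hypothetical data: four documents with their version values.
versions_by_doc = [1504, 1501, 1509, 1502]
sorted_versions = sorted(set(versions_by_doc))

print(max_version_scan(versions_by_doc))
print(max_version_sorted(sorted_versions))
print(contains_version(sorted_versions, 1502))
```

The sorted list here allows direct positional access, so it corresponds to the ord-supported case; without ordinals the maximum has to be found by seeking through the sorted values, which is where the logarithmic cost comes from.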
~ David

On Mon, Jun 22, 2015 at 1:01 PM Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
> This thread kind of got off into a tangent about Solr specifics -- if you
> skip down it's really a question about underlying performance concerns of
> using docvalues vs using stored fields.
>
> : 1. _version_ never needs to be searchable, thus, indexed=false makes sense.
>
> Unless I'm wrong, the version field is involved in "search" contexts
> because of optimistic concurrency - in order to do an "update doc=1 if
> version=42", under the covers a search is done against the version
> field --- but since this is a fairly constrained filter, indexed=false
> might still be fine as long as docValues=true because the search can be
> done via a DocValues based filter.
>
> : 4. Given the above, is using docValues=true for _version_ a good idea?
>
> : My take is a simple “no”. Since docValues is, in essence, column
> : oriented storage (and can be seen, I think, as an alternate index
> : format), what benefit is to be gained for the _version_ field. The
>
> To be clear -- Solr already has code that depends on having "Doc Values"
> on the version field to deal with max version value in segments (see
> VersionInfo.getVersionFromIndex and VersionInfo.getMaxVersionFromIndex) --
> but as with any field, that doesn't mean you must have 'docValues="true"'
> in your schema, instead the UninvertingReader can be used as long as the
> field is indexed.
>
> But that's really not what Ishan is asking about.
>
> We know it's possible to use docValues=true && indexed=false on the
> version field -- SOLR-6337 is open to decide if that makes sense in the
> sample configs. Ishan's question is really about stored=false.
>
> The key bit of context of Ishan's question is updateable docValues
> (SOLR-5944) and if/how it might be usable in Solr for the version field --
> but one key aspect of doing that would be in ensuring that we can *return*
> the correct version value to the user (for optimistic concurrency). Currently
> that's done with stored fields, but that wouldn't be feasible if we go
> down the route of updateable docValues, which means we would have to
> "return" the version field from the docValues.
>
> That's where Ishan's question about docvalues and performance and disk
> seeks comes from...
>
> What are the downsides in saying "instead of using docvalues and stored
> fields for this single valued int per doc, we're only going to use
> docvalues & when doing pagination we will return the current value of the
> field to the user from the docvalues"? What kind of performance impacts
> come up in that case when you have 100 docs per page(ination)?
>
>
> -Hoss
> http://www.lucidworks.com/