[
https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882933#comment-13882933
]
Per Steffensen commented on SOLR-5670:
--------------------------------------
bq. Is there any benchmark data? If docValues provides better performance for
_version_ than indexed
I do not think it will in most cases.
* Indexed: When you want to get the _version_ for a particular doc-no (found by
id), you can make a lookup in FieldCache holding the reversed term-index - this
is in memory and constant time. If you have a very rapidly changing data-set
(so that FieldCache-entries will be invalidated often due to merging) you might
get better performance (response-time) with doc-values - but not in general, I
think.
* DocValues: You will read the _version_ from doc-values, which is not
necessarily in memory.
We are prepared to take a small performance hit to avoid having all that data
in FieldCache. In general we do not allow putting anything in FieldCache,
because we have so many documents that it always creates issues with too much
memory usage. The problem with FieldCache is that it is all or nothing - for
good reasons! - we just cannot live with it.
We haven't made the change on _version_ (going from indexed to doc-value) in
production yet. We will do some performance testing on it first, and depending
on how much we decide to do, I can get back with some numbers.
bq. when it is used for its intended purpose, it might be worth changing the
example config
I do not think you should do that. Using FieldCache is probably the best
"default". But writing something somewhere about the option of using doc-values
instead of indexed, and when that is a good idea, would be nice.
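For illustration, the rule proposed in this issue ("_version_ has to be either indexed or docvalue, allowed to be both") can be sketched as a small check. This is only a sketch in Python, not the code from the patch; the function name and the boolean parameters are hypothetical and just stand in for the field's indexed and docValues settings.

```python
def version_field_is_valid(indexed: bool, doc_values: bool) -> bool:
    """Return True if a _version_ field definition satisfies the proposed rule.

    Hypothetical helper: 'indexed' and 'doc_values' stand in for the
    corresponding attributes on the field definition in the schema.
    """
    # Proposed rule: at least one of the two must be enabled; both is allowed.
    return indexed or doc_values


def describe(indexed: bool, doc_values: bool) -> str:
    """Summarise how _version_ lookups would be served for a given setup."""
    if not version_field_is_valid(indexed, doc_values):
        return "invalid: _version_ must be indexed or have doc-values"
    if indexed:
        # With an indexed field the lookup goes through FieldCache (in memory).
        return "valid: lookups via FieldCache"
    # Doc-values only: values are read from doc-values, not necessarily in memory.
    return "valid: lookups via doc-values"
```

Under this sketch only the combination "not indexed and no doc-values" is rejected.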
bq. ... but people should know that if they do change the config on this field,
they will have to completely reindex.
Or just start using it from now on in new collections. We create a new
collection every month and keep a history of data by keeping the "latest" 24
collections. One of many reasons for doing this is that it provides us the
option of changing indexing-strategy etc. every month. For us re-indexing is
completely out of the question - we have billions and billions of records in
Solr and re-indexing them all in a fairly short service-window is not possible.
Therefore we built this new-collection-every-month thingy in order to have some
flexibility.
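A minimal sketch of the rolling-window idea, assuming one collection per calendar month and a retention of 24. The naming scheme (a prefix followed by year and month) and the function name are made up for illustration and are not our actual setup.

```python
from datetime import date


def monthly_collection_names(today: date, keep: int = 24, prefix: str = "records") -> list:
    """List the names of the 'keep' most recent monthly collections, newest first.

    The prefix and the <prefix>_<yyyy>_<mm> naming pattern are hypothetical.
    """
    names = []
    year, month = today.year, today.month
    for _ in range(keep):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        # Step back one calendar month, wrapping from January to December.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return names


# The newest collection is the only one that needs the new schema;
# older collections keep whatever indexing-strategy they were created with.
window = monthly_collection_names(date(2014, 1, 27))
newest, oldest = window[0], window[-1]
```

A change such as moving _version_ from indexed to doc-values would then only apply to collections created after the change, and the older ones age out of the window without ever being re-indexed.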
bq. This patch is functionally identical to the previous one, it just modifies
an error message.
Nicely spotted
bq. I didn't check to see what branch Per's patch was created on, but it did
apply cleanly to branch_4x.
It was branch_4x
> _version_ either indexed OR docvalue
> ------------------------------------
>
> Key: SOLR-5670
> URL: https://issues.apache.org/jira/browse/SOLR-5670
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Affects Versions: 4.7
> Reporter: Per Steffensen
> Assignee: Per Steffensen
> Labels: solr, solrcloud, version
> Attachments: SOLR-5670.patch, SOLR-5670.patch
>
>
> As far as I can see there is no good reason to require that "_version_" field
> has to be indexed if it is docvalued. So I guess it will be ok with a rule
> saying "_version_ has to be either indexed or docvalue (allowed to be both)".