[ https://issues.apache.org/jira/browse/SOLR-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487620#comment-13487620 ]
Adrien Grand commented on SOLR-3855:
------------------------------------

bq. We could combine these? e.g. a docValueType of "none" or something? This would parallel the lucene apis and maybe make things a bit simpler.

Good point. Additionally, I currently force doc values to be non-direct (i.e. in-memory). Do you think that is fine, or should we give people the choice? I wasn't sure when writing the patch because I think direct doc values would give irregular performance, depending on the goodwill of the I/O cache. (I was thinking of people benchmarking with a read-only index, then going into production and performing a sort on a large result set while a background merge is running and eating all the I/O cache memory, and BOOM!) But maybe I'm too pessimistic. :-)

bq. it would be really great if fieldcache and docvalues had the same API

Yes, it would make things so much easier... I also wish DocValues.Source and FunctionValues were the same class.

bq. Would be awesome if faceting etc could use docvalues: though I think there is likely some work for the multivalued case?

Right, DocValues faceting has its own challenges. :-) But that's clearly an issue where merging fieldcache, DocValues.Source and FunctionValues would make things easier: we would have only one code base that is independent of the source of "values", and SOLR-1581 would come almost for free.

bq. I didn't look at this part, but is this really true? its numFields * rows right?

I was thinking of non-direct doc values for ID fields. Correct me if I'm wrong, but when doing a distributed search:
 1. createMainQuery: Solr first asks every shard for the IDs of the best (start + rows) docs
 2. createRetrieveDocs: Solr selects the {{rows}} IDs of documents to display and asks the shards they are stored on for their stored fields

So step 1 requires {{(start + rows)}} seeks in the FDT file per shard (to read the IDs) and step 2 requires {{rows}} seeks overall. So the total is {{(numShards * (start + rows)) + rows}}.
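To make the arithmetic above concrete, here is a rough sketch of the seek count, alongside the {{rows}} seeks of step 2 alone. It assumes one FDT seek per document looked up; the class name, method names and sample values are made up for illustration and are not Solr APIs.

```java
// Back-of-the-envelope model only; class and method names are hypothetical, not Solr APIs.
// Assumes one FDT seek per document whose stored fields (including the ID) are read.
final class SeekEstimate {

    // Step 1 (createMainQuery): every shard resolves the IDs of its best (start + rows) docs.
    static long idLookupSeeks(int numShards, int start, int rows) {
        return (long) numShards * ((long) start + rows);
    }

    // Step 2 (createRetrieveDocs): stored fields are fetched for the rows docs to display.
    static long retrieveSeeks(int rows) {
        return rows;
    }

    // Total for both steps: (numShards * (start + rows)) + rows.
    static long totalSeeks(int numShards, int start, int rows) {
        return idLookupSeeks(numShards, start, rows) + retrieveSeeks(rows);
    }

    public static void main(String[] args) {
        int numShards = 10, start = 1000, rows = 10; // made-up values
        System.out.println("both steps:  " + totalSeeks(numShards, start, rows));
        System.out.println("step 2 only: " + retrieveSeeks(rows));
    }
}
```

With these made-up values (10 shards, start = 1000, rows = 10) the model gives 10110 seeks for both steps against 10 for step 2 alone, so the gap grows with both the paging depth and the number of shards.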
If we stored document IDs in memory, I think this could help reduce this number to {{rows}} (only the second step), which would be great, especially for deep paging or a large number of shards.

bq. But in general if docvalues are presented like stored fields for general purposes I think thats not a great illusion to give to the user in case they have a lot of fields?

Of course it makes no sense to store all fields in DocValues; I think they are best used for ID fields, sorting, scoring factors (function queries) and (soon :)) faceting. I wanted them to behave like stored fields so that users don't also make their fields stored, in addition to DocValues, for convenience. That would be a waste of space, and the bigger the FDT file is, the more likely it is that the I/O cache can't serve disk seeks in this file.

> DocValues support
> -----------------
>
>                 Key: SOLR-3855
>                 URL: https://issues.apache.org/jira/browse/SOLR-3855
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.1, 5.0
>
>         Attachments: SOLR-3855.patch
>
>
> It would be nice if Solr supported DocValues:
>  - for ID fields (fewer disk seeks when running distributed search),
>  - for sorting/faceting/function queries (faster warmup time than fieldcache),
>  - better on-disk and in-memory efficiency (you can use packed impls).