[ 
https://issues.apache.org/jira/browse/SOLR-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487620#comment-13487620
 ] 

Adrien Grand commented on SOLR-3855:
------------------------------------

bq. We could combine these? e.g. a docValueType of "none" or something? This 
would parallel the lucene apis and maybe make things a bit simpler.

Good point.

Additionally, I currently force doc values to be non-direct (i.e. in-memory). Do 
you think this is fine, or should we give people the choice? I wasn't sure when 
writing the patch because I think direct (on-disk) doc values would give 
irregular performance, depending on the good will of the I/O cache. I was 
thinking of people benchmarking with a read-only index, then going into 
production and performing a sort on a large result set while a background merge 
is running (eating all the I/O cache memory) and BOOM! But maybe I'm too 
pessimistic. :-)

bq. it would be really great if fieldcache and docvalues had the same API

Yes, it would make things so much easier... I also wish DocValues.Source and 
FunctionValues were the same class.

bq. Would be awesome if faceting etc could use docvalues: though I think there 
is likely some work for the multivalued case?

Right, DocValues faceting has its own challenges. :-) But that's clearly an 
issue where merging fieldcache, DocValues.Source and FunctionValues would make 
things easier: we would have only one code base that is independent of the 
source of "values", and SOLR-1581 would come almost for free.

bq. I didn't look at this part, but is this really true? its numFields * rows 
right?

I was thinking of non-direct doc values for ID fields. Correct me if I'm wrong 
but when doing a distributed search:

 1. createMainQuery: Solr first asks every shard for the IDs of the best (start 
+ rows) docs
 2. createRetrieveDocs: Solr selects the {{rows}} IDs of documents to display 
and asks the shards they are stored on for their stored fields

So step 1 requires {{(start + rows)}} seeks in the FDT file per shard (to read 
the IDs) and step 2 requires {{rows}} seeks overall. So the total is 
{{(numShards * (start + rows)) + rows}}. If we stored document IDs in memory, I 
think this could help reduce this number to {{rows}} (only the second step), 
which would be great, especially for deep paging or a large number of shards.
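
To make the arithmetic concrete, here is a minimal sketch of the two seek counts above. The class and method names are hypothetical (they are not part of Solr or of the patch), and the model assumes one FDT seek per document read and no help from the I/O cache.

```java
/**
 * Back-of-the-envelope seek estimate for a distributed search.
 * Hypothetical helper, not Solr code: it only encodes the formulas
 * from the comment, assuming one FDT seek per document read.
 */
public final class SeekEstimate {
    private SeekEstimate() {}

    /**
     * IDs read from stored fields: every shard resolves (start + rows) IDs
     * in step 1, then rows documents are fetched overall in step 2.
     */
    public static long seeksWithStoredIds(int numShards, int start, int rows) {
        return (long) numShards * ((long) start + rows) + rows;
    }

    /** IDs held in memory: step 1 needs no seeks, only step 2 remains. */
    public static long seeksWithInMemoryIds(int rows) {
        return rows;
    }

    public static void main(String[] args) {
        // Example values chosen for illustration only (deep paging on 10 shards).
        int numShards = 10, start = 1000, rows = 10;
        System.out.println("stored IDs:    " + seeksWithStoredIds(numShards, start, rows));
        System.out.println("in-memory IDs: " + seeksWithInMemoryIds(rows));
    }
}
```

With these example values, the stored-ID case comes to 10 * (1000 + 10) + 10 = 10110 seeks against 10 with in-memory IDs, which is why the gap grows with {{start}} and with the number of shards.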

bq. But in general if docvalues are presented like stored fields for general 
purposes I think thats not a great illusion to give to the user in case they 
have a lot of fields?

Of course it makes no sense to store all fields in DocValues. I think they are 
best used for ID fields, sorting, scoring factors (function queries) and (soon 
:)) faceting. I wanted them to behave like stored fields so that users don't 
also make their fields stored, in addition to DocValues, just for convenience 
(this is a waste of space, and the bigger the FDT file is, the less likely the 
I/O cache can serve disk seeks in this file).
                
> DocValues support
> -----------------
>
>                 Key: SOLR-3855
>                 URL: https://issues.apache.org/jira/browse/SOLR-3855
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.1, 5.0
>
>         Attachments: SOLR-3855.patch
>
>
> It would be nice if Solr supported DocValues:
>  - for ID fields (fewer disk seeks when running distributed search),
>  - for sorting/faceting/function queries (faster warmup time than fieldcache),
>  - better on-disk and in-memory efficiency (you can use packed impls).
