[ 
https://issues.apache.org/jira/browse/LUCENE-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4765:
--------------------------------

    Attachment: LUCENE-4765.patch

Updated patch showing differences between trunk and branch.

I actually think this is ready:
* it's a docvalues field where you can add multiple instances to a document.
* these are dereferenced (like SORTED), except for each document you get an 
ordered list of ordinals instead of a single one.
* transparent pass-thru to FieldCache.getDocTermOrds: so this "completes" dv in 
that we have an index-time equivalent to what FieldCache provides.
* if you ask for FieldCache.getDocTermOrds, instead of insanity for a 
single-valued field indexed by SORTED, you get a bridge API: so e.g. if we 
wanted we could start with a per-segment facet API for solr that handles both 
single/multi-valued and specialize only if it increases perf.
* all apis cutover, including join/ and grouping/, though while doing this I 
noticed an opportunity to separately make join/ more efficient (LUCENE-4771)
* refactored DocValues default merge to be simpler (including the existing 
SORTED case). This also benefits from the RAM improvements Adrien committed in 
LUCENE-4780.
* Lucene42 implementation uses an FST for the ord/term "dictionary", and the 
ordinal list per-doc is essentially a BINARY entry (vint+dgap encoded, as this 
seems to be the most efficient from the tests Shai et al have been doing with 
lucene/facets).
* implemented for the SimpleText, Disk, Asserting, and CheapBastard codecs.
* I added random tests that basically index and delete lots of things and 
verify the contents against stored fields, and DocTermOrds built in RAM from 
the indexed contents. 
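
To illustrate the vint+dgap encoding mentioned in the Lucene42 bullet, here is 
a minimal standalone sketch. It is not the code from the patch: the class name 
OrdListCodec and its methods are hypothetical, and it assumes each document's 
ordinals are already sorted ascending. Each ordinal is stored as the gap from 
the previous one, and each gap is written as a variable-length integer (7 bits 
per byte, high bit set when more bytes follow).

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration only; not the Lucene42 codec implementation.
public class OrdListCodec {

    /** Encodes an ascending list of ordinals as vInt-coded delta gaps. */
    public static byte[] encode(long[] ords) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long prev = 0;
        for (long ord : ords) {
            long gap = ord - prev;            // delta from the previous ordinal
            while ((gap & ~0x7FL) != 0) {     // more than 7 bits remain
                out.write((int) ((gap & 0x7F) | 0x80));
                gap >>>= 7;
            }
            out.write((int) gap);             // final byte, high bit clear
            prev = ord;
        }
        return out.toByteArray();
    }

    /** Decodes the bytes produced by encode back into the ordinals. */
    public static long[] decode(byte[] bytes) {
        long[] ords = new long[bytes.length]; // at most one ordinal per byte
        int count = 0;
        int pos = 0;
        long prev = 0;
        while (pos < bytes.length) {
            long gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                gap |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += gap;                      // undo the delta encoding
            ords[count++] = prev;
        }
        return Arrays.copyOf(ords, count);
    }
}
```

Small gaps between neighboring ordinals fit in a single byte, which is 
presumably why this layout does well for documents whose values cluster 
together in the sorted dictionary.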

Just wanted to get the patch up for review for a while. In the meantime I'll 
continue to make some commits: for example I want to add this type to 
IndexWriter's diskfull/exception/thread interrupt/etc tests and the usual 
rounding out of things.

                
> Multi-valued docvalues field
> ----------------------------
>
>                 Key: LUCENE-4765
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4765
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Robert Muir
>         Attachments: LUCENE-4765.patch, LUCENE-4765.patch
>
>
> The general idea is basically the docvalues parallel to 
> FieldCache.getDocTermOrds/UninvertedField.
> Currently this stuff is used in e.g. grouping and join for multivalued 
> fields, and in solr for faceting.

