[ 
https://issues.apache.org/jira/browse/LUCENE-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-7407:
---------------------------------------
    Attachment: LUCENE-7407.patch

I think this nearly ready!  I've fixed all nocommits, but {{ant precommit}} is 
a bit angry still... I'll fix before pushing.

I'm attaching an applyable patch vs. current master.

All doc values usage has been switched to iterators instead of random access, 
and {{getDocsWithField}} is gone.

I've done very little to improve the default codec to take advantage of this.  
I think there is a lot of fun improvements we can make here, in follow-on 
issues, so that e.g. LUCENE-7253 (merging of sparse doc values fields) is fixed.

To write doc values we now pass a {{DocValuesProducer}} (instead of N 
Iterables), and I created legacy deprecated bridge classes 
({{LegacyDocValuesIterables}}) to turns these back into Iterables for existing 
codecs.

I also created legacy bridge classes to turn random access DVs into the new 
iterators.


> Explore switching doc values to an iterator API
> -----------------------------------------------
>
>                 Key: LUCENE-7407
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7407
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>              Labels: docValues
>         Attachments: LUCENE-7407.patch
>
>
> I think it could be compelling if we restricted doc values to use an
> iterator API at read time, instead of the more general random access
> API we have today:
>   * It would make doc values disk usage more of a "you pay for what
>     what you actually use", like postings, which is a compelling
>     reduction for sparse usage.
>   * I think codecs could compress better and maybe speed up decoding
>     of doc values, even in the non-sparse case, since the read-time
>     API is more restrictive "forward only" instead of random access.
>   * We could remove {{getDocsWithField}} entirely, since that's
>     implicit in the iteration, and the awkward "return 0 if the
>     document didn't have this field" would go away.
>   * We can remove the annoying thread locals we must make today in
>     {{CodecReader}}, and close the trappy "I accidentally shared a
>     single XXXDocValues instance across threads", since an iterator is
>     inherently "use once".
>   * We could maybe leverage the numerous optimizations we've done for
>     postings over time, since the two problems ("iterate over doc ids
>     and store something interesting for each") are very similar.
> This idea has come up many in the past, e.g. LUCENE-7253 is a recent
> example, and very early iterations of doc values started with exactly
> this ;)
> However, it's a truly enormous change, likely 7.0 only.  Or maybe we
> could have the new iterator APIs also ported to 6.x side by side with
> the deprecate existing random-access APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to