[ https://issues.apache.org/jira/browse/LUCENE-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15595043#comment-15595043 ]
David Smiley commented on LUCENE-7407:
--------------------------------------

I wouldn't _dare_ suggest to another committer how they should spend their time; it's entirely their prerogative. That's crossing a line; please stop! I think we should value all technical input, even if it's bad news (e.g. something got slower). Building/running a benchmark is being helpful. I understand if you don't like the benchmark in particular (I'm not going to argue it's a particularly good or bad one) but it's being helpful and it takes time to do these things. I'd be depressed right now if I were in Yonik's shoes; but hey that's me and we need emotions of steel around here to survive.

> Explore switching doc values to an iterator API
> -----------------------------------------------
>
>                 Key: LUCENE-7407
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7407
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>              Labels: docValues
>             Fix For: master (7.0)
>
>         Attachments: LUCENE-7407.patch
>
>
> I think it could be compelling if we restricted doc values to use an
> iterator API at read time, instead of the more general random access
> API we have today:
>   * It would make doc values disk usage more of a "you pay for what
>     you actually use", like postings, which is a compelling
>     reduction for sparse usage.
>   * I think codecs could compress better and maybe speed up decoding
>     of doc values, even in the non-sparse case, since the read-time
>     API is more restrictive "forward only" instead of random access.
>   * We could remove {{getDocsWithField}} entirely, since that's
>     implicit in the iteration, and the awkward "return 0 if the
>     document didn't have this field" would go away.
>   * We can remove the annoying thread locals we must make today in
>     {{CodecReader}}, and close the trappy "I accidentally shared a
>     single XXXDocValues instance across threads", since an iterator is
>     inherently "use once".
>   * We could maybe leverage the numerous optimizations we've done for
>     postings over time, since the two problems ("iterate over doc ids
>     and store something interesting for each") are very similar.
>
> This idea has come up many times in the past, e.g. LUCENE-7253 is a
> recent example, and very early iterations of doc values started with
> exactly this ;)
>
> However, it's a truly enormous change, likely 7.0 only. Or maybe we
> could have the new iterator APIs also ported to 6.x side by side with
> the deprecated existing random-access APIs.
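
For readers skimming the thread, the contrast the issue description draws between random access and forward-only iteration can be sketched in a few lines of plain Java. This is only a toy illustration under stated assumptions: the class and method names below (RandomAccessValues, IteratorValues, sumRandomAccess, sumIterator, countDocsWithValue) are hypothetical and are not taken from the attached patch or from Lucene's codec classes. The arrays stand in for whatever a real codec would store on disk.

```java
public class DocValuesIteratorSketch {
    // Sentinel returned once the iterator is exhausted (hypothetical constant for this sketch).
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical random-access view: one slot per document, missing docs read as 0. */
    static final class RandomAccessValues {
        private final long[] values;

        RandomAccessValues(int maxDoc, int[] docs, long[] vals) {
            values = new long[maxDoc]; // space is paid for every doc, even without a value
            for (int i = 0; i < docs.length; i++) {
                values[docs[i]] = vals[i];
            }
        }

        long get(int docId) {
            return values[docId]; // cannot tell "missing" apart from an explicit 0
        }
    }

    /** Hypothetical forward-only view: only documents that have a value are visited. */
    static final class IteratorValues {
        private final int[] docs;  // sorted doc ids that have a value
        private final long[] vals; // vals[i] belongs to docs[i]
        private int pos = -1;

        IteratorValues(int[] docs, long[] vals) {
            this.docs = docs;
            this.vals = vals;
        }

        int nextDoc() {
            if (pos < docs.length) {
                pos++;
            }
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        long longValue() {
            return vals[pos]; // only valid while positioned on a document
        }
    }

    /** Random access: the caller must probe every doc id up to maxDoc. */
    static long sumRandomAccess(RandomAccessValues dv, int maxDoc) {
        long sum = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            sum += dv.get(doc);
        }
        return sum;
    }

    /** Iterator: work is proportional to the number of docs that have a value. */
    static long sumIterator(IteratorValues dv) {
        long sum = 0;
        while (dv.nextDoc() != NO_MORE_DOCS) {
            sum += dv.longValue();
        }
        return sum;
    }

    /** "Which docs have the field" falls out of the iteration itself. */
    static int countDocsWithValue(IteratorValues dv) {
        int count = 0;
        while (dv.nextDoc() != NO_MORE_DOCS) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] docs = {2, 5, 9};      // only 3 of 10 docs have a value
        long[] vals = {10L, 0L, 32L}; // doc 5 stores an explicit 0

        System.out.println(sumRandomAccess(new RandomAccessValues(10, docs, vals), 10));
        System.out.println(sumIterator(new IteratorValues(docs, vals)));
        System.out.println(countDocsWithValue(new IteratorValues(docs, vals)));
    }
}
```

Note that each IteratorValues instance is consumed by a single pass, which is the "use once" property the description mentions: a second pass needs a fresh instance, so there is no positioned state to share across threads by accident. Doc 5 also shows why a separate "docs with field" bitset becomes unnecessary, since the iterator visits it even though its value is 0.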