[ https://issues.apache.org/jira/browse/LUCENE-8178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nikolay Khitrin updated LUCENE-8178:
------------------------------------
    Attachment: LUCENE-8178-for-solr.patch

> Bulk operations for LongValues and Sorted[Set]DocValues
> -------------------------------------------------------
>
>                 Key: LUCENE-8178
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8178
>             Project: Lucene - Core
>          Issue Type: Improvement
>    Affects Versions: 7.2.1
>            Reporter: Nikolay Khitrin
>            Priority: Major
>         Attachments: LUCENE-8178-for-solr.patch, LUCENE-8178.patch
>
>
> One-by-one DocValues iteration with {{advanceExact}} and
> {{nextOrd}}/{{ordValue}} is very slow for bulk operations such as faceting.
> Reading and unpacking integers in blocks is substantially faster, but
> DocValues can currently be queried for only a single document at a time.
> To apply document-based bulk processing, {{DocIdSetIterator}} matches have to
> be split into sequential docID runs and remapped to the underlying
> {{LongValues}} positions.
> After this transformation, relatively large linear scans can be performed
> over the packed integers.
>
> To do this, two new interfaces are introduced:
> 1. {{LongValuesCollector}} ({{collectValue(long index, long value)}}).
> 2. {{OrdStatsCollector}} ({{collectOrd(long ord)}}, {{collectMissing(int
> count)}}).
> Three new functions are also introduced:
> 1. {{LongValues.forRange(long begin, long end, LongValuesCollector
> collector)}}
> 2. {{SortedDocValues.forEach(DocIdSetIterator disi, OrdStatsConsumer
> collector)}}
> 3. {{SortedSetDocValues.forEach(DocIdSetIterator disi, OrdStatsConsumer
> collector)}}
> Each comes with a reference implementation.
> Optimized versions of these functions are provided for:
> 1. {{DirectReader}}, for cases other than 32 or 64 bits per value (using
> {{PackedInts.Decoder}}).
> 2. {{Lucene70DocValuesProducer}} {{getSorted}} and {{getSortedSet}} (both
> sparse and dense).
>
> Measured Solr faceting performance improves by up to 2-2.5x on a real index.
> A patch for Solr {{DocValuesFacets}} is also provided as a separate file.
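
To make the run-splitting idea in the description concrete, here is a minimal sketch in plain Java. It is not code from the patch: the class name {{DocIdRuns}}, the helper {{splitIntoRuns}}, the nested {{LongValuesCollector}} stand-in, and the half-open {{[begin, end)}} convention are all assumptions made for illustration. A plain {{long[]}} stands in for packed integers, and the sketch covers only the dense case, where a docID equals its position in the values. The sparse case would need the extra remapping step mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

class DocIdRuns {
    /** Hypothetical stand-in for the LongValuesCollector interface named in the issue. */
    interface LongValuesCollector {
        void collectValue(long index, long value);
    }

    /**
     * Splits ascending doc IDs into runs of consecutive IDs.
     * Each run is returned as {begin, end} with end exclusive (an assumption).
     */
    static List<int[]> splitIntoRuns(int[] sortedDocIds) {
        List<int[]> runs = new ArrayList<>();
        int i = 0;
        while (i < sortedDocIds.length) {
            int begin = sortedDocIds[i];
            int end = begin + 1;
            i++;
            // Extend the run while the next doc ID is exactly the next integer.
            while (i < sortedDocIds.length && sortedDocIds[i] == end) {
                end++;
                i++;
            }
            runs.add(new int[] {begin, end});
        }
        return runs;
    }

    /** Reference-style linear scan over [begin, end); the array stands in for packed values. */
    static void forRange(long[] values, long begin, long end, LongValuesCollector collector) {
        for (long index = begin; index < end; index++) {
            collector.collectValue(index, values[(int) index]);
        }
    }

    public static void main(String[] args) {
        long[] values = {10, 11, 12, 13, 14, 15, 16, 17};
        int[] matches = {1, 2, 3, 6, 7}; // doc IDs as a DocIdSetIterator would yield them
        long[] sum = {0};
        for (int[] run : splitIntoRuns(matches)) {
            forRange(values, run[0], run[1], (index, value) -> sum[0] += value);
        }
        System.out.println("sum of matched values = " + sum[0]);
    }
}
```

Each run becomes one {{forRange}} call, so the collector is invoked once per value while the scan itself stays linear within the run.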
>
> Implementation notes:
> * {{OrdStatsCollector}} does not accept a document ID, because passing one
> would hurt performance for {{SortedSetDocValues}} due to excessive position
> lookups.
> * This patch is fully compatible with the Lucene 7.0 DocValues format.
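
As an illustration of why a collector without a document ID is still enough for faceting, here is a small hypothetical sketch. The interface below only mirrors the two method signatures named in the issue ({{collectOrd(long ord)}} and {{collectMissing(int count)}}). The class names {{OrdCounts}} and {{CountingCollector}} are assumptions and do not come from the patch.

```java
class OrdCounts {
    /** Hypothetical stand-in mirroring the OrdStatsCollector signatures from the issue. */
    interface OrdStatsCollector {
        void collectOrd(long ord);
        void collectMissing(int count);
    }

    /** Counts occurrences per ordinal, which is all a facet count needs. */
    static final class CountingCollector implements OrdStatsCollector {
        final int[] counts;
        int missing;

        CountingCollector(int valueCount) {
            counts = new int[valueCount];
        }

        @Override
        public void collectOrd(long ord) {
            counts[(int) ord]++;
        }

        @Override
        public void collectMissing(int count) {
            missing += count;
        }
    }

    public static void main(String[] args) {
        CountingCollector collector = new CountingCollector(3);
        long[] ords = {0, 2, 2, 1, 2}; // ordinals as a bulk scan might deliver them
        for (long ord : ords) {
            collector.collectOrd(ord);
        }
        collector.collectMissing(4); // four matched documents had no value
        System.out.println(java.util.Arrays.toString(collector.counts)
                + " missing=" + collector.missing);
    }
}
```

Because the counter only needs ordinals and a missing total, the producer is free to scan values in bulk without tracking which document each ordinal came from.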