[ 
https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13790834#comment-13790834
 ] 

Robert Muir commented on LUCENE-5248:
-------------------------------------

Hi Shai:

Should UpdatesIterator implement DISI (DocIdSetIterator)? It seems like it might be a good fit.

{code}
+    private final FixedBitSet docsWithField;
+    private PagedMutable docs;
+    private PagedGrowableWriter values;
{code}

When we have multiple related structures like this, maybe we can add a comment 
as to what each is?
Something like:
{code}
// bit per docid: set if the value is "real"
// TODO: is bitset(maxdoc) really needed, since usually it's sparse? why not an
// openbitset parallel with "docs"?
private final FixedBitSet docsWithField;
// holds a list of documents.
// TODO: do these really need to be absolute-encoded?
private PagedMutable docs;
// holds a list of values, parallel with docs
private PagedGrowableWriter values;
{code}
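
To make the "parallel" relationship concrete, here is a minimal sketch, not the patch itself. It assumes plain int[]/long[] arrays as stand-ins for PagedMutable and PagedGrowableWriter, leaves docsWithField out, and uses a hypothetical class name and method names. It appends updates in arrival order, then reads them back in doc order with the last update for a doc winning, as the issue description requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParallelUpdatesSketch {
    // Hypothetical stand-in for the PagedMutable "docs" list.
    private int[] docs = new int[4];
    // Hypothetical stand-in for the PagedGrowableWriter; values[i] belongs to docs[i].
    private long[] values = new long[4];
    private int size;

    // Append an update; docs may arrive in any order and may repeat.
    public void add(int doc, long value) {
        if (size == docs.length) {
            docs = Arrays.copyOf(docs, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        docs[size] = doc;
        values[size] = value;
        size++;
    }

    // Return {doc, value} pairs in increasing doc order; the last update per doc wins.
    public List<long[]> sortedUpdates() {
        Integer[] order = new Integer[size];
        for (int i = 0; i < size; i++) {
            order[i] = i;
        }
        // Arrays.sort on an object array is stable, so entries for the same doc
        // stay in insertion order and the newest one ends up last in its run.
        Arrays.sort(order, (a, b) -> Integer.compare(docs[a], docs[b]));
        List<long[]> out = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            boolean lastForDoc = i + 1 == size || docs[order[i + 1]] != docs[order[i]];
            if (lastForDoc) {
                out.add(new long[] {docs[order[i]], values[order[i]]});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ParallelUpdatesSketch updates = new ParallelUpdatesSketch();
        updates.add(100, 1L); // termA touches doc 100
        updates.add(2, 7L);   // termB touches doc 2
        updates.add(100, 3L); // doc 100 is updated again
        for (long[] entry : updates.sortedUpdates()) {
            System.out.println("doc=" + entry[0] + " value=" + entry[1]);
        }
    }
}
```

Sorting an index array keeps the two lists untouched; an in-place sort that swaps both arrays together would save the extra allocation but needs a stable algorithm to preserve "last update wins".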

{code}
+      docsWithField = new FixedBitSet(maxDoc);
+      docsWithField.clear(0, maxDoc);
{code}

The clear should be unnecessary: a newly allocated FixedBitSet already has all bits cleared.

{code}
+    public void add(int doc, Long value) {
+      assert value != null;
+      if (size == Integer.MAX_VALUE) {
+        throw new IllegalStateException("cannot support more than Integer.MAX_VALUE doc/value entries");
+      }
{code}

Is this really a limitation?

{code}
+        @Override
+        protected int compare(int i, int j) {
+          return (int) (docs.get(i) - docs.get(j));
+        }
{code}

Can we just use Long.compare? This subtraction may be safe, but Long.compare 
would smell better.
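
A minimal sketch of the difference, using a hypothetical class and hypothetical input values. For non-negative doc IDs below Integer.MAX_VALUE the difference always fits in an int, so the patch's comparison is probably fine as is; the values below are deliberately outside that range to show what the cast can do in general.

```java
public class CompareSketch {
    // Comparison as written in the patch: narrow the long difference to int.
    static int bySubtraction(long a, long b) {
        return (int) (a - b);
    }

    // Suggested alternative: no narrowing, so no lost bits or flipped sign.
    static int byLongCompare(long a, long b) {
        return Long.compare(a, b);
    }

    public static void main(String[] args) {
        long large = 1L << 32; // hypothetical value beyond the int range
        // The low 32 bits of the difference are zero, so the cast reports "equal".
        System.out.println(bySubtraction(large, 0L));
        // Long.compare reports a positive result, as expected.
        System.out.println(byLongCompare(large, 0L));
        // The difference is 2^31, which wraps to a negative int: wrong sign.
        System.out.println(bySubtraction(Integer.MAX_VALUE, -1L));
    }
}
```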

> Improve the data structure used in ReaderAndLiveDocs to hold the updates
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-5248
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5248
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch, 
> LUCENE-5248.patch
>
>
> Currently ReaderAndLiveDocs holds the updates in two structures:
> +Map<String,Map<Integer,Long>>+
> Holds a mapping from each field, to all docs that were updated and their 
> values. This structure is updated when applyDeletes is called, and needs to 
> satisfy several requirements:
> # Un-ordered writes: if a field "f" is updated by two terms, termA and termB, 
> in that order, and termA affects doc=100 and termB doc=2, then the updates 
> are applied in that order, meaning we cannot rely on updates coming in order.
> # Same document may be updated multiple times, either by same term (e.g. 
> several calls to IW.updateNDV) or by different terms. Last update wins.
> # Sequential read: when writing the updates to the Directory 
> (fieldsConsumer), we iterate on the docs in-order and for each one check if 
> it's updated and if not, pull its value from the current DV.
> # A single update may affect several million documents, therefore need to be 
> efficient w.r.t. memory consumption.
> +Map<Integer,Map<String,Long>>+
> Holds a mapping from a document, to all the fields that it was updated in and 
> the updated value for each field. This is used by IW.commitMergedDeletes to 
> apply the updates that came in while the segment was merging. The 
> requirements this structure needs to satisfy are:
> # Access in doc order: this is how commitMergedDeletes works.
> # One-pass: we visit a document once (currently) and so if we can, it's 
> better if we know all the fields in which it was updated. The updates are 
> applied to the merged ReaderAndLiveDocs (where they are stored in the first 
> structure mentioned above).
> Comments with proposals will follow next.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
