ChrisHegarty commented on code in PR #15990:
URL: https://github.com/apache/lucene/pull/15990#discussion_r3217338514


##########
lucene/core/src/java/org/apache/lucene/index/DocumentsWriterPerThread.java:
##########
@@ -215,6 +217,22 @@ private void reserveOneDoc() {
     }
   }
 
+  /**
+   * Atomically reserves capacity for {@code n} docs. On failure, fully rolls back so no
+   * reservations leak.
+   */
+  private void reserveDocs(int n) {
+    assert n >= 0;
+    if (n == 0) {
+      return;
+    }
+    if (pendingNumDocs.addAndGet(n) > IndexWriter.getActualMaxDocs()) {
+      pendingNumDocs.addAndGet(-n);
+      throw new IllegalArgumentException(
+          "number of documents in the index cannot exceed " + IndexWriter.getActualMaxDocs());
+    }
+  }

Review Comment:
   This avoids potentially exceeding the max, and the need to roll back.
   
   ```suggestion
   /**
    * Atomically reserves capacity for {@code n} docs.
    *
    * @throws IllegalArgumentException if reserving {@code n} docs would exceed the maximum
    *     number of documents allowed in the index
    */
   private void reserveDocs(int n) {
     assert n >= 0;
     if (n == 0) {
       return;
     }
     final int maxDocs = IndexWriter.getActualMaxDocs();
     while (true) {
       long current = pendingNumDocs.get();
       long next = current + n;
       if (next > maxDocs) {
         throw new IllegalArgumentException(
             "number of documents in the index cannot exceed " + maxDocs);
       }
       if (pendingNumDocs.compareAndSet(current, next)) {
         return;
       }
     }
   }
   ```
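
A minimal standalone sketch of the same compare-and-set idea, assuming a plain `AtomicLong` counter and a fixed cap. The class `CappedReservation` and its members are hypothetical stand-ins, not Lucene code. With add-then-rollback, a concurrent reader can briefly observe a count above the cap; with the loop below, a value above the cap is never published.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Standalone sketch of a capped counter; hypothetical, not Lucene code. */
final class CappedReservation {
  private final AtomicLong pending = new AtomicLong();
  private final long max; // hypothetical stand-in for IndexWriter.getActualMaxDocs()

  CappedReservation(long max) {
    this.max = max;
  }

  /** Reserves n slots, or throws without ever publishing a value above max. */
  void reserve(int n) {
    if (n < 0) {
      throw new IllegalArgumentException("n must be >= 0");
    }
    if (n == 0) {
      return;
    }
    while (true) {
      long current = pending.get();
      long next = current + n;
      if (next > max) {
        // Nothing was written, so there is nothing to roll back.
        throw new IllegalArgumentException("cannot exceed " + max);
      }
      // Succeeds only if no other thread changed the counter in between; otherwise retry.
      if (pending.compareAndSet(current, next)) {
        return;
      }
    }
  }

  long pending() {
    return pending.get();
  }

  public static void main(String[] args) {
    CappedReservation r = new CappedReservation(10);
    r.reserve(7);
    try {
      r.reserve(4); // 7 + 4 > 10, rejected and the counter stays at 7
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    r.reserve(3); // exactly fills the cap
    System.out.println(r.pending());
  }
}
```

The trade-off is a retry loop under contention in exchange for never exposing an over-cap value to other threads.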



##########
lucene/core/src/java/module-info.java:
##########
@@ -76,6 +76,10 @@
   exports org.apache.lucene.codecs.hnsw;
   exports org.apache.lucene.internal.vectorization to
       org.apache.lucene.benchmark.jmh;
+  exports org.apache.lucene.document.column;
+
+  opens org.apache.lucene.document.column to
+      org.apache.lucene.test_framework;

Review Comment:
   What uses this `opens`?



##########
lucene/core/src/java/org/apache/lucene/index/SortedNumericDocValuesWriter.java:
##########
@@ -68,11 +69,44 @@ public void addValue(int docID, long value) {
     updateBytesUsed();
   }
 
+  public void addDenseValues(int firstDocID, LongValuesCursor cursor) {

Review Comment:
   ```suggestion
    void addDenseValues(int firstDocID, LongValuesCursor cursor) {
   ```



##########
lucene/core/src/java/org/apache/lucene/index/SortedNumericDocValuesWriter.java:
##########
@@ -68,11 +69,44 @@ public void addValue(int docID, long value) {
     updateBytesUsed();
   }
 
+  public void addDenseValues(int firstDocID, LongValuesCursor cursor) {
+    int numValues = cursor.size();
+    if (numValues == 0) {
+      return;
+    }
+    assert firstDocID > currentDoc;
+    finishCurrentDoc();
+
+    // Write values directly to pending — each value is one doc, single-valued.
+    // No currentValues[] buffering, no sorting needed.
+    pending.add(cursor);
+
+    // If pendingCounts is active (some earlier doc was multi-valued),
+    // record count=1 for each dense doc.
+    if (pendingCounts != null) {
+      for (int i = 0; i < numValues; i++) {
+        pendingCounts.add(1);
+      }
+    }
+
+    // Bulk-add consecutive doc-ids
+    docsWithField.addRange(firstDocID, firstDocID + numValues);
+
+    // Set currentDoc to last written doc so ordering is maintained.
+    // currentUpto stays 0 — nothing buffered.
+    currentDoc = firstDocID + numValues - 1;

Review Comment:
   ```suggestion
       final int endDocID = firstDocID + numValues;
       // Bulk-add consecutive doc-ids
       docsWithField.addRange(firstDocID, endDocID);
   
       // Set currentDoc to last written doc so ordering is maintained.
       // currentUpto stays 0 — nothing buffered.
       currentDoc = endDocID - 1;
   ```



##########
lucene/core/src/java/org/apache/lucene/index/NumericDocValuesWriter.java:
##########
@@ -64,6 +65,21 @@ public void addValue(int docID, long value) {
     lastDocID = docID;
   }
 
+  public void addDenseValues(int firstDocID, LongValuesCursor cursor) {

Review Comment:
   ```suggestion
    void addDenseValues(int firstDocID, LongValuesCursor cursor) {
   ```



##########
lucene/core/src/java/org/apache/lucene/document/column/LongTupleCursor.java:
##########
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document.column;
+
+import org.apache.lucene.search.DocIdSetIterator;
+
+/**
+ * A tuple cursor over a {@link LongColumn}. Yields {@code (docID, longValue)} pairs. Batch-local
+ * doc-ids are returned in non-decreasing order; the same doc-id may repeat for multi-valued fields
+ * (e.g. {@link org.apache.lucene.index.DocValuesType#SORTED_NUMERIC SORTED_NUMERIC}).
+ *
+ * @lucene.experimental
+ */
+public abstract class LongTupleCursor {
+
+  /**
+   * Advances to the next doc-id that has a value and returns it, or {@link
+   * DocIdSetIterator#NO_MORE_DOCS} if exhausted. Doc-ids are batch-local (0 to {@code numDocs -
+   * 1}).

Review Comment:
   Just being a little pedantic around multi-values.
   ```suggestion
      * Advances to the next tuple and returns its doc-id, or {@link
      * DocIdSetIterator#NO_MORE_DOCS} if exhausted.
      *
      * <p>Returned doc-ids are batch-local (0 to {@code numDocs - 1}) and are emitted in
      * non-decreasing order. The same doc-id may be returned multiple times when a document
      * has multiple values.
   ```
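
To make the multi-value contract concrete, here is a small array-backed sketch. Everything in it is an assumption for illustration: the class `ArrayLongTupleCursor` and the method names `nextDoc()` and `longValue()` are hypothetical and may not match the PR's actual API. The sentinel is inlined as `Integer.MAX_VALUE`, which is the value of `DocIdSetIterator.NO_MORE_DOCS`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical array-backed tuple cursor; illustrative only, not the PR's class. */
final class ArrayLongTupleCursor {
  // Same value as DocIdSetIterator.NO_MORE_DOCS, inlined to keep the sketch standalone.
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] docIds; // batch-local doc-ids, non-decreasing
  private final long[] values; // values[i] belongs to docIds[i]
  private int pos = -1;

  ArrayLongTupleCursor(int[] docIds, long[] values) {
    if (docIds.length != values.length) {
      throw new IllegalArgumentException("docIds and values must have the same length");
    }
    for (int i = 1; i < docIds.length; i++) {
      if (docIds[i] < docIds[i - 1]) {
        throw new IllegalArgumentException("doc-ids must be non-decreasing");
      }
    }
    this.docIds = docIds;
    this.values = values;
  }

  /** Advances to the next tuple and returns its doc-id, or NO_MORE_DOCS if exhausted. */
  int nextDoc() {
    if (pos < docIds.length) {
      pos++;
    }
    return pos < docIds.length ? docIds[pos] : NO_MORE_DOCS;
  }

  /** Value of the current tuple; only valid after nextDoc() returned a real doc-id. */
  long longValue() {
    return values[pos];
  }

  /** Drains the cursor into "docID:value" strings, in emission order. */
  static List<String> drain(ArrayLongTupleCursor cursor) {
    List<String> out = new ArrayList<>();
    for (int doc = cursor.nextDoc(); doc != NO_MORE_DOCS; doc = cursor.nextDoc()) {
      out.add(doc + ":" + cursor.longValue());
    }
    return out;
  }
}
```

For doc-ids `{0, 0, 2}` with values `{5, 7, 9}`, `drain` yields the tuples `0:5`, `0:7`, `2:9`: doc 0 is returned twice because it has two values, and doc 1 is skipped because it has none.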



##########
lucene/core/src/java/org/apache/lucene/index/PointValuesWriter.java:
##########
@@ -80,6 +92,84 @@ public void addPackedValue(int docID, BytesRef value) throws IOException {
     numPoints++;
   }
 
+  public void addDense1DIntValues(int firstDocID, LongValuesCursor cursor) throws IOException {

Review Comment:
   ```suggestion
    void addDense1DIntValues(int firstDocID, LongValuesCursor cursor) throws IOException {
   ```



##########
lucene/core/src/java/org/apache/lucene/index/PointValuesWriter.java:
##########
@@ -80,6 +92,84 @@ public void addPackedValue(int docID, BytesRef value) throws IOException {
     numPoints++;
   }
 
+  public void addDense1DIntValues(int firstDocID, LongValuesCursor cursor) throws IOException {
+    validate1DPacked(Integer.BYTES);
+    final int size = cursor.size();
+    if (size == 0) {
+      return;
+    }
+    final long ramBefore = reserveDense1D(firstDocID, size);
+    final byte[] buffer = pointsDenseBuffer();
+    int remaining = size;
+    while (remaining > 0) {
+      int chunk = Math.min(POINTS_BUFFER_INT_VALUES, remaining);
+      cursor.fillIntPoints(buffer, 0, chunk);
+      bytesOut.writeBytes(buffer, 0, chunk * Integer.BYTES);
+      remaining -= chunk;
+    }
+    commitDense1D(firstDocID, size, ramBefore);
+  }
+
+  public void addDense1DLongValues(int firstDocID, LongValuesCursor cursor) throws IOException {

Review Comment:
   ```suggestion
    void addDense1DLongValues(int firstDocID, LongValuesCursor cursor) throws IOException {
   ```



##########
lucene/core/src/java/org/apache/lucene/document/column/ObjectTupleCursor.java:
##########
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document.column;
+
+import org.apache.lucene.search.DocIdSetIterator;
+
+/**
+ * A tuple cursor over a {@link Column} whose values are objects. Yields {@code (docID, value)}
+ * pairs. Batch-local doc-ids are returned in non-decreasing order; the same doc-id may repeat for
+ * multi-valued fields (e.g. {@link org.apache.lucene.index.DocValuesType#SORTED_SET SORTED_SET}).
+ * Single-valued columns (e.g. {@link VectorColumn}) emit each doc-id at most once.
+ *
+ * @param <T> the value type
+ * @lucene.experimental
+ */
+public abstract class ObjectTupleCursor<T> {
+
+  /**
+   * Advances to the next doc-id that has a value and returns it, or {@link
+   * DocIdSetIterator#NO_MORE_DOCS} if exhausted. Doc-ids are batch-local (0 to {@code numDocs -
+   * 1}).

Review Comment:
   Same comment as for LongTupleCursor::nextCursor.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

