gortiz commented on code in PR #10184:
URL: https://github.com/apache/pinot/pull/10184#discussion_r1153377256


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/nullvalue/NullValueVectorCreator.java:
##########
@@ -19,23 +19,44 @@
 package org.apache.pinot.segment.local.segment.creator.impl.nullvalue;
 
 import com.google.common.annotations.VisibleForTesting;
-import java.io.Closeable;
 import java.io.DataOutputStream;
 import java.io.File;
 import java.io.FileOutputStream;
 import java.io.IOException;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
 import org.apache.pinot.segment.spi.V1Constants;
+import org.apache.pinot.segment.spi.index.IndexCreator;
 import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
 import org.roaringbitmap.buffer.MutableRoaringBitmap;
 
 
 /**
  * Used to persist the null bitmap on disk. This is used by SegmentCreator 
while indexing rows.
+ *
+ * Although this class implements {@link IndexCreator}, it is not intended to 
be used as a normal IndexCreator.
+ * Specifically, neither {@link #add(Object, int)} or {@link #add(Object[], 
int[])} should be called on this object.
+ * In order to make sure these methods are not being called, they throw 
exceptions in this class.
+ *
+ * This requirement is a corollary from the fact that the {@link IndexCreator} 
contract assumes the value will never be
+ * null, which is true for all index creators types unless this one.
  */
-public class NullValueVectorCreator implements Closeable {
+public class NullValueVectorCreator implements IndexCreator {
   private final MutableRoaringBitmap _nullBitmap = new MutableRoaringBitmap();
   private final File _nullValueVectorFile;
 
+  @Override
+  public void add(@Nonnull Object value, int dictId)

Review Comment:
   We have been talking about this online, but I'm going to write here why I 
think we shouldn't do it:
   
   All _normal_ index creators assume that the `add` methods will be called in 
docId order. What was a implicit convention is formalized in the Javadoc of 
IndexCreator in this PR. When I say _normal_ I mean all index creators but:
   - FST, which just ignores the `add` method.
   - null and dictionary, which actually break the contract.
   
   I don't like to have creators that break a contract, but I think in this 
case is worth the cost. By doing that we simplify the interface a lot and we 
don't expect to have more special cases like null or dictionary. As said in 
another comment, null and dictionary are not, semantically, indexes but an 
encoding or a datatype flavor (dictionary is actually specified as encoding and 
all SQL databases consider nullable a flavor of the datatype). We decided to 
implement them as indexes because it was the tool we had when we introduce 
them, but they are actually breaking the semantics.
   
   Can we change the contract of IndexCreator in order to receive the `docId`? 
Yes, we can, but IMHO it may be confused. I mean, the contract is that add will 
be called in `docId` order. If we add the the `docId` parameter, people that 
implement `IndexCreator` may otherwise.
   
   Apart from that, even if we add this, we will need to treat 
`NullValueVectorCreator` in a special way in `SegmentColumnarIndexCreator`, as 
we need to ask whether the cell value is null or not. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to