icefury71 commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r321862314
 
 

 ##########
 File path: pinot-core/src/main/java/org/apache/pinot/core/data/recordtransformer/NullValueTransformer.java
 ##########
 @@ -33,17 +36,35 @@ public NullValueTransformer(Schema schema) {
 
   @Override
   public GenericRow transform(GenericRow record) {
+    Set<String> nullColumnNamesSet = null;
+
+    // Clear out the 'null_fields' value in case the Generic row is reused
+    record.putField(NULL_FIELDS, null);
+
     for (FieldSpec fieldSpec : _fieldSpecs) {
       String fieldName = fieldSpec.getName();
       // Do not allow default value for time column
       if (record.getValue(fieldName) == null && fieldSpec.getFieldType() != FieldSpec.FieldType.TIME) {
+        if (nullColumnNamesSet == null) {
+          nullColumnNamesSet = new HashSet<>();
+        }
+
+        // Only handle null columns for non-virtual columns
+        if (!fieldSpec.isVirtualColumn()) {
+          nullColumnNamesSet.add(fieldName);
+        }
+
         if (fieldSpec.isSingleValueField()) {
           record.putField(fieldName, fieldSpec.getDefaultNullValue());
         } else {
           record.putField(fieldName, new Object[]{fieldSpec.getDefaultNullValue()});
         }
       }
     }
+
+    if (nullColumnNamesSet != null) {
+      record.putField(NULL_FIELDS, String.join(",", nullColumnNamesSet));
 
 Review comment:
   I think the concern is the size of the Java object, which is potentially
larger in this case.

   One optimization: if the column names have a fixed order (as defined by the
FieldSpecs), we can simply use a BitSet to keep track of the null columns. That
will typically take around 64 bytes, and it could save a lot of effort in other
parts of the code base as well.
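
   To make the BitSet suggestion concrete, here is a minimal, self-contained
sketch. It is not code from this PR: the class `NullBitSetSketch` and its
methods are hypothetical names, and it assumes the column order is fixed up
front (for example, taken once from the schema's FieldSpecs) so that a bit
index maps to exactly one column.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Pinot.
final class NullBitSetSketch {
  // Assumed fixed column order: bit i corresponds to columnNames.get(i).
  private final List<String> columnNames;

  NullBitSetSketch(List<String> columnNames) {
    this.columnNames = new ArrayList<>(columnNames);
  }

  /** Returns a BitSet in which bit i is set when values[i] is null. */
  BitSet markNulls(Object[] values) {
    if (values.length != columnNames.size()) {
      throw new IllegalArgumentException("values must match the column count");
    }
    BitSet nulls = new BitSet(columnNames.size());
    for (int i = 0; i < values.length; i++) {
      if (values[i] == null) {
        nulls.set(i);
      }
    }
    return nulls;
  }

  /** Maps the set bits back to column names, in column order. */
  List<String> nullColumnNames(BitSet nulls) {
    List<String> names = new ArrayList<>();
    for (int i = nulls.nextSetBit(0); i >= 0; i = nulls.nextSetBit(i + 1)) {
      names.add(columnNames.get(i));
    }
    return names;
  }
}
```

   With this shape, the transformer would store one BitSet per record rather
than a comma-joined String of column names, and downstream code could test a
column with `nulls.get(index)` without parsing anything.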

