icefury71 commented on a change in pull request #4585: Presence vector
URL: https://github.com/apache/incubator-pinot/pull/4585#discussion_r321862314
##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/recordtransformer/NullValueTransformer.java
##########
@@ -33,17 +36,35 @@ public NullValueTransformer(Schema schema) {
   @Override
   public GenericRow transform(GenericRow record) {
+    Set<String> nullColumnNamesSet = null;
+
+    // Clear out the 'null_fields' value in case the Generic row is reused
+    record.putField(NULL_FIELDS, null);
+
     for (FieldSpec fieldSpec : _fieldSpecs) {
       String fieldName = fieldSpec.getName();
       // Do not allow default value for time column
       if (record.getValue(fieldName) == null && fieldSpec.getFieldType() != FieldSpec.FieldType.TIME) {
+        if (nullColumnNamesSet == null) {
+          nullColumnNamesSet = new HashSet<>();
+        }
+
+        // Only handle null columns for non-virtual columns
+        if (!fieldSpec.isVirtualColumn()) {
+          nullColumnNamesSet.add(fieldName);
+        }
+
         if (fieldSpec.isSingleValueField()) {
           record.putField(fieldName, fieldSpec.getDefaultNullValue());
         } else {
           record.putField(fieldName, new Object[]{fieldSpec.getDefaultNullValue()});
         }
       }
     }
+
+    if (nullColumnNamesSet != null) {
+      record.putField(NULL_FIELDS, String.join(",", nullColumnNamesSet));

Review comment:
I think the concern is the size of the Java object, which is potentially larger in this case. One optimization: if the column names have a fixed order (in the FieldSpec), we can simply use a BitSet to keep track of null columns (typically around 64 bytes or so). This could save a lot of effort in other parts of the code base as well.
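To make the BitSet suggestion concrete, here is a minimal sketch of the idea, not code from this pull request. It assumes the column order is fixed and known up front. The class `NullBitmapSketch`, its methods, and the use of a plain `Map<String, Object>` in place of `GenericRow` are all hypothetical stand-ins chosen so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these names exist in Pinot.
class NullBitmapSketch {
  // Assumed fixed column order; bit i of the BitSet refers to columnNames.get(i).
  private final List<String> columnNames;

  NullBitmapSketch(List<String> columnNames) {
    this.columnNames = new ArrayList<>(columnNames);
  }

  /** Returns a BitSet in which bit i is set when column i has no value in the record. */
  BitSet findNullColumns(Map<String, Object> record) {
    BitSet nullColumns = new BitSet(columnNames.size());
    for (int i = 0; i < columnNames.size(); i++) {
      // Map.get returns null both for a missing key and for an explicit null value.
      if (record.get(columnNames.get(i)) == null) {
        nullColumns.set(i);
      }
    }
    return nullColumns;
  }

  /** Maps the set bits back to column names, for example for logging. */
  List<String> toColumnNames(BitSet nullColumns) {
    List<String> names = new ArrayList<>();
    for (int i = nullColumns.nextSetBit(0); i >= 0; i = nullColumns.nextSetBit(i + 1)) {
      names.add(columnNames.get(i));
    }
    return names;
  }

  public static void main(String[] args) {
    NullBitmapSketch sketch =
        new NullBitmapSketch(Arrays.asList("id", "name", "city", "score"));

    Map<String, Object> record = new HashMap<>();
    record.put("id", 1);
    record.put("city", null); // explicit null
    record.put("score", 9.5); // "name" is absent altogether

    BitSet nullColumns = sketch.findNullColumns(record);
    System.out.println(nullColumns);                       // prints {1, 2}
    System.out.println(sketch.toColumnNames(nullColumns)); // prints [name, city]
  }
}
```

Compared with joining names into a comma-separated string, this avoids building and later re-parsing the string, and checking one column becomes a single `BitSet.get(i)` call. The trade-off is that every reader of the bitmap has to agree on the same column ordering.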