Copilot commented on code in PR #16670:
URL: https://github.com/apache/pinot/pull/16670#discussion_r2293651578


##########
pinot-common/src/main/java/org/apache/pinot/common/datatable/DataTableImplV4.java:
##########
@@ -381,7 +381,9 @@ protected byte[] serializeStringDictionary()
     int numEntriesAdded = 0;
     for (String entry : _stringDictionary) {
       
Tracing.ThreadAccountantOps.sampleAndCheckInterruptionPeriodically(numEntriesAdded);
-      byte[] valueBytes = entry.getBytes(UTF_8);
+      // It is strange to find nulls in the dictionary, but it could happen 
when
+      // string arrays contain nulls
+      byte[] valueBytes = entry == null ? new byte[0] : entry.getBytes(UTF_8);
       dataOutputStream.writeInt(valueBytes.length);
       dataOutputStream.write(valueBytes);

Review Comment:
   Converting null strings to empty byte arrays during serialization will cause 
data loss. When deserialized, empty strings will be read back instead of null 
values. This breaks the round-trip integrity of null values in string arrays.
   ```suggestion
         if (entry == null) {
           dataOutputStream.writeBoolean(false); // null marker
         } else {
           dataOutputStream.writeBoolean(true); // non-null marker
           byte[] valueBytes = entry.getBytes(UTF_8);
           dataOutputStream.writeInt(valueBytes.length);
           dataOutputStream.write(valueBytes);
         }
   ```



##########
pinot-common/src/main/java/org/apache/pinot/common/datatable/DataTableImplV4.java:
##########
@@ -381,7 +381,9 @@ protected byte[] serializeStringDictionary()
     int numEntriesAdded = 0;
     for (String entry : _stringDictionary) {
       
Tracing.ThreadAccountantOps.sampleAndCheckInterruptionPeriodically(numEntriesAdded);
-      byte[] valueBytes = entry.getBytes(UTF_8);
+      // It is strange to find nulls in the dictionary, but it could happen 
when
+      // string arrays contain nulls

Review Comment:
   [nitpick] The comment suggests uncertainty about whether nulls in string 
arrays are a valid use case. Consider clarifying whether this is expected 
behavior or documenting the design decision to convert nulls to empty strings.
   ```suggestion
         // Nulls in the dictionary can occur when string arrays contain nulls.
         // By design, nulls are serialized as empty byte arrays (i.e., 
converted to empty strings).
         // This ensures consistent handling during serialization and 
deserialization.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to