[GitHub] [druid] paul-rogers commented on a diff in pull request #13794: Improve the wording around the `InvalidNullByteException`

via GitHub Mon, 13 Feb 2023 13:41:34 -0800


paul-rogers commented on code in PR #13794:
URL: https://github.com/apache/druid/pull/13794#discussion_r1105044774



##########
processing/src/main/java/org/apache/druid/frame/write/FrameWriterUtils.java:
##########
@@ -229,7 +229,11 @@ public static void copyByteBufferToMemory(
       final byte b = src.get(p);
 
       if (!allowNullBytes && b == 0) {
-        throw new InvalidNullByteException();
+        throw new InvalidNullByteException(
+            "Unable to add the frame because it contains null bytes. This usually happens when the added string columns "

Review Comment:
   Thanks for the improved message. The user, however, knows nothing about 
frames. Can we word this from the user's perspective?
   
   `Druid does not support null (0x00) bytes in strings. File %s, row %d, column %s contains null bytes: [%s].`
   
   The string would be encoded so that control characters appear as `\u0000`, 
letting the user see the position of the null bytes.
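
A minimal sketch of that encoding, to make the suggestion concrete. The class and method names (`NullByteMessages`, `escapeControlChars`, `describe`) are hypothetical and do not exist in Druid; the row number is left out on the assumption that it may not be available in a user-meaningful form.

```java
public final class NullByteMessages {
    private NullByteMessages() {}

    /**
     * Hypothetical helper: replaces every ISO control character (including
     * the null character) with a visible backslash-u escape of four hex digits.
     */
    public static String escapeControlChars(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (Character.isISOControl(c)) {
                // Emit a literal backslash, the letter u, then the code unit in hex.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Hypothetical helper: builds a user-facing message that names the source
     * and column and shows the offending value with control characters escaped.
     */
    public static String describe(String source, String column, String value) {
        return String.format(
            "Druid does not support null (0x00) bytes in strings. "
                + "File %s, column %s contains null bytes: [%s].",
            source, column, escapeControlChars(value));
    }
}
```

With this sketch, a value consisting of `NY`, a null character, and `C` would be shown in the message as `[NY\u0000C]`, so the position of the null byte is visible.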
   
   Maybe we don't know the row number (in a form useful to the user). If not, 
just list the column.
   
   This is a case of unparsable data. Should we have caught it at the time we 
_read_ the data rather than when _writing_ to a frame? Should we invoke our 
bad-row logic to skip this row? That logic should log the bad row for later 
re-ingestion, but I don't think we've added that ability.
   
   If we catch the problem on read, then the check here is more of an 
assertion. The data may have been created by an expression, though, so it is 
still worth validating here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


