jinyius commented on code in PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990914134


##########
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java:
##########
@@ -559,7 +564,14 @@ final void writeRawValue(Object value) {
   class BinaryWriter extends FieldWriter {
     @Override
     final void writeRawValue(Object value) {
-      ByteString byteString = (ByteString) value;
+      // Non-ByteString values can occur when recursion gets truncated.
+      ByteString byteString = value instanceof ByteString
+          ? (ByteString) value
+          // TODO: figure out a way to use MessageOrBuilder
+          : value instanceof Message
+          ? ((Message) value).toByteString()
+          // Worst case, just dump it as a plain Java string.
+          : ByteString.copyFromUtf8(value.toString());

Review Comment:
   This is intended. For a real-time production pipeline I'm working on, 
losing data as it passes through, or killing the job because of an uncaught 
exception, is problematic because it could lead to data loss and downtime. With 
this fallback, the problematic data still gets written, so there's a way to see 
what it was and fix the root cause properly ASAP.
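For illustration, here is a minimal, dependency-free sketch of the same "degrade instead of throw" fallback chain. It is not parquet-mr code: `FallbackBytes`, `BytesSerializable`, and `toBytes` are hypothetical names, `byte[]` stands in for protobuf's `ByteString`, and `BytesSerializable` stands in for `Message#toByteString()`. One deliberate deviation from the diff: the sketch uses `String.valueOf` so a `null` value becomes `"null"` rather than throwing a `NullPointerException`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the BinaryWriter fallback chain, using only the JDK. */
public class FallbackBytes {

  /** Hypothetical stand-in for a protobuf Message that can serialize itself. */
  interface BytesSerializable {
    byte[] toBytes();
  }

  /**
   * Converts a raw field value to bytes without failing on unexpected types:
   * 1. raw bytes pass through unchanged (the expected case),
   * 2. self-serializing messages use their own encoding,
   * 3. anything else is written as the UTF-8 bytes of its string form,
   *    so the offending value is preserved in the output for later inspection.
   */
  static byte[] toBytes(Object value) {
    if (value instanceof byte[]) {
      return (byte[]) value;
    }
    if (value instanceof BytesSerializable) {
      return ((BytesSerializable) value).toBytes();
    }
    // Worst case: dump the value as a plain Java string instead of throwing.
    return String.valueOf(value).getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] raw = {1, 2, 3};
    BytesSerializable msg = () -> new byte[] {42};
    Object unexpected = List.of("a", "b");

    System.out.println(Arrays.toString(toBytes(raw))); // [1, 2, 3]
    System.out.println(Arrays.toString(toBytes(msg))); // [42]
    System.out.println(new String(toBytes(unexpected), StandardCharsets.UTF_8)); // [a, b]
  }
}
```

The trade-off is type fidelity for availability: the last branch never throws, so the job keeps running, and the string form of the unexpected value lands in the binary column where it can be found and the upstream bug fixed.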



-- 
This is an automated message from the Apache Git Service.