schenksj commented on PR #4524:
URL:
https://github.com/apache/datafusion-comet/pull/4524#issuecomment-4585200647
Done. Added `columnar shuffle tolerates non-UTF-8 bytes in a StringType
column` to `CometColumnarShuffleSuite`. `cast(BinaryType -> StringType)` is a
zero-copy reinterpretation with no UTF-8 validation, so the test uses it to put
genuinely non-UTF-8 bytes into a `StringType` column, disables Comet's `Cast` so
the raw bytes reach the JVM-row-to-Arrow shuffle path
(`process_sorted_row_partition` -> `get_string`), and checks the result against
Spark.
I confirmed it's a real regression test: it passes with the fix, and when I
temporarily restored the strict `from_utf8(..).unwrap()` decode it fails with
`CometNativeException: ... Utf8Error { valid_up_to: 0, error_len: Some(1) }`.
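
For readers who want to see the failure mode in isolation, here is a minimal
standalone sketch in plain Rust. It is not the Comet code: `decode_strict` and
`decode_lossy` are hypothetical helper names, and the lossy decode is only one
possible way to tolerate bad bytes (it rewrites them to U+FFFD, so it is not
necessarily what the fix in this PR does).

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Strict decode: returns an error for any byte sequence that is not valid
/// UTF-8. Calling `.unwrap()` on this result panics on such input, which is
/// the behavior the regression test guards against.
/// (Hypothetical helper name, not a Comet function.)
pub fn decode_strict(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

/// Tolerant decode, for illustration only: each invalid sequence is replaced
/// with U+FFFD instead of failing. This alters the data, so it is an
/// assumption for the sketch rather than a description of the actual fix.
pub fn decode_lossy(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}
```

With the input `[0xFF, 0x61]`, `decode_strict` returns an error whose
`valid_up_to()` is 0 and whose `error_len()` is `Some(1)`, the same shape as the
error quoted above, while `decode_lossy` returns a string instead of failing.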
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]