This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 001263f58a5275e188bd57be68f76cb271cd7992
Author: Joe McDonnell <[email protected]>
AuthorDate: Sun Oct 26 13:39:28 2025 -0700

    IMPALA-14514: Handle serializing bytes in bin/run-workload.py
    
    On python 3, when Impyla receives a result with a string that is
    not valid UTF-8, it returns that as bytes. TPC-DS Q30 on scale 20
    has a result that contains invalid UTF-8, so bin/run-workload.py
    can fail while trying to dump this to JSON.
    
    This modifies CustomJSONEncoder to handle serializing bytes by
    converting it to a string with invalid unicode handled with
    backslashes.
    
    Testing:
     - Ran bin/run-workload.py against TPC-DS scale 20
    
    Change-Id: Ibe31c656de4fc65f8580c7b3b49bf655b8a5ecea
    Reviewed-on: http://gerrit.cloudera.org:8080/23602
    Reviewed-by: Riza Suminto <[email protected]>
    Reviewed-by: Jason Fehr <[email protected]>
    Tested-by: Joe McDonnell <[email protected]>
---
 bin/run-workload.py | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/bin/run-workload.py b/bin/run-workload.py
index 78d118ee2..b99b2d41a 100755
--- a/bin/run-workload.py
+++ b/bin/run-workload.py
@@ -145,6 +145,11 @@ class CustomJSONEncoder(json.JSONEncoder):
     if isinstance(obj, datetime):
       # Convert datetime into an standard iso string
       return obj.isoformat()
+    if isinstance(obj, bytes):
+      # Impyla can leave a string value as bytes when it is unable to decode it to UTF-8.
+      # TPC-DS has queries that produce non-UTF-8 results (e.g. Q30 on scale 20)
+      # Convert bytes to strings to make JSON encoding work
+      return obj.decode(encoding="utf-8", errors="backslashreplace")
     elif isinstance(obj, (Query, HiveQueryResult, QueryExecConfig, TableFormatInfo)):
       # Serialize these objects manually by returning their __dict__ methods.
       return obj.__dict__
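
For readers who want to see the effect of the new branch in isolation, here is a minimal sketch. It assumes a stand-in encoder named BytesSafeEncoder (hypothetical, not the real CustomJSONEncoder, which also handles Query and related objects) and a made-up result row; only the datetime and bytes branches from the diff are reproduced.

```python
import json
from datetime import datetime


class BytesSafeEncoder(json.JSONEncoder):
  """Hypothetical stand-in for CustomJSONEncoder with only two branches."""

  def default(self, obj):
    if isinstance(obj, datetime):
      return obj.isoformat()
    if isinstance(obj, bytes):
      # Bytes that are not valid UTF-8 become literal \xNN escapes
      # instead of raising UnicodeDecodeError.
      return obj.decode(encoding="utf-8", errors="backslashreplace")
    # Anything else falls through to the base class, which raises TypeError.
    return super().default(obj)


# Made-up row: one value with an invalid UTF-8 byte, one plain ASCII value.
row = {"c_name": b"caf\xe9", "ok": b"plain"}
encoded = json.dumps(row, cls=BytesSafeEncoder)
decoded = json.loads(encoded)
```

The conversion is lossy in one direction: after a round trip, a value that originally contained the raw byte 0xE9 cannot be told apart from one that contained the four characters \xe9 as text. That seems acceptable for dumping workload results, but it is worth keeping in mind if the JSON is later compared against expected output.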
