Joe McDonnell created IMPALA-14514:
--------------------------------------

             Summary: bin/run-workload.py needs to handle serializing bytes / invalid UTF-8 to JSON
                 Key: IMPALA-14514
                 URL: https://issues.apache.org/jira/browse/IMPALA-14514
             Project: IMPALA
          Issue Type: Bug
          Components: Infrastructure
    Affects Versions: Impala 5.0.0
            Reporter: Joe McDonnell


On Python 3, when Impyla receives a result containing a string that is not
valid UTF-8, it returns that value as bytes. TPC-DS Q30 at scale factor 20 has
a result that contains invalid UTF-8, so bin/run-workload.py can fail while
trying to dump the results to JSON:
{noformat}
18:49:20 Traceback (most recent call last):
18:49:20   File "/home/ubuntu/Impala/bin/run-workload.py", line 289, in <module>
18:49:20     json.dump(result_map, f, cls=CustomJSONEncoder, ensure_ascii=False)
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/__init__.py", line 179, in dump
18:49:20     for chunk in iterable:
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 431, in _iterencode
18:49:20     yield from _iterencode_dict(o, _current_indent_level)
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
18:49:20     yield from chunks
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
18:49:20     yield from chunks
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 439, in _iterencode
18:49:20     yield from _iterencode(o, _current_indent_level)
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 431, in _iterencode
18:49:20     yield from _iterencode_dict(o, _current_indent_level)
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
18:49:20     yield from chunks
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
18:49:20     yield from chunks
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
18:49:20     yield from chunks
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 438, in _iterencode
18:49:20     o = _default(o)
18:49:20   File "/home/ubuntu/Impala/bin/run-workload.py", line 152, in default
18:49:20     super(CustomJSONEncoder, self).default(obj)
18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 179, in default
18:49:20     raise TypeError(f'Object of type {o.__class__.__name__} '
{noformat}
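The failure can be reproduced without Impala or Impyla. This is a minimal sketch assuming a hypothetical `result_map` in which one query result value arrived as bytes; the values are made up and are not the actual Q30 output:

```python
import json

# Hypothetical stand-in for a result value that Impyla returned as bytes
# because it is not valid UTF-8 (0xff never appears in well-formed UTF-8).
raw_value = b"bad\xff"

try:
    raw_value.decode("utf-8")
except UnicodeDecodeError as e:
    print("not valid UTF-8:", e)

result_map = {"q30": [{"result": [raw_value]}]}

try:
    json.dumps(result_map, ensure_ascii=False)
except TypeError as e:
    # The base JSONEncoder.default() rejects bytes, as in the traceback.
    print("json.dumps failed:", e)
```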
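We should change CustomJSONEncoder to handle bytes values itself instead of passing them to the base JSONEncoder.default(), which raises the TypeError above.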
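One possible shape for the change, sketched under assumptions: the real CustomJSONEncoder in bin/run-workload.py may handle other types not shown here, and decoding with `errors="backslashreplace"` is one option among several, not a decided approach. Valid UTF-8 decodes normally, and each invalid byte becomes a visible `\xNN` escape, so the JSON stays valid and the bad bytes remain identifiable:

```python
import json


class CustomJSONEncoder(json.JSONEncoder):
    """Hypothetical sketch of a JSON encoder that accepts bytes values.

    Assumption: bytes values are query results that Impyla could not decode
    as UTF-8, so they are decoded here with errors="backslashreplace".
    Valid UTF-8 decodes normally; each invalid byte becomes a literal
    "\\xNN" escape in the resulting string.
    """

    def default(self, obj):
        if isinstance(obj, (bytes, bytearray)):
            return obj.decode("utf-8", errors="backslashreplace")
        # Anything else is left to the base class, which raises TypeError
        # for types it cannot serialize.
        return super(CustomJSONEncoder, self).default(obj)


if __name__ == "__main__":
    # Hypothetical result map: one valid UTF-8 value and one invalid one.
    result_map = {"q30": [{"result": [b"caf\xc3\xa9", b"bad\xff"]}]}
    print(json.dumps(result_map, cls=CustomJSONEncoder, ensure_ascii=False))
```

Compared with `errors="replace"`, which maps each invalid byte to U+FFFD and loses the original value, `backslashreplace` keeps the bytes recoverable for debugging. `errors="surrogateescape"` would preserve them exactly, but the lone surrogates it produces generally cannot be written to a UTF-8 output file when `ensure_ascii=False` is used.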



