[
https://issues.apache.org/jira/browse/IMPALA-14514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joe McDonnell resolved IMPALA-14514.
------------------------------------
Fix Version/s: Impala 5.0.0
Assignee: Joe McDonnell
Resolution: Fixed
> bin/run-workload.py needs to handle serializing bytes / invalid UTF-8 to JSON
> -----------------------------------------------------------------------------
>
> Key: IMPALA-14514
> URL: https://issues.apache.org/jira/browse/IMPALA-14514
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Affects Versions: Impala 5.0.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Major
> Fix For: Impala 5.0.0
>
>
> On Python 3, when Impyla receives a result containing a string that is not
> valid UTF-8, it returns that string as bytes. TPC-DS Q30 at scale 20 has a
> result that contains invalid UTF-8, so bin/run-workload.py can fail while
> trying to dump this to JSON:
> {noformat}
> 18:49:20 Traceback (most recent call last):
> 18:49:20   File "/home/ubuntu/Impala/bin/run-workload.py", line 289, in <module>
> 18:49:20     json.dump(result_map, f, cls=CustomJSONEncoder, ensure_ascii=False)
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/__init__.py", line 179, in dump
> 18:49:20     for chunk in iterable:
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 431, in _iterencode
> 18:49:20     yield from _iterencode_dict(o, _current_indent_level)
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
> 18:49:20     yield from chunks
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
> 18:49:20     yield from chunks
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 439, in _iterencode
> 18:49:20     yield from _iterencode(o, _current_indent_level)
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 431, in _iterencode
> 18:49:20     yield from _iterencode_dict(o, _current_indent_level)
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
> 18:49:20     yield from chunks
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
> 18:49:20     yield from chunks
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
> 18:49:20     yield from chunks
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 438, in _iterencode
> 18:49:20     o = _default(o)
> 18:49:20   File "/home/ubuntu/Impala/bin/run-workload.py", line 152, in default
> 18:49:20     super(CustomJSONEncoder, self).default(obj)
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 179, in default
> 18:49:20     raise TypeError(f'Object of type {o.__class__.__name__} '
> {noformat}
> We should change CustomJSONEncoder to handle bytes.
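
One possible shape for that change, as a minimal sketch rather than the actual patch: the class name BytesSafeJSONEncoder, the sample result_map, and the choice to substitute U+FFFD for undecodable bytes are all assumptions made for illustration. The real CustomJSONEncoder in bin/run-workload.py may handle other types and may have picked a different policy (for example, escaping the raw bytes).

```python
import json


class BytesSafeJSONEncoder(json.JSONEncoder):
    """Hypothetical stand-in for CustomJSONEncoder; not the code from the fix."""

    def default(self, obj):
        # json calls default() only for objects it cannot serialize natively,
        # which on Python 3 includes bytes and bytearray.
        if isinstance(obj, (bytes, bytearray)):
            # Assumption: a lossy decode is acceptable here. Invalid UTF-8
            # sequences are replaced with U+FFFD instead of raising.
            return bytes(obj).decode("utf-8", errors="replace")
        # Return the base-class result so its TypeError still propagates
        # for types that remain unsupported.
        return super().default(obj)


# Hypothetical result shaped like query rows that contain a bytes value.
result_map = {"rows": [["ok", b"bad \xff byte"]]}
encoded = json.dumps(result_map, cls=BytesSafeJSONEncoder, ensure_ascii=False)
```

A lossy decode keeps the output valid JSON text, at the cost of not being able to recover the original bytes from the dumped results.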