[ https://issues.apache.org/jira/browse/IMPALA-14514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe McDonnell resolved IMPALA-14514.
------------------------------------
    Fix Version/s: Impala 5.0.0
         Assignee: Joe McDonnell
       Resolution: Fixed

> bin-workload.py needs to handle serializing bytes / invalid UTF-8 to JSON
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-14514
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14514
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 5.0.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>             Fix For: Impala 5.0.0
>
>
> On Python 3, when Impyla receives a result with a string that is not valid
> UTF-8, it returns that value as bytes. TPC-DS Q30 at scale 20 has a result that
> contains invalid UTF-8, so bin/run-workload.py can fail while trying to dump
> this to JSON:
> {noformat}
> 18:49:20 Traceback (most recent call last):
> 18:49:20   File "/home/ubuntu/Impala/bin/run-workload.py", line 289, in <module>
> 18:49:20     json.dump(result_map, f, cls=CustomJSONEncoder, ensure_ascii=False)
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/__init__.py", line 179, in dump
> 18:49:20     for chunk in iterable:
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 431, in _iterencode
> 18:49:20     yield from _iterencode_dict(o, _current_indent_level)
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
> 18:49:20     yield from chunks
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
> 18:49:20     yield from chunks
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 439, in _iterencode
> 18:49:20     yield from _iterencode(o, _current_indent_level)
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 431, in _iterencode
> 18:49:20     yield from _iterencode_dict(o, _current_indent_level)
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 405, in _iterencode_dict
> 18:49:20     yield from chunks
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
> 18:49:20     yield from chunks
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
> 18:49:20     yield from chunks
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 438, in _iterencode
> 18:49:20     o = _default(o)
> 18:49:20   File "/home/ubuntu/Impala/bin/run-workload.py", line 152, in default
> 18:49:20     super(CustomJSONEncoder, self).default(obj)
> 18:49:20   File "/home/ubuntu/Impala/toolchain/toolchain-packages-gcc10.4.0/python-3.8.18/lib/python3.8/json/encoder.py", line 179, in default
> 18:49:20     raise TypeError(f'Object of type {o.__class__.__name__} '
> {noformat}
> We should change CustomJSONEncoder to handle bytes.
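
A minimal sketch of the kind of change proposed above, assuming Python 3 and assuming that undecodable bytes should be kept in the output as readable escapes rather than aborting the dump. This is not the committed fix: the real CustomJSONEncoder in bin/run-workload.py may handle other types too, and the use of the `backslashreplace` error handler here is an illustrative choice.

```python
import json


class CustomJSONEncoder(json.JSONEncoder):
    """Hypothetical sketch: a JSON encoder that tolerates bytes values.

    Assumption: bytes that are not valid UTF-8 (e.g. a string column that
    Impyla returned as bytes) should be serialized in a readable form.
    """

    def default(self, obj):
        if isinstance(obj, (bytes, bytearray)):
            # Decode as UTF-8; any invalid byte becomes a literal \xNN escape
            # instead of raising UnicodeDecodeError.
            return bytes(obj).decode("utf-8", errors="backslashreplace")
        # Defer to the base class, which raises TypeError for unsupported types.
        return super(CustomJSONEncoder, self).default(obj)


if __name__ == "__main__":
    # b"\xe9" on its own is not valid UTF-8.
    row = {"results": [["ok", b"caf\xe9"]]}
    print(json.dumps(row, cls=CustomJSONEncoder, ensure_ascii=False))
    # prints {"results": [["ok", "caf\\xe9"]]}
```

Valid UTF-8 bytes decode to the expected text (b"caf\xc3\xa9" becomes "café"). Using `errors="replace"` instead would substitute U+FFFD and lose the original byte values; base64 would be lossless but hard to read when comparing results.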



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
