Laszlo Gaal created IMPALA-14100:
------------------------------------
Summary: critique-gerrit-review.py crashes with a codec exception
when reviewing a diff containing data with non-UTF-8 encoding
Key: IMPALA-14100
URL: https://issues.apache.org/jira/browse/IMPALA-14100
Project: IMPALA
Issue Type: Bug
Components: Infrastructure
Reporter: Laszlo Gaal
The precommit checker script {{bin/jenkins/critique-gerrit-review.py}} can
crash with the following Python traceback when the change diff contains data
that is not valid UTF-8. This can happen when prebuilt data files are supplied
with a patch, as happened with https://gerrit.cloudera.org/c/22049/ for
example.
{code}
10:34:47.030720 git.c:439 trace: built-in: git diff -U0 HEAD^..HEAD
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/gerrit-auto-critic-test/Impala/bin/jenkins/critique-gerrit-review.py", line 491, in <module>
    merge_comments(comments, get_misc_comments(base_revision, revision, args.dryrun))
  File "/var/lib/jenkins/workspace/gerrit-auto-critic-test/Impala/bin/jenkins/critique-gerrit-review.py", line 209, in get_misc_comments
    diff = check_output(["git", "diff", "-U0", "{0}..{1}".format(base_revision, revision)],
  File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 495, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib/python3.8/subprocess.py", line 1015, in communicate
    stdout = self.stdout.read()
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 34006: invalid start byte
{code}
Excluding the problematic file(s) in
https://github.com/apache/impala/blob/f4e75510948bdb72f2d5206161fee12e5b6d0888/bin/jenkins/critique-gerrit-review.py#L68-L77
does not help, as the crash happens when processing the output of
{{git diff}}, which returns a single output stream containing all the changes
in all the files.
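
One possible mitigation, offered only as a sketch and not as the actual fix: the traceback passes through {{codecs.py}}, which suggests that {{check_output}} is decoding the diff in text mode. Capturing the raw bytes and decoding them leniently would avoid the exception. The helper names {{decode_lossy}} and {{git_diff_text}} below are hypothetical and do not exist in the script.

```python
import subprocess


def decode_lossy(raw):
    """Decode bytes as UTF-8, replacing undecodable bytes with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def git_diff_text(base_revision, revision):
    """Hypothetical helper: return the diff as text without raising on bad bytes."""
    # No text-mode arguments here, so check_output returns raw bytes.
    raw = subprocess.check_output(
        ["git", "diff", "-U0", "{0}..{1}".format(base_revision, revision)])
    return decode_lossy(raw)
```

With this approach a stray byte such as 0xc0 becomes a replacement character instead of aborting the whole run. The trade-off is that comments on lines containing such bytes would show the replacement character rather than the original data.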