| Field | Value |
|---|---|
| Issue | 175017 |
| Summary | [llvm-lit] Internal shell breaks tests with Unicode filenames on non-UTF-8 locales |
| Labels | new issue |
| Assignees | |
| Reporter | dominik-steenken |

The following tests:
- `LLVM::mri-nonascii.test`
- `LLVM::delimiters.test`
- `LLVM::lit-unicode.txt`
- `LLVM::response-utf-8.test`
fail on non-UTF-8 locales. To reproduce (e.g. on `mri-nonascii.test`):
```
$ LANG=en_US.latin1 llvm-lit -vv llvm/test/tools/llvm-ar/mri-nonascii.test
```
Observed behavior:
```
UNRESOLVED: LLVM :: tools/llvm-ar/mri-nonascii.test (1 of 1)
******************** TEST 'LLVM :: tools/llvm-ar/mri-nonascii.test' FAILED ********************
Exception during script execution:
Traceback (most recent call last):
File "/home/dost/workspace/llvm-project/llvm/utils/lit/lit/worker.py", line 77, in _execute_test_handle_errors
result = test.config.test_format.execute(test, lit_config)
File "/home/dost/workspace/llvm-project/llvm/utils/lit/lit/formats/shtest.py", line 29, in execute
return lit.TestRunner.executeShTest(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
test,
^^^^^
...<3 lines>...
self.preamble_commands,
^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/home/dost/workspace/llvm-project/llvm/utils/lit/lit/TestRunner.py", line 2482, in executeShTest
return _runShTest(test, litConfig, useExternalSh, script, tmpBase)
File "/home/dost/workspace/llvm-project/llvm/utils/lit/lit/TestRunner.py", line 2404, in _runShTest
res = runOnce(execdir)
[...]
File "/home/dost/workspace/llvm-project/llvm/utils/lit/lit/TestRunner.py", line 778, in _executeShCmd
res = _executeShCmd(cmd.rhs, shenv, results, timeoutHelper)
File "/home/dost/workspace/llvm-project/llvm/utils/lit/lit/TestRunner.py", line 778, in _executeShCmd
res = _executeShCmd(cmd.rhs, shenv, results, timeoutHelper)
File "/home/dost/workspace/llvm-project/llvm/utils/lit/lit/TestRunner.py", line 888, in _executeShCmd
result = inproc_builtin(Command(args, j.redirects), cmd_shenv)
File "/home/dost/workspace/llvm-project/llvm/utils/lit/lit/TestRunner.py", line 384, in executeBuiltinEcho
stdin, stdout, stderr = processRedirects(cmd, subprocess.PIPE, shenv, opened_files)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dost/workspace/llvm-project/llvm/utils/lit/lit/TestRunner.py", line 713, in processRedirects
fd = open(redir_filename, mode, encoding="utf-8")
UnicodeEncodeError: 'latin-1' codec can't encode character '\u20ac' in position 88: ordinal not in range(256)
```
I had a CI runner with such a locale and saw these tests start failing late last year. Bisecting showed that the commit that introduced the behavior was
```
2581354f6cdf [llvm] Use lit internal shell by default
```
These tests all include Unicode characters in the actual test file, like £, €, or ⦙.
What follows is my attempt at tracing the root cause.
Previously, with the external shell being the default, these files were read by Python (with an assumed `utf-8` encoding, I think), and then for each individual `RUN` line, a small shell script was written, again with `utf-8` encoding. That script was then handed to `/bin/sh` through a `subprocess.Popen` call, and the call itself contained only ASCII characters.
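To illustrate why the external-shell path was not affected, here is a minimal sketch of that flow as I understand it. It is not lit's actual code: `run_line_via_external_shell` and the script name `run.script` are hypothetical, and it assumes a POSIX system with `/bin/sh` and an ASCII-only working directory path.

```python
import os
import subprocess
import tempfile


def run_line_via_external_shell(run_line: str, workdir: str) -> int:
    """Hypothetical stand-in for the external-shell path, not lit's code."""
    # The script *name* is plain ASCII; only its *contents* are non-ASCII.
    script_path = os.path.join(workdir, "run.script")
    with open(script_path, "w", encoding="utf-8") as script:
        script.write(run_line + "\n")
    # Python only passes ASCII strings to Popen. The shell reads the script
    # as bytes and creates the redirect target itself, so Python never has
    # to encode the non-ASCII file name with the locale's encoding.
    proc = subprocess.Popen(["/bin/sh", script_path], cwd=workdir)
    return proc.wait()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        status = run_line_via_external_shell("echo hello > \u20ac.txt", tmp)
        print(status, os.listdir(os.fsencode(tmp)))
```

Listing the directory as bytes in the usage part avoids decoding the resulting file name with the locale's encoding.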
With the new default, the internal shell, the `RUN` lines are split into their individual arguments, which are then handled directly by `lit`. The handling that trips up the above test case involves file redirects. The code attempts to open the file that is being redirected to, e.g. `€.txt`, and fails because the file name contains characters that are not valid in what Python presumes to be the file name encoding (which it gets from the locale that is set, I think).
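The encoding step that fails can be reproduced in isolation, without changing the locale. The sketch below is illustrative only: `can_encode_filename` is a hypothetical helper, and it assumes that on POSIX systems Python encodes `str` paths with the filesystem encoding and the `surrogateescape` error handler before calling into the OS.

```python
import sys


def can_encode_filename(name: str, fs_encoding: str) -> bool:
    """Hypothetical check: can `name` be turned into bytes for the OS?"""
    try:
        # surrogateescape only rescues lone surrogates (U+DC80..U+DCFF);
        # a real character missing from the codec still raises.
        name.encode(fs_encoding, errors="surrogateescape")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # The encoding this interpreter actually uses for file names.
    print(sys.getfilesystemencoding())
    print(can_encode_filename("€.txt", "latin-1"))    # False: U+20AC is not in Latin-1
    print(can_encode_filename("€.txt", "utf-8"))      # True
    print(can_encode_filename("out.txt", "latin-1"))  # True: plain ASCII
```

Under `LANG=en_US.latin1` the first line would be expected to report a Latin-1 encoding, which matches the `'latin-1' codec can't encode character '\u20ac'` message in the traceback.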
One possible fix would be to handle redirect filenames as bytes rather than strings, bypassing the locale-dependent encoding. However, I'm not familiar enough with the internal shell's design goals to know if that's the right approach, or if non-UTF-8 locales are even a supported configuration.
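A minimal sketch of the bytes-based idea, under stated assumptions: `write_via_bytes_path` is a hypothetical helper, not a patch against `processRedirects`, and it assumes the redirect target should always be encoded as UTF-8 regardless of the locale, and that the directory path is itself UTF-8 encodable.

```python
import os
import tempfile


def write_via_bytes_path(directory: str, filename: str, text: str) -> bytes:
    """Hypothetical helper: open a redirect target through a bytes path."""
    # Encode the path ourselves as UTF-8 (an assumption of this sketch).
    # A bytes path is passed to the OS unchanged, so Python does not apply
    # the locale-derived filesystem encoding to it.
    raw_path = os.path.join(directory, filename).encode("utf-8")
    with open(raw_path, "w", encoding="utf-8") as fd:
        fd.write(text)
    return raw_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_via_bytes_path(tmp, "€.txt", "hello\n"))
        # List the directory as bytes so the name is not decoded either.
        print(os.listdir(os.fsencode(tmp)))
```

One consequence of this design is that every later use of the same name (existence checks, deletion, passing it to a subprocess) would also have to use the bytes form, otherwise the locale encoding comes back into play.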