Non-ASCII file names in git

Gavin Smith Sun, 17 Dec 2023 11:21:10 -0800

On Sat, Oct 14, 2023 at 09:41:46AM +0300, Eli Zaretskii wrote:
> > Eli, are you able to test this from git or do you need me to make another
> > pretest release?
> 
> Git is a bit problematic, as some of the file names include non-ASCII
> characters.  For this reason, and also for others (e.g., I have
> already made too many changes to 7.0.93 sources), I'd prefer another
> pretest.


The test suite includes files with non-ASCII characters in their names
in order to exercise encoding handling and to better support the use of
different character encodings.

However, this has the ironic effect of making the Texinfo source code
*less* accessible, especially on MS-Windows, or any system where non-ASCII
bytes in file names downloaded from git cause an issue.  It seems that
these systems are the ones where character encoding problems would most
likely occur, so we would want users to be testing and reporting any
such problems.

Running

$ LC_ALL=C find . -name '*[^ -~]*'

to find files with non-ASCII names shows that all the results are under
tp/tests.  Hence, it would be straightforward to eliminate such file names,
at the cost of also eliminating these tests.
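
Note that find also reports untracked files such as build products.  To
look only at what git tracks, one could filter the output of
"git ls-files -z" instead.  The following is a rough sketch, not something
in the Texinfo tree: the function name filter_non_ascii is made up for
illustration, and it assumes that no path contains a newline.

```shell
# Hypothetical helper (not part of Texinfo).
# Reads NUL-separated paths on stdin and prints, one per line, those that
# contain a byte outside the printable ASCII range (space through tilde).
# This is the same test as the find pattern '*[^ -~]*'.
# Assumption: no path contains a newline character.
# Returns non-zero (grep's status) when no such path is found.
filter_non_ascii () {
  tr '\0' '\n' | LC_ALL=C grep '[^ -~]'
}
```

For example, from the top of a checkout:

$ git ls-files -z | filter_non_ascii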

Since these tests contain these characters in reference output file names,
there would seem to be no simple way to adjust them to work with fully-ASCII
file names.  (The only possibility I can see is to store them in git in
some kind of encoded form and to decode them when the tests run, for
example as a tar archive file with an ASCII file name.  This would be some
work to implement, of course.)
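
As a rough illustration of the tar idea, a reference directory could be
packed once by a maintainer and unpacked by the test harness before results
are compared.  This is only a sketch: the function names and the archive
name used below are hypothetical, and it does not address whether the file
system the tests run on can represent the extracted names.

```shell
# Hypothetical helpers (not part of Texinfo).

# Pack a reference directory into an archive with an ASCII file name.
#   $1: directory holding the reference results
#   $2: archive file to create
pack_reference_dir () {
  tar -cf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Unpack such an archive before the tests run.
#   $1: archive file
#   $2: directory to extract into (created if missing)
unpack_reference_dir () {
  mkdir -p "$2" && tar -xf "$1" -C "$2"
}
```

For example:

$ pack_reference_dir tp/tests/encoded/res_parser/non_ascii_test_epub \
    non_ascii_test_epub.tar
$ unpack_reference_dir non_ascii_test_epub.tar tp/tests/encoded/res_parser

Only the archive, with its ASCII name, would then need to be tracked in git.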

For example:

$ find ./tp/tests/encoded/res_parser/non_ascii_test_epub
./tp/tests/encoded/res_parser/non_ascii_test_epub
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8.2
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8.1
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB/xhtml
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB/xhtml/osé_utf8.xhtml
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB/xhtml/Chapteur.xhtml
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB/xhtml/nav_toc.xhtml
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB/images
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB/images/2-an_image.png
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB/images/1-an_image.png
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/EPUB/osé_utf8.opf
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/mimetype
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/META-INF
./tp/tests/encoded/res_parser/non_ascii_test_epub/osé_utf8_epub_package/META-INF/container.xml

The question is how beneficial it would be to have wholly ASCII file names
for all files tracked in git.  By default, we will keep these tests, but
could decide otherwise if there is feedback from users of systems where
they cause an issue.
