SUSv3-2008 has a standard way of addressing this issue. I don't have the current reference handy, but it was first raised on the Austin Group mailing list a few years ago. Search for 'hdrcharset' in:

http://www.opengroup.org/austin/aardvark/latest/xcubug2.txt

With this, you can store the filename as a binary string instead of UTF-8 if the UTF-8 conversion fails. This is supported by bsdtar/libarchive, by Heirloom tar, and (I believe) by star.

Implementing this correctly is a little tricky, since the hdrcharset option applies to all headers, not just the pathname header. So you basically have to try to convert everything to UTF-8 and, if anything fails, use hdrcharset=binary.

Tim

Lars Gustäbel wrote:
Hello!

I noticed that tar with the --format=pax option will produce invalid archives when it encounters filenames that it cannot convert from the user's charset to UTF-8. In src/xheader.c, the function code_string simply copies the input string to the output string in the event of an error during utf8_convert. The problem seems to be known: there is a FIXME tag in the comment. However, I'd favor the solution from the comment (report an error) instead of silently producing a corrupt archive.

Regards,
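
For illustration, the all-or-nothing fallback described in the reply (try to convert every header value to UTF-8 and, if any conversion fails, mark the whole extended header with hdrcharset) might look roughly like the Python sketch below. This is not GNU tar's code. The names pax_record and encode_pax_headers are hypothetical, and the uppercase value BINARY is an assumption based on my reading of the POSIX pax specification.

```python
from typing import Dict


def pax_record(keyword: str, value: bytes) -> bytes:
    """Build one pax extended header record: b"<length> <keyword>=<value>\\n".

    The decimal length counts the whole record, including its own digits.
    """
    body = b" " + keyword.encode("ascii") + b"=" + value + b"\n"
    length = len(body)
    # Grow the length until it also accounts for its own digits.
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


def encode_pax_headers(fields: Dict[str, bytes], charset: str) -> bytes:
    """Encode header values given as raw bytes in the user's charset.

    Hypothetical helper: if every value converts to UTF-8, emit UTF-8
    records. If any value fails, emit hdrcharset=BINARY (assumed spelling)
    and store every value unconverted.
    """
    try:
        converted = {k: v.decode(charset).encode("utf-8")
                     for k, v in fields.items()}
    except UnicodeDecodeError:
        # One failure switches all headers to binary, not just the bad one.
        records = [pax_record("hdrcharset", b"BINARY")]
        records += [pax_record(k, v) for k, v in fields.items()]
    else:
        records = [pax_record(k, v) for k, v in converted.items()]
    return b"".join(records)
```

With a UTF-8 locale, a path such as b"caf\xe9.txt" cannot be decoded, so the result starts with the hdrcharset record and keeps both the path and any other fields as raw bytes. Reporting an error instead, as suggested in the original message, would amount to re-raising in the except branch.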
