On 2009-05-19, aborisevich <aborisev...@beldts.de> wrote:
> I have found the following bug in the current org.apache.tools.tar
> package. A tar archive was created on one system (for example Windows
> XP, default charset CP-1251). This tar archive contains TarEntries
> whose names use national characters like German umlauts. Then
> this archive file was copied to a Linux system (default charset UTF-8).
> After unpacking the archive file there, information was lost
> (the TarEntry names were lost). There is a possible solution for this
> problem.
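The loss described in the report can be reproduced without any tar code: encode a name with a single-byte Western charset and decode the resulting bytes as UTF-8. The following is a minimal sketch, assuming ISO-8859-1 as the writer's charset (it agrees with windows-1252 for ü and ß). The class and method names are hypothetical, and the sketch does not call org.apache.tools.tar; it only shows the charset mismatch itself.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Writer side (assumption): a single-byte Western charset standing in
    // for a Windows code page. One byte per character.
    static byte[] encodeOnWriter(String name) {
        return name.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Reader side: the same bytes are interpreted as UTF-8. Bytes such as
    // 0xFC (ü) and 0xDF (ß) are not valid UTF-8 here, so the decoder
    // substitutes U+FFFD and the original characters cannot be recovered.
    static String decodeOnReader(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes avoid depending on the source file's encoding.
        String original = "Gr\u00fc\u00dfe.txt";
        byte[] raw = encodeOnWriter(original);
        System.out.println("read as UTF-8:      " + decodeOnReader(raw));
        // Decoding with the writer's charset round-trips correctly.
        System.out.println("read as ISO-8859-1: " + new String(raw, StandardCharsets.ISO_8859_1));
    }
}
```

The point is that the archive holds only bytes; without an agreed encoding, the reader has to guess, and a wrong guess destroys the name.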
While I agree that the current handling is unfortunate, your solution (encoding file names as UTF-8) probably doesn't really help either. There are various dialects of the tar file format, and your solution would create yet another one that only Ant could extract.

So far I haven't found a description of the latest POSIX tar format, but the public information on older formats is extraordinarily vague about file names that contain non-ASCII characters. There simply doesn't seem to be a common method to encode them.

If you take BSD's tar(5) man page (e.g. <http://leaf.dragonflybsd.org/cgi/web-man?command=tar&section=5>), you'll see in the Pax section that pax specifically puts non-ASCII file names into a separate entry, and the manual points out that this entry can hold non-ASCII characters (which sort of implies the "normal" name field was ASCII only).

I don't think there is a real solution. Implementing POSIX 2001 compliance in the tar package and making Ant use it (at the whim of a user option) would be a long-term plan, though.

Stefan
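For reference, the POSIX.1-2001 (pax) format stores such names in an extended header entry made of records of the form `<length> <keyword>=<value>\n`, where `<length>` is the decimal byte count of the whole record (including its own digits) and the value is UTF-8. Below is a minimal Java sketch of building one `path` record, under those assumptions. The class and method names are hypothetical, this is not the org.apache.tools.tar API, and it does not build the surrounding 512-byte header block.

```java
import java.nio.charset.StandardCharsets;

public class PaxRecordSketch {
    // Builds one pax extended-header record: "<length> <keyword>=<value>\n".
    // <length> counts every byte of the record: the length digits themselves,
    // the space, the keyword, '=', the UTF-8 value and the trailing newline.
    static byte[] record(String keyword, String value) {
        int body = (keyword + "=" + value + "\n").getBytes(StandardCharsets.UTF_8).length;
        int base = body + 1; // +1 for the space after the length field
        int len = base + digits(base);
        // Adding the length digits can cross a power of ten (a base of 98
        // needs a 3-digit length, giving 101), so repeat until consistent.
        while (len != base + digits(len)) {
            len = base + digits(len);
        }
        String rec = len + " " + keyword + "=" + value + "\n";
        return rec.getBytes(StandardCharsets.UTF_8);
    }

    // Number of decimal digits in a non-negative int.
    static int digits(int n) {
        return Integer.toString(n).length();
    }

    public static void main(String[] args) {
        // Unicode escapes avoid depending on the source file's encoding.
        byte[] rec = record("path", "Gr\u00fc\u00dfe.txt");
        System.out.println(rec.length + " bytes: " + new String(rec, StandardCharsets.UTF_8).trim());
    }
}
```

In a pax archive, records like this form the data of an extended header entry (type flag `x`) placed before the file's own entry, so readers that understand pax take the UTF-8 name from there.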