On 2009-05-19, aborisevich <aborisev...@beldts.de> wrote:

> I have found the following bug using the present org.apache.tools.tar
> package. A tar archive was created on one system (for example Windows
> XP, default charset CP-1251). This archive contains TarEntries
> named using national characters such as German umlauts. Then
> the archive file was copied to a Linux system (default charset UTF-8).
> After unpacking the archive there, information was lost
> (the TarEntry names were lost). There is a possible solution for this
> problem.

While I agree that the current handling is unfortunate, your solution
(UTF-8 encoding file names) probably doesn't really help either.

There are various dialects of the tar file format, and your solution
would create yet another one, extractable only by Ant.

So far I haven't found a description of the latest POSIX tar format,
but the publicly available descriptions of the older formats are
extraordinarily vague about file names that contain non-ASCII
characters.  There simply doesn't seem to be a common method to encode
them.
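
To make the failure mode concrete, here is a minimal sketch (not Ant
code) of what happens when a name is written as bytes in one charset and
read back in another. The class and method names are hypothetical, and
ISO-8859-1 merely stands in for the Windows code page because every JVM
is guaranteed to ship it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of org.apache.tools.tar.
class TarNameCharsetDemo {

    // Encode a name with the writer's charset and decode the raw bytes
    // with the reader's charset, as happens when the archive format
    // itself records no encoding for the name field.
    static String roundTrip(String name, Charset writer, Charset reader) {
        byte[] raw = name.getBytes(writer);
        return new String(raw, reader);
    }

    public static void main(String[] args) {
        String name = "\u00DCbung.txt"; // "Übung.txt", escaped to be source-encoding safe

        // Written as ISO-8859-1, read as UTF-8: the umlaut byte is not
        // valid UTF-8 and gets replaced, so the original name is lost.
        System.out.println(
            roundTrip(name, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));

        // Same charset on both sides: the name survives.
        System.out.println(
            roundTrip(name, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1));
    }
}
```

Always writing UTF-8 from Ant would fix the round trip only when Ant is
also the reader; any other tar implementation would still apply its own
assumption to the same bytes.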

If you take BSD's tar(5) man page
(e.g. <http://leaf.dragonflybsd.org/cgi/web-man?command=tar&section=5>)
you'll see in the Pax section that pax specifically puts non-ASCII
file names into a separate entry, and the manual points out that this
entry can hold non-ASCII characters (which sort of implies that the
"normal" name field is ASCII only).
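
For illustration, the records inside such a pax extended-header entry
have the form "LENGTH KEYWORD=VALUE\n", where the value is UTF-8 and
LENGTH is the decimal byte count of the whole record including the
length digits themselves. The following is only a sketch of building one
such record; the class and method names are hypothetical and nothing
like this exists in org.apache.tools.tar today.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not part of org.apache.tools.tar.
class PaxRecordSketch {

    // Build one pax extended-header record: "<len> <keyword>=<value>\n".
    static byte[] paxRecord(String keyword, String value) {
        // Everything after the length digits, encoded as UTF-8.
        byte[] body = (" " + keyword + "=" + value + "\n")
                .getBytes(StandardCharsets.UTF_8);

        // The length field counts its own digits, so start with a guess
        // of one digit and recompute until the value is self-consistent.
        int len = body.length + 1;
        while (String.valueOf(len).length() + body.length != len) {
            len = String.valueOf(len).length() + body.length;
        }

        byte[] digits = String.valueOf(len).getBytes(StandardCharsets.US_ASCII);
        byte[] record = new byte[len];
        System.arraycopy(digits, 0, record, 0, digits.length);
        System.arraycopy(body, 0, record, digits.length, body.length);
        return record;
    }

    public static void main(String[] args) {
        // "path" is the pax keyword that overrides the entry's file name.
        byte[] record = paxRecord("path", "\u00FCber.txt");
        System.out.print(new String(record, StandardCharsets.UTF_8));
    }
}
```

A writer would place records like this in an extended-header entry ahead
of the real entry, while the ordinary name field carries an ASCII-only
fallback.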

I don't think there is a real solution.  Implementing POSIX 2001
compliance in the tar package and making Ant use it (controlled by a
user option) would be a long-term plan, though.

Stefan
