DO NOT REPLY [Bug 36290] New: - <copy filtering="on"> mutilates LATIN1 text files

bugzilla Sun, 21 Aug 2005 03:48:42 -0700

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=36290>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.


http://issues.apache.org/bugzilla/show_bug.cgi?id=36290

           Summary: <copy filtering="on"> mutilates LATIN1 text files
           Product: Ant
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Core tasks
        AssignedTo: [email protected]
        ReportedBy: [EMAIL PROTECTED]


It is well documented that filtering may corrupt binary files.  I have now
found that a similar issue exists with text files.

When a text file in LATIN1 encoding is read as UTF-8, the a-umlaut (ä) and
other non-ASCII characters are replaced by '?', because these LATIN1 byte
values do not form valid UTF-8 sequences.

This is exactly what happens when the task

  <copy filtering="on" todir="bar">
    <fileset dir="foo">
      <include name="**/*.xml"/>
    </fileset>
  </copy>

is applied to XML files declared with encoding="iso-8859-1" on a platform
whose default encoding is UTF-8.

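The easy workaround is to set the encoding explicitly:
<copy filtering="on" todir="bar" encoding="iso-8859-1">.  Because ISO-8859-1
maps each of the 256 byte values to exactly one character, this also copies
UTF-8 encoded files containing multi-byte character sequences correctly: their
bytes pass through unchanged.  Token replacement of ASCII strings also works
correctly regardless of the file's actual encoding, since ASCII characters have
the same byte values in both encodings.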
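To illustrate the workaround, here is a simplified Java sketch of what a filtering copy does: decode the input with some charset, replace @TOKEN@ occurrences, then encode with the same charset. The filterCopy method and the VERSION token are hypothetical. This is not Ant's actual implementation; it only mimics the decode/encode round trip under the assumption that input and output use the same charset. With UTF-8 the LATIN1 byte 0xE4 is destroyed. With ISO-8859-1, both LATIN1 input and UTF-8 multi-byte input survive while the ASCII token is still replaced.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

public class FilterCopySketch {

    // Simplified stand-in for <copy filtering="on">: decode the bytes with cs,
    // replace @TOKEN@ occurrences, then encode the result with the same cs.
    // This is NOT Ant's implementation; it only mimics the decode/encode round trip.
    static byte[] filterCopy(byte[] input, Charset cs, Map<String, String> tokens) {
        try {
            StringBuilder text = new StringBuilder();
            try (Reader in = new InputStreamReader(new ByteArrayInputStream(input), cs)) {
                char[] buf = new char[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    text.append(buf, 0, n);
                }
            }
            String filtered = text.toString();
            for (Map.Entry<String, String> e : tokens.entrySet()) {
                filtered = filtered.replace("@" + e.getKey() + "@", e.getValue());
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (Writer w = new OutputStreamWriter(out, cs)) {
                w.write(filtered);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with in-memory streams
        }
    }

    public static void main(String[] args) {
        Map<String, String> tokens = Map.of("VERSION", "1.0");
        String source = "<v>@VERSION@ M\u00e4rz</v>";
        String expected = "<v>1.0 M\u00e4rz</v>";

        byte[] latin1 = source.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = source.getBytes(StandardCharsets.UTF_8);

        // UTF-8 default applied to a LATIN1 file: byte 0xE4 becomes EF BF BD (U+FFFD).
        System.out.println(Arrays.equals(filterCopy(latin1, StandardCharsets.UTF_8, tokens),
                expected.getBytes(StandardCharsets.ISO_8859_1))); // false

        // Workaround encoding="iso-8859-1": LATIN1 input survives ...
        System.out.println(Arrays.equals(filterCopy(latin1, StandardCharsets.ISO_8859_1, tokens),
                expected.getBytes(StandardCharsets.ISO_8859_1))); // true

        // ... and so do the multi-byte sequences of a UTF-8 input file.
        System.out.println(Arrays.equals(filterCopy(utf8, StandardCharsets.ISO_8859_1, tokens),
                expected.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```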

I therefore propose making this the default behaviour of the <copy> task: if
no encoding is specified explicitly, always use iso-8859-1 instead of the
platform-dependent default encoding (which may be UTF-8).
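A hedged sketch of the proposed default. The helper names currentEncoding and proposedEncoding are hypothetical, and the encodingAttr parameter stands in for an unset (null) or set encoding attribute. The roundTripsAllBytes check shows why ISO-8859-1 is a safe fallback: decoding and re-encoding reproduces every one of the 256 byte values, which UTF-8 does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CopyEncodingDefault {

    // Behaviour described in the report: no encoding attribute means the
    // platform default charset, which may be UTF-8.
    static Charset currentEncoding(String encodingAttr) {
        return encodingAttr != null ? Charset.forName(encodingAttr) : Charset.defaultCharset();
    }

    // Proposed behaviour: no encoding attribute means ISO-8859-1.
    static Charset proposedEncoding(String encodingAttr) {
        return encodingAttr != null ? Charset.forName(encodingAttr) : StandardCharsets.ISO_8859_1;
    }

    // True if decoding and re-encoding with cs reproduces every possible byte value.
    static boolean roundTripsAllBytes(Charset cs) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        byte[] back = new String(all, cs).getBytes(cs);
        return Arrays.equals(all, back);
    }

    public static void main(String[] args) {
        System.out.println(currentEncoding(null));                      // platform-dependent
        System.out.println(proposedEncoding(null));                     // ISO-8859-1
        System.out.println(roundTripsAllBytes(proposedEncoding(null))); // true
        System.out.println(roundTripsAllBytes(StandardCharsets.UTF_8)); // false
    }
}
```

An explicit encoding attribute still wins in both variants. Only the fallback for an unset attribute changes.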
