DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://issues.apache.org/bugzilla/show_bug.cgi?id=36290>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=36290

Summary: <copy filtering="on"> mutilates LATIN1 text files
Product: Ant
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Core tasks
AssignedTo: dev@ant.apache.org
ReportedBy: [EMAIL PROTECTED]

It is well documented that filtering may corrupt binary files. I have now found that a similar issue exists with text files.

When a text file with LATIN1 encoding is read assuming UTF-8 encoding, a-umlaut and other non-ASCII characters are replaced by '?', because these LATIN1 byte values are not valid UTF-8 sequences.

This is exactly what happens if this task

  <copy filtering="on" todir="bar">
    <fileset dir="foo">
      <include name="**/*.xml"/>
    </fileset>
  </copy>

is applied to XML files with encoding="iso-8859-1" on a platform with UTF-8 as the default encoding.

The easy workaround is to set the encoding explicitly:

  <copy filtering="on" todir="bar" encoding="iso-8859-1">

This also correctly copies UTF-8 encoded files containing multi-byte character sequences. Token replacement of ASCII strings also works correctly, independent of the file's actual encoding.

My proposal is therefore to make this the default behaviour for the <copy> task: if no explicit encoding is specified, do not use the platform-dependent default encoding (which may be UTF-8), but always use iso-8859-1.
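The byte-level reasoning behind the proposal can be checked with a small stand-alone Java program. This is a sketch under stated assumptions, not Ant's implementation: the hypothetical method filterCopy models a filtering copy as "decode with one charset, replace an ASCII token, encode with the same charset", and the token @TOKEN@ and the value Ant are made up for illustration. The point it shows is that ISO-8859-1 maps each of the 256 byte values to exactly one character and back, so a round trip through it cannot lose bytes, whereas UTF-8 decoding replaces invalid sequences such as a lone LATIN1 a-umlaut byte (0xE4).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FilterRoundTrip {

    /**
     * Hypothetical model of a filtering copy (NOT Ant's actual code):
     * decode the bytes with the given charset, replace an ASCII token,
     * then encode the result with the same charset.
     */
    static byte[] filterCopy(byte[] input, Charset cs, String token, String value) {
        String text = new String(input, cs);          // read: bytes -> chars
        String filtered = text.replace(token, value); // token replacement on chars
        return filtered.getBytes(cs);                 // write: chars -> bytes
    }

    public static void main(String[] args) {
        String original = "\u00e4 @TOKEN@"; // a-umlaut followed by an ASCII token
        String wanted = "\u00e4 Ant";       // expected text after filtering

        // A LATIN1 (ISO-8859-1) file: a-umlaut is the single byte 0xE4.
        byte[] latin1File = original.getBytes(StandardCharsets.ISO_8859_1);
        byte[] latin1Want = wanted.getBytes(StandardCharsets.ISO_8859_1);

        // A UTF-8 file: a-umlaut is the two-byte sequence 0xC3 0xA4.
        byte[] utf8File = original.getBytes(StandardCharsets.UTF_8);
        byte[] utf8Want = wanted.getBytes(StandardCharsets.UTF_8);

        // 1. LATIN1 file read as UTF-8: 0xE4 is not a valid UTF-8 sequence here,
        //    so the decoder substitutes a replacement character and the byte is lost.
        byte[] r1 = filterCopy(latin1File, StandardCharsets.UTF_8, "@TOKEN@", "Ant");
        System.out.println("LATIN1 via UTF-8:      " + Arrays.equals(r1, latin1Want));

        // 2. LATIN1 file read as ISO-8859-1: every byte maps to one char and back.
        byte[] r2 = filterCopy(latin1File, StandardCharsets.ISO_8859_1, "@TOKEN@", "Ant");
        System.out.println("LATIN1 via ISO-8859-1: " + Arrays.equals(r2, latin1Want));

        // 3. UTF-8 file read as ISO-8859-1: 0xC3 0xA4 become two chars, pass
        //    through untouched, and are written back as the same two bytes.
        byte[] r3 = filterCopy(utf8File, StandardCharsets.ISO_8859_1, "@TOKEN@", "Ant");
        System.out.println("UTF-8 via ISO-8859-1:  " + Arrays.equals(r3, utf8Want));
    }
}
```

Running it prints false, true, true. The third case relies on a property of UTF-8: every byte of a multi-byte sequence is 0x80 or above, so an ASCII token can never match in the middle of an encoded character. The caveat is that the replacement value must also be ASCII; a non-ASCII value would be written as a single LATIN1 byte into what is really a UTF-8 file.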