[ https://jira.codehaus.org/browse/MPH-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=298611#comment-298611 ]
Jürgen Hermann edited comment on MPH-87 at 5/14/12 6:07 AM:
------------------------------------------------------------

Writing the result to a file doesn't really help: the file's content is then broken, i.e. not well-formed XML. Consider this:

{code}
$ head -n1 pom.xml
<?xml version="1.0"?>
$ grep -m1 name pom.xml | xxd
0000000: 2020 2020 3c6e 616d 653e 4d75 6c74 692d      <name>Multi-
0000010: 4172 6368 6574 7970 6573 2052 6f6f 7420  Archetypes Root
0000020: 504f 4d20 c3a4 c3b6 c3bc c39f 3c2f 6e61  POM ........</na
0000030: 6d65 3e0a                                me>.
$ MAVEN_OPTS="-Dfile.encoding=iso-8859-15" mvn -Doutput=effective.xml help:effective-pom
...
[INFO] Multi-Archetypes Root POM ���
...
$ head -n1 effective.xml
<?xml version="1.0" encoding="UTF-8"?>
$ xmllint effective.xml
effective.xml:26: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xE4 0xF6 0xFC 0xDF
<name>Multi-Archetypes Root POM ����</name>
$ mvn -version
Apache Maven 3.0.3 (r1075438; 2011-02-28 18:31:09+0100)
Java version: 1.6.0_26, vendor: Sun Microsystems Inc.
Default locale: en_US, platform encoding: ANSI_X3.4-1968
OS name: "linux", version: "3.0.0-12-generic-pae", arch: "i386", family: "unix"
{code}

That is, we have a pom.xml in the default encoding (UTF-8) containing some properly encoded umlauts (c3a4...). The Maven run, which simulates a system that uses Latin-9, already fails to handle them correctly and emits replacement characters. The resulting XML is a mess: it *explicitly* declares UTF-8 while containing Latin-9 bytes.

In summary: Maven doesn't behave deterministically here. It depends on the system environment where it shouldn't, which leads to hard-to-find problems that occur "out of the blue" for some developers only.

> help:effective-pom uses platform encoding and garbles non-ascii characters, emits invalid XML
> ---------------------------------------------------------------------------------------------
>
> Key: MPH-87
> URL: https://jira.codehaus.org/browse/MPH-87
> Project: Maven 2.x Help Plugin
> Issue Type: Bug
> Affects Versions: 2.1.1
> Environment: Windows, MacOSX, Linux, Maven 3.0.4
> Reporter: Mirko Friedenhagen
> Attachments: mfriedenhagen-invalidpom-MPH-87-0-g42a5c31.zip
>
>
> As stated in http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info, XML files without a BOM and without an XML encoding declaration should be read as UTF-8.
> {{help:effective-pom}} uses the platform encoding for writing the effective POM without emitting an appropriate XML encoding declaration in the resulting XML file.
> I have created a small sample project (available at https://github.com/mfriedenhagen/invalidpom, attached as ZIP) which reproduces the issue.
> While the parent pom (https://raw.github.com/mfriedenhagen/invalidpom/master/pom.xml) has an XML encoding declaration, https://raw.github.com/mfriedenhagen/invalidpom/master/child-invalid/pom.xml has none.
> Now running:
> {code}
> mvn -s settings.xml -gs settings.xml clean validate
> {code}
> will produce an invalid character for the developer name "Jörg" in {{child-invalid}}.
> Possible workarounds are:
> * to include an XML encoding declaration as done in {{child-valid}}.
> * to use {{JAVA_TOOL_OPTIONS}} on Windows as stated in http://stackoverflow.com/a/623036/49132
> * to use {{MAVEN_OPTS=-Dfile.encoding=utf-8 mvn -s settings.xml -gs settings.xml clean validate}}.
> Nonetheless I consider this a Major bug, as it clearly violates the recommendations of W3C.
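
The byte-level mismatch shown in the comment's transcript can be reproduced outside Maven. The following is a minimal Python sketch, not the plugin's actual code. It assumes that the POM text is decoded correctly and that only the writer falls back to the platform encoding (here iso-8859-15). The helper names {{latin9_output}}, {{effective_pom}} and {{is_well_formed}} are made up for illustration.

```python
import xml.etree.ElementTree as ET

# "äöüß" as UTF-8, copied from the xxd dump of pom.xml in the comment.
POM_NAME_BYTES = bytes.fromhex("c3a4c3b6c3bcc39f")


def latin9_output(utf8_bytes: bytes) -> bytes:
    """Decode as UTF-8, then re-encode the way a Latin-9 platform writer would."""
    return utf8_bytes.decode("utf-8").encode("iso-8859-15")


def effective_pom(name_bytes: bytes) -> bytes:
    """Hypothetical stand-in for the written file: UTF-8 declaration plus payload."""
    return (
        b'<?xml version="1.0" encoding="UTF-8"?>\n'
        b"<name>Multi-Archetypes Root POM " + name_bytes + b"</name>"
    )


def is_well_formed(document: bytes) -> bool:
    """Return True if the bytes parse as XML under their declared encoding."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    written = latin9_output(POM_NAME_BYTES)
    print(written.hex())                                  # bytes the Latin-9 writer emits
    print(is_well_formed(effective_pom(written)))         # UTF-8 declared, Latin-9 payload
    print(is_well_formed(effective_pom(POM_NAME_BYTES)))  # UTF-8 declared, UTF-8 payload
```

The first printed value should correspond to the bytes xmllint complains about (0xE4 0xF6 0xFC 0xDF). Writing with the encoding named in the XML declaration, rather than the platform default, is what keeps the second case from occurring.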