[ https://jira.codehaus.org/browse/MPH-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=298611#comment-298611 ]
Jürgen Hermann commented on MPH-87:
-----------------------------------

Writing the result to a file doesn't really help (then the file's content is broken, i.e. not well-formed XML). Consider this:

{code}
$ head -n1 pom.xml
<?xml version="1.0"?>

$ grep -m1 name pom.xml | xxd
0000000: 2020 2020 3c6e 616d 653e 4d75 6c74 692d      <name>Multi-
0000010: 4172 6368 6574 7970 6573 2052 6f6f 7420  Archetypes Root 
0000020: 504f 4d20 c3a4 c3b6 c3bc c39f 3c2f 6e61  POM ........</na
0000030: 6d65 3e0a                                me>.

$ MAVEN_OPTS="-Dfile.encoding=iso-8859-15" mvn -Doutput=effective.xml help:effective-pom
...
[INFO] Multi-Archetypes Root POM ���
...

$ head -n1 effective.xml
<?xml version="1.0" encoding="UTF-8"?>

$ xmllint effective.xml
effective.xml:26: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xE4 0xF6 0xFC 0xDF
  <name>Multi-Archetypes Root POM ����</name>
{code}

That is, we have a pom.xml with the default encoding (UTF-8) containing some properly encoded umlauts (c3a4...). The Maven run (simulating a system that uses Latin-9) already fails to read that correctly and emits replacement characters. The resulting XML is a mess: it states *explicitly* that it is UTF-8, while containing Latin-9 bytes.

In summary: Maven doesn't behave deterministically here. It depends on the system environment where it shouldn't, leading to hard-to-find problems that occur "out of the blue" for some developers only.

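To make the byte-level mismatch concrete, here is a minimal Python sketch (standard library only). It does not reproduce Maven's code path; it only rebuilds the end result seen in the session: a document that declares UTF-8 while its bytes are Latin-9. The helper names `build_pom` and `is_well_formed` are made up for this illustration and are not part of Maven or the help plugin.

```python
import xml.etree.ElementTree as ET

# The four characters from the POM name, written as escapes so this
# script does not itself depend on the source file's encoding.
UMLAUTS = "\u00e4\u00f6\u00fc\u00df"  # a-umlaut, o-umlaut, u-umlaut, sharp s
NAME = "Multi-Archetypes Root POM " + UMLAUTS


def build_pom(byte_encoding: str) -> bytes:
    """Hypothetical helper: a tiny XML document that always *declares*
    UTF-8 but whose bytes are produced with `byte_encoding`."""
    text = '<?xml version="1.0" encoding="UTF-8"?>\n<name>' + NAME + "</name>\n"
    return text.encode(byte_encoding)


def is_well_formed(document: bytes) -> bool:
    """Hypothetical helper: True if the XML parser accepts the bytes."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


utf8_bytes = UMLAUTS.encode("utf-8")
latin9_bytes = UMLAUTS.encode("iso-8859-15")

print(utf8_bytes.hex())    # c3a4c3b6c3bcc39f, as in the xxd dump of pom.xml
print(latin9_bytes.hex())  # e4f6fcdf, the bytes xmllint complains about

print(is_well_formed(build_pom("utf-8")))        # declaration matches bytes
print(is_well_formed(build_pom("iso-8859-15")))  # declaration lies about bytes
```

The last call is rejected by the parser for the same reason xmllint rejects effective.xml: 0xE4 starts a multi-byte UTF-8 sequence, but 0xF6 is not a valid continuation byte.
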
> help:effective-pom uses platform encoding and garbles non-ascii characters, emits invalid XML
> ---------------------------------------------------------------------------------------------
>
>                 Key: MPH-87
>                 URL: https://jira.codehaus.org/browse/MPH-87
>             Project: Maven 2.x Help Plugin
>          Issue Type: Bug
>    Affects Versions: 2.1.1
>         Environment: Windows, MacOSX, Linux, Maven 3.0.4
>            Reporter: Mirko Friedenhagen
>         Attachments: mfriedenhagen-invalidpom-MPH-87-0-g42a5c31.zip
>
>
> As stated in http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info, XML files without a BOM and without an XML encoding declaration should be read as UTF-8.
> {{help:effective-pom}} uses the platform encoding for writing the effective POM without emitting an appropriate XML encoding declaration in the resulting XML file.
> I have created a small sample project (available at https://github.com/mfriedenhagen/invalidpom, attached as ZIP) which will reproduce the issue.
> While the parent pom (https://raw.github.com/mfriedenhagen/invalidpom/master/pom.xml) has an XML encoding declaration, https://raw.github.com/mfriedenhagen/invalidpom/master/child-invalid/pom.xml has none.
> Now running:
> {code}
> mvn -s settings.xml -gs settings.xml clean validate
> {code}
> will produce an invalid character for the developer name "Jörg" in {{child-invalid}}.
> Three workarounds are:
> * to include an XML encoding declaration as done in {{child-valid}}.
> * to use {{JAVA_TOOL_OPTIONS}} on Windows as stated in http://stackoverflow.com/a/623036/49132
> * to use {{MAVEN_OPTS=-Dfile.encoding=utf-8 mvn -s settings.xml -gs settings.xml clean validate}}.
> Nonetheless I consider this a Major bug, as it clearly violates the recommendations of W3C.

--
This message is automatically generated by JIRA.