[
https://jira.codehaus.org/browse/MPH-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=298611#comment-298611
]
Jürgen Hermann commented on MPH-87:
-----------------------------------
Writing the result to a file doesn't really help (then the file's content is
broken, i.e. not well-formed XML). Consider this:
{code}
$ head -n1 pom.xml
<?xml version="1.0"?>
$ grep -m1 name pom.xml | xxd
0000000: 2020 2020 3c6e 616d 653e 4d75 6c74 692d <name>Multi-
0000010: 4172 6368 6574 7970 6573 2052 6f6f 7420 Archetypes Root
0000020: 504f 4d20 c3a4 c3b6 c3bc c39f 3c2f 6e61 POM ........</na
0000030: 6d65 3e0a me>.
$ MAVEN_OPTS="-Dfile.encoding=iso-8859-15" mvn -Doutput=effective.xml help:effective-pom
...
[INFO] Multi-Archetypes Root POM ���
...
$ head -n1 effective.xml
<?xml version="1.0" encoding="UTF-8"?>
$ xmllint effective.xml
effective.xml:26: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xE4 0xF6 0xFC 0xDF
<name>Multi-Archetypes Root POM ����</name>
{code}
That is, we have a pom.xml in the default encoding (UTF-8) containing some
properly encoded umlauts (c3a4...). The Maven run (simulating a system that
uses Latin-9) already mishandles them and prints replacement characters.
The resulting XML is a mess: it states *explicitly* that it is UTF-8, while
containing Latin-9.
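The byte values in the transcript can be checked independently of Maven. Here is a minimal Python sketch, assuming the four characters in the <name> element are "äöüß" (which is what the UTF-8 bytes c3a4 c3b6 c3bc c39f in the xxd dump decode to). The variable names are illustrative only:

```python
# Characters assumed to be in the POM's <name>, per the xxd dump above.
umlauts = "äöüß"

# Properly encoded UTF-8, as found in pom.xml.
utf8_bytes = umlauts.encode("utf-8")
print(utf8_bytes.hex())    # c3a4c3b6c3bcc39f

# The same characters encoded as Latin-9 (ISO-8859-15).
latin9_bytes = umlauts.encode("iso-8859-15")
print(latin9_bytes.hex())  # e4f6fcdf

# A strict UTF-8 decoder rejects the Latin-9 bytes...
try:
    latin9_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...and a lenient one substitutes U+FFFD replacement characters.
lenient = latin9_bytes.decode("utf-8", errors="replace")
print(strict_ok, repr(lenient))
```

The Latin-9 bytes match the ones xmllint reports (0xE4 0xF6 0xFC 0xDF), which is consistent with the content having been written in the platform encoding underneath a UTF-8 declaration.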
In summary: Maven doesn't behave deterministically here. It depends on the
system environment where it shouldn't, leading to hard-to-find problems that
occur "out of the blue" for some developers only.
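For contrast, a writer that does not depend on the environment passes the encoding explicitly and derives the XML declaration from that same value. The following is only a sketch in Python using the standard library's xml.etree.ElementTree, not the plugin's code. The helper name serialize_name is made up for illustration:

```python
import io
import xml.etree.ElementTree as ET


def serialize_name(text: str, encoding: str = "UTF-8") -> bytes:
    """Serialize a <name> element; hypothetical helper for illustration."""
    root = ET.Element("name")
    root.text = text
    buf = io.BytesIO()
    # The same `encoding` value drives both the declaration and the bytes,
    # so the two cannot disagree, whatever the platform default is.
    ET.ElementTree(root).write(buf, encoding=encoding, xml_declaration=True)
    return buf.getvalue()


data = serialize_name("Multi-Archetypes Root POM äöüß")
print(data.decode("utf-8"))

# The output parses back to the original text.
parsed = ET.fromstring(data)
```

Because nothing here consults the locale or a file.encoding-style setting, the result is the same on every machine.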
> help:effective-pom uses platform encoding and garbles non-ascii characters,
> emits invalid XML
> ---------------------------------------------------------------------------------------------
>
> Key: MPH-87
> URL: https://jira.codehaus.org/browse/MPH-87
> Project: Maven 2.x Help Plugin
> Issue Type: Bug
> Affects Versions: 2.1.1
> Environment: Windows, MacOSX, Linux, Maven 3.0.4
> Reporter: Mirko Friedenhagen
> Attachments: mfriedenhagen-invalidpom-MPH-87-0-g42a5c31.zip
>
>
> As stated in http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info, XML files
> without a BOM and without an XML encoding declaration should be read as
> UTF-8.
> {{help:effective-pom}} uses the platform encoding for writing the
> effective POM without emitting an appropriate XML encoding declaration in the
> resulting XML file.
> I have created a small sample project (available at
> https://github.com/mfriedenhagen/invalidpom, attached as ZIP) which will
> reproduce the issue.
> While the parent pom
> (https://raw.github.com/mfriedenhagen/invalidpom/master/pom.xml) has an XML
> encoding declaration,
> https://raw.github.com/mfriedenhagen/invalidpom/master/child-invalid/pom.xml
> has none.
> Now running:
> {code}
> mvn -s settings.xml -gs settings.xml clean validate
> {code}
> will produce an invalid character for the developer name "Jörg" in
> {{child-invalid}}.
> Three workarounds are:
> * to include an XML encoding declaration as done in {{child-valid}}.
> * to use {{JAVA_TOOL_OPTIONS}} on Windows as stated in
> http://stackoverflow.com/a/623036/49132
> * to use {{MAVEN_OPTS=-Dfile.encoding=utf-8 mvn -s settings.xml -gs
> settings.xml clean validate}}.
> Nonetheless I consider this a Major bug, as it clearly violates the
> recommendations of W3C.