[ 
https://jira.codehaus.org/browse/MPH-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=298611#comment-298611
 ] 

Jürgen Hermann commented on MPH-87:
-----------------------------------

Writing the result to a file doesn't really help: the file's content is then 
broken, i.e. not well-formed XML. Consider this:
{code}
$ head -n1 pom.xml
<?xml version="1.0"?>

$ grep -m1 name pom.xml | xxd
0000000: 2020 2020 3c6e 616d 653e 4d75 6c74 692d      <name>Multi-
0000010: 4172 6368 6574 7970 6573 2052 6f6f 7420  Archetypes Root 
0000020: 504f 4d20 c3a4 c3b6 c3bc c39f 3c2f 6e61  POM ........</na
0000030: 6d65 3e0a                                me>.

$ MAVEN_OPTS="-Dfile.encoding=iso-8859-15" mvn -Doutput=effective.xml help:effective-pom
...
[INFO] Multi-Archetypes Root POM &#65533;&#65533;&#65533;
...

$ head -n1 effective.xml 
<?xml version="1.0" encoding="UTF-8"?>

$ xmllint effective.xml 
effective.xml:26: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xE4 0xF6 0xFC 0xDF
    <name>Multi-Archetypes Root POM &#65533;&#65533;&#65533;&#65533;</name>
{code}
That is, we have a pom.xml with the default encoding (UTF-8) containing some 
properly encoded umlauts (c3a4...). The Maven run (simulating a system that 
uses Latin-9) already fails to read them correctly and emits replacement 
characters. The resulting XML is a mess: it *explicitly* declares UTF-8 while 
containing Latin-9.
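
A minimal sketch of the byte-level mismatch, written in Python rather than 
taken from Maven's code. It assumes only the bytes shown in the xxd dump and 
the xmllint error above; the helper name is_valid_utf8 is made up for this 
illustration.

```python
# Bytes from the xxd dump of pom.xml: the four characters encoded as UTF-8.
utf8_bytes = bytes.fromhex("c3a4c3b6c3bcc39f")
text = utf8_bytes.decode("utf-8")

# What a writer using the platform encoding (here ISO-8859-15) would emit.
latin9_bytes = text.encode("iso-8859-15")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin9_bytes.hex())           # e4f6fcdf, the bytes xmllint reports
print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin9_bytes))  # False
```

The Latin-9 bytes are exactly those in the xmllint error, which is why a file 
that declares UTF-8 but contains them is rejected.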

In summary: Maven doesn't behave deterministically here and depends on the 
system environment where it shouldn't, leading to hard-to-find problems that 
occur "out of the blue" for some developers only.
                
> help:effective-pom uses platform encoding and garbles non-ascii characters, 
> emits invalid XML
> ---------------------------------------------------------------------------------------------
>
>                 Key: MPH-87
>                 URL: https://jira.codehaus.org/browse/MPH-87
>             Project: Maven 2.x Help Plugin
>          Issue Type: Bug
>    Affects Versions: 2.1.1
>         Environment: Windows, MacOSX, Linux, Maven 3.0.4
>            Reporter: Mirko Friedenhagen
>         Attachments: mfriedenhagen-invalidpom-MPH-87-0-g42a5c31.zip
>
>
> As stated in http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info XML files 
> without a BOM and without an XML encoding declaration should be read as 
> UTF-8. 
> {{help:effective-pom}} does use the platform encoding for writing the 
> effective-pom without emitting an appropriate XML encoding declaration in the 
> resulting XML file.
> I have created a small sample project (available at 
> https://github.com/mfriedenhagen/invalidpom, attached as ZIP) which will 
> reproduce the issue.
> While the parent pom 
> (https://raw.github.com/mfriedenhagen/invalidpom/master/pom.xml) has an XML 
> encoding declaration, 
> https://raw.github.com/mfriedenhagen/invalidpom/master/child-invalid/pom.xml 
> has none.
> Now running:
> {code}
> mvn -s settings.xml -gs settings.xml clean validate
> {code}
> will produce an invalid character for the developer name "Jörg" in 
> {{child-invalid}}. 
> Three workarounds are:
> * to include an XML encoding declaration as done in {{child-valid}}. 
> * to use {{JAVA_TOOL_OPTIONS}} on Windows as stated in 
> http://stackoverflow.com/a/623036/49132
> * to use {{MAVEN_OPTS=-Dfile.encoding=utf-8 mvn -s settings.xml -gs 
> settings.xml clean validate}}.
> Nonetheless I consider this a Major bug, as it clearly violates the 
> recommendations of W3C.
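
The rule cited in the quoted issue description (no BOM and no encoding 
declaration means UTF-8) can be sketched roughly as follows. This is a 
simplified Python illustration, not the help plugin's code: the function name 
xml_encoding is hypothetical, and only the common BOMs and ASCII-compatible 
declarations are handled.

```python
import re

# Matches an encoding pseudo-attribute inside a leading XML declaration.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def xml_encoding(data: bytes) -> str:
    """Guess the encoding a reader should assume for an XML document."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8"                    # UTF-8 byte order mark
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "UTF-16"                   # UTF-16 byte order mark
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    return "UTF-8"                        # no BOM, no declaration


print(xml_encoding(b'<?xml version="1.0"?><project/>'))
# UTF-8
print(xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-15"?><project/>'))
# ISO-8859-15
```

A writer that emits bytes in any other encoding therefore has to state that 
encoding in the declaration, or a conforming reader will assume UTF-8.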
