Hi folks,

I'd like to know whether we have a general consensus on this:

I am investigating MPIR-242 and have found the cause. The input stream is obtained from the HTTP URL and no encoding is given, so ISO-8859-1 is used as the default (yuck!). While I know that some reporting-related modules have default values for input/output encoding, this contradicts our general approach of using the platform encoding when project.build.sourceEncoding is not set.
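To illustrate the "yuck": below is a minimal sketch (the class name is made up, and nothing here is from the plugin code) of what happens when a UTF-8 resource is decoded as ISO-8859-1, which is effectively what happens when the server sends no charset. Every byte of a multi-byte UTF-8 sequence becomes a separate Latin-1 character.

```java
import java.nio.charset.StandardCharsets;

class Iso88591Mojibake {
    // Decodes bytes the way the report does today when no encoding is given.
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // A UTF-8 encoded resource, e.g. a developer name with an umlaut ("Müller").
        byte[] utf8Bytes = "M\u00fcller".getBytes(StandardCharsets.UTF_8);

        // Correct decoding yields "Müller".
        String correct = new String(utf8Bytes, StandardCharsets.UTF_8);
        // ISO-8859-1 decoding turns the two bytes of 'ü' (0xC3 0xBC) into "Ã¼".
        String broken = decodeAsLatin1(utf8Bytes);

        System.out.println(correct.equals(broken)); // false
    }
}
```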

Changing this would make the behavior consistent in that special case. Setting project.build.sourceEncoding to UTF-8 solves the problem, but it is just a workaround. HTML resources carry their encoding with them, but ProjectInfoReportUtils treats everything as a plain input stream (not helpful with XML/HTML). I would really like to avoid peeking at the content with a PushbackInputStream.
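One place where a resource can state its encoding without any peeking is the HTTP Content-Type header (e.g. "text/html; charset=UTF-8"). A minimal sketch, assuming we get the header value from URLConnection.getContentType(); the helper name charsetOf is hypothetical and not part of ProjectInfoReportUtils. Note that this does not cover a charset declared only inside the HTML via a meta tag.

```java
import java.nio.charset.Charset;
import java.util.Optional;

class ContentTypeCharset {
    // Hypothetical helper: extracts the charset parameter from a Content-Type
    // value. Returns empty if none is declared or the JVM does not support it.
    static Optional<Charset> charsetOf(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            int eq = p.indexOf('=');
            if (eq > 0 && p.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String name = p.substring(eq + 1).trim();
                // Strip optional quotes, as in charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Optional.of(Charset.forName(name));
                } catch (IllegalArgumentException e) {
                    // Covers IllegalCharsetNameException and UnsupportedCharsetException.
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8")); // Optional[UTF-8]
        System.out.println(charsetOf("text/plain"));               // Optional.empty
    }
}
```

If the header carries no charset, we would still need a fallback, which is exactly the question below.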

What is your opinion on this?

I have two solutions in mind for the issue above:

1. Easy: remove ISO-8859-1 and assume the platform encoding if project.build.sourceEncoding is not provided.
2. Complex: use an HTML parser (JSoup is awesome and license-compatible [1]) to get correctly decoded content. But how do you know that the URL really points to an HTML file and not to a license.txt? Inspect the content type?
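Roughly, the two options could look like the following sketch. The class and method names are made up for illustration, and the list of HTML MIME types is my assumption. resolveEncoding is option 1 (sourceEncoding if set, else the platform encoding, never a hard-coded ISO-8859-1). isHtml answers the content-type question for option 2, with the caveat that servers may omit or misreport Content-Type, so it is a heuristic at best.

```java
import java.nio.charset.Charset;
import java.util.Locale;

class ReportEncoding {
    // Option 1: use project.build.sourceEncoding if set, otherwise the
    // platform encoding, instead of a hard-coded ISO-8859-1.
    static Charset resolveEncoding(String sourceEncoding) {
        if (sourceEncoding != null && !sourceEncoding.trim().isEmpty()) {
            return Charset.forName(sourceEncoding.trim());
        }
        return Charset.defaultCharset();
    }

    // Option 2: only hand the content to an HTML parser when the server
    // declares it as HTML; anything else (e.g. a license.txt served as
    // text/plain) would be read as plain text.
    static boolean isHtml(String contentType) {
        if (contentType == null) {
            return false;
        }
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.equals("text/html") || mime.equals("application/xhtml+xml");
    }

    public static void main(String[] args) {
        System.out.println(resolveEncoding(null));           // platform default
        System.out.println(isHtml("text/plain"));           // false
        System.out.println(isHtml("text/html; charset=UTF-8")); // true
    }
}
```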

[1] http://apache.org/legal/resolved.html#category-a

Michael
