Isn't this normally handled by the Content-Type header?
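Roughly what I have in mind is the sketch below. It assumes the report code can get at the raw header value, for example via java.net.URLConnection.getContentType(). The class and method names (ContentTypeCharset, declaredCharset, isHtml, resolve) are hypothetical and not existing ProjectInfoReportUtils API. The fallback order is an assumption: the charset from the header first, then project.build.sourceEncoding, then the platform default. isHtml() is one possible answer to the "HTML or license.txt?" question.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;

// Hypothetical helper, not part of maven-project-info-reports-plugin.
final class ContentTypeCharset {

    /**
     * Returns the charset declared in a Content-Type value such as
     * "text/html; charset=UTF-8", or null if none is declared or it is unknown.
     */
    public static Charset declaredCharset(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            int eq = p.indexOf('=');
            if (eq > 0 && p.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String name = p.substring(eq + 1).trim();
                // Strip optional quotes, e.g. charset="UTF-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                    return null; // unknown charset: let the caller fall back
                }
            }
        }
        return null;
    }

    /** True if the media type, ignoring parameters, is text/html. */
    public static boolean isHtml(String contentType) {
        if (contentType == null) {
            return false;
        }
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.equals("text/html");
    }

    /**
     * Picks the charset to decode a remote resource with: the header's charset,
     * else the given sourceEncoding (assumed to hold project.build.sourceEncoding),
     * else the platform default. An invalid sourceEncoding makes Charset.forName throw.
     */
    public static Charset resolve(String contentType, String sourceEncoding) {
        Charset declared = declaredCharset(contentType);
        if (declared != null) {
            return declared;
        }
        if (sourceEncoding != null && !sourceEncoding.isEmpty()) {
            return Charset.forName(sourceEncoding);
        }
        return Charset.defaultCharset();
    }
}
```

Note that a text/html response without a charset parameter may still declare its encoding in a <meta> tag. This sketch does not look there, and that is the case where an HTML parser such as JSoup would still help.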

Kristian


2014-11-13 23:15 GMT+01:00 Michael Osipov <micha...@apache.org>:
> Hi folks,
>
> I'd like to know if we have a general consensus on this:
>
> I am investigating MPIR-242 and have found the cause. The input stream is
> obtained from the HTTP URL without any encoding, so ISO-8859-1 is used as
> the default (yuck!). While I know that some reporting-related modules have
> default values for input/output encoding, this contradicts our general
> approach of using the platform encoding when project.build.sourceEncoding
> is not set.
>
> In this particular case, changing it would make the behavior consistent.
> Setting project.build.sourceEncoding to UTF-8 solves the problem, but it is
> just a workaround. HTML resources carry their encoding with them, but
> ProjectInfoReportUtils treats everything as a plain input stream (not
> helpful for XML/HTML). I would really like to avoid peeking at the content
> with a pushback input stream.
>
> What is your opinion on this?
>
> I have two solutions in mind for the issue above:
>
> 1. Easy: remove ISO-8859-1 and assume the platform encoding if
> project.build.sourceEncoding is not set.
> 2. Complex: use an HTML parser (JSoup is awesome and license-compatible [1])
> to get correctly decoded content.
> But how do you know whether this URL really points to an HTML file and not
> to a license.txt? Inspect the content type?
>
> [1] http://apache.org/legal/resolved.html#category-a
>
> Michael
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@maven.apache.org
> For additional commands, e-mail: dev-h...@maven.apache.org
>
