Isn't this normally handled by the Content-Type headers?

Kristian

2014-11-13 23:15 GMT+01:00 Michael Osipov <micha...@apache.org>:

> Hi folks,
>
> I'd like to know if we have a general consensus on this:
>
> I am investigating MPIR-242 and figured out the cause. The input stream is
> obtained from the HTTP URL and no encoding is given, so ISO-8859-1 is
> provided as default (yuck!). While I know that some reporting-related
> modules have default values for input/output encoding, this contradicts our
> general approach of using platform encoding when
> project.build.sourceEncoding is not given.
>
> In that special case, the behavior would be consistent if changed. Setting
> project.build.sourceEncoding to UTF-8 would solve the problem but is just a
> workaround. HTML resources carry their encoding with them but the
> ProjectInfoReportUtils treats everything as input streams (not helpful with
> XML/HTML). I would really like to avoid peeking with a pushback input
> stream.
>
> What is your opinion on this?
>
> I have two solutions in mind for the issue above:
>
> 1. Easy: remove ISO-8859-1, assume platform encoding if
>    project.build.sourceEncoding is not provided.
> 2. Complex: use an HTML parser (JSoup is awesome and license-compatible [1])
>    to get correctly encoded content.
>    But how do you know that this URL really points to an HTML file and not
>    a license.txt? Inspect the content type?
>
> [1] http://apache.org/legal/resolved.html#category-a
>
> Michael
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@maven.apache.org
> For additional commands, e-mail: dev-h...@maven.apache.org
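
For illustration, the Content-Type idea raised in this thread could be sketched roughly as follows. This is only a minimal sketch, not what ProjectInfoReportUtils does: the class name ContentTypeCharset and the method charsetOf are hypothetical, and the caller is assumed to pass in whatever fallback the project settles on (project.build.sourceEncoding if set, otherwise the platform encoding).

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper, not part of any Maven plugin.
final class ContentTypeCharset {

    private ContentTypeCharset() {
    }

    /**
     * Returns the charset named in a Content-Type header value such as
     * "text/html; charset=UTF-8". Returns the fallback when the header is
     * missing, has no charset parameter, or names an unusable charset.
     */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // parts[0] is the media type; the remaining parts are parameters.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // The parameter value may be a quoted string.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```

The header value could come from java.net.URLConnection.getContentType(), which may return null, and Charset.defaultCharset() would give the platform encoding as the fallback. A server that sends no charset parameter (plausible for a plain license.txt) would still end up on the fallback, so this does not remove the need to agree on a default.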