since it is the encoding of a downloaded license, it has nothing to do with the encoding of project sources: using ${project.build.sourceEncoding} is IMHO the wrong algorithm (which happens to give good results since a lot of people use UTF-8).
Then I'd go either for a parameter for the goal, or for JSoup, which does the magic of detecting the effective content encoding.

Regards,

Hervé

On Friday, 14 November 2014 10:37:22, Michael Osipov wrote:
> On 2014-11-14 at 04:02, Kristian Rosenvold wrote:
> > Isn't this handled by the content-type headers normally?
>
> No, for two reasons:
>
> 1. The current code does not inspect the content type.
> 2. The server does send text/html but not the encoding used, which is not
> necessary because it is declared within the file itself.
>
> The only option would be to inspect the Content-Type header and make
> further assumptions.
>
> Michael
>
> > 2014-11-13 23:15 GMT+01:00 Michael Osipov <micha...@apache.org>:
> >> Hi folks,
> >>
> >> I'd like to know if we have a general consensus on this:
> >>
> >> I am investigating MPIR-242 and figured out the cause. The input stream is
> >> obtained from the HTTP URL and no encoding is given, so ISO-8859-1 is
> >> used as the default (yuck!). While I know that some reporting-related
> >> modules have default values for input/output encoding, this contradicts our
> >> general approach of using the platform encoding when
> >> project.build.sourceEncoding is not given.
> >>
> >> In that special case, the behavior would be consistent if changed. Setting
> >> project.build.sourceEncoding to UTF-8 would solve the problem but is just a
> >> workaround. HTML resources carry their encoding with them, but
> >> ProjectInfoReportUtils treats everything as input streams (not helpful with
> >> XML/HTML). I would really like to avoid peeking with a pushback input
> >> stream.
> >>
> >> What is your opinion on this?
> >>
> >> I have two solutions in mind for the issue above:
> >>
> >> 1. Easy: remove ISO-8859-1, assume platform encoding if
> >> project.build.sourceEncoding is not provided.
> >> 2. Complex: use an HTML parser (JSoup is awesome and license-compatible
> >> [1]) to get correctly encoded content.
> >> But how do you know that this URL really points to an HTML file and not a
> >> license.txt? Inspect the content type?
> >>
> >> [1] http://apache.org/legal/resolved.html#category-a
> >>
> >> Michael
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@maven.apache.org
> >> For additional commands, e-mail: dev-h...@maven.apache.org
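For illustration, here is a minimal Java sketch of the detection order discussed in this thread. It first honours an explicit charset in the Content-Type header, then a UTF-8 byte order mark, then a `<meta>` charset declaration when the server says text/html, and finally falls back to a caller-supplied charset instead of project.build.sourceEncoding. It assumes the license body is small enough to buffer fully in memory, which avoids peeking with a pushback input stream. The class `LicenseCharsetSketch`, its `detect` method and the `fallback` argument (standing in for the goal parameter Hervé suggests) are hypothetical and not part of the plugin or of ProjectInfoReportUtils.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not an existing Maven API. */
public final class LicenseCharsetSketch {

    // Matches a charset parameter in a Content-Type header or in an HTML <meta> tag.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)", Pattern.CASE_INSENSITIVE);

    // Only the beginning of an HTML document is scanned for a <meta> declaration.
    private static final int SNIFF_LIMIT = 1024;

    private LicenseCharsetSketch() {
    }

    /**
     * Picks a charset for a downloaded license body.
     *
     * @param contentType the Content-Type header value, may be null
     * @param body        the complete response body, already buffered (non-null)
     * @param fallback    used when nothing is declared (hypothetical goal parameter)
     */
    public static Charset detect(String contentType, byte[] body, Charset fallback) {
        // 1. An explicit charset parameter in the HTTP header wins.
        Charset fromHeader = charsetIn(contentType);
        if (fromHeader != null) {
            return fromHeader;
        }
        // 2. A UTF-8 byte order mark is unambiguous.
        if (body.length >= 3 && (body[0] & 0xFF) == 0xEF
                && (body[1] & 0xFF) == 0xBB && (body[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        // 3. For HTML, look for a <meta> charset declaration near the start.
        //    ISO-8859-1 maps every byte to one char, so it is safe for ASCII sniffing.
        boolean html = contentType != null
                && contentType.toLowerCase(Locale.ROOT).startsWith("text/html");
        if (html) {
            String head = new String(body, 0, Math.min(body.length, SNIFF_LIMIT),
                    StandardCharsets.ISO_8859_1);
            Charset fromMeta = charsetIn(head);
            if (fromMeta != null) {
                return fromMeta;
            }
        }
        // 4. Nothing declared: use the caller-supplied fallback, not the source encoding.
        return fallback;
    }

    // Returns the first charset named in the text, or null if none is found or supported.
    private static Charset charsetIn(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(text);
        if (!m.find()) {
            return null;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return null; // unknown names fall through to the next strategy
        }
    }
}
```

Buffering the whole body keeps detection and decoding separate: once `detect` returns, the same byte array can be decoded with `new String(body, charset)`. If adding a dependency is acceptable, JSoup performs similar BOM and `<meta>` detection when `Jsoup.parse(InputStream, String, String)` is called with a null charset name, although that only helps for HTML and not for a plain license.txt.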