On Friday, 14 November 2014 17:58:44, Michael Osipov wrote:
> On 2014-11-14 at 17:47, Hervé BOUTEMY wrote:
> > Since it is the encoding of a downloaded license, it has nothing to do
> > with the encoding of the project sources: using
> > ${project.build.sourceEncoding} is IMHO the wrong algorithm (which
> > happens to give good results, since a lot of people use UTF-8).
> >
> > I'd then go either for a parameter on the goal, or for JSoup, which does
> > the magic of detecting the effective content encoding.
>
> While this seems sound, what if the resource is plain text and no
> encoding can be deduced?

True: our only bet is a parameter.

> The parameter won't help if there are several licenses with several
> encodings used.

It looks like the parameter can be either simple or complex: we either need
a syntax for it or we just ignore that case. Is this theory or reality?

> > On Friday, 14 November 2014 10:37:22, Michael Osipov wrote:
> >> On 2014-11-14 at 04:02, Kristian Rosenvold wrote:
> >>> Isn't this normally handled by the Content-Type headers?
> >>
> >> No, for two reasons:
> >>
> >> 1. The current code does not inspect the content type.
> >> 2. The server does send text/html but not the encoding used, which is
> >>    not necessary because it is declared within the file itself.
> >>
> >> The only option would be to inspect the Content-Type header and make
> >> further assumptions.
> >>
> >> Michael
> >>
> >>> 2014-11-13 23:15 GMT+01:00 Michael Osipov <micha...@apache.org>:
> >>>> Hi folks,
> >>>>
> >>>> I'd like to know if we have a general consensus on this:
> >>>>
> >>>> I am investigating MPIR-242 and figured out the cause. The input
> >>>> stream is obtained from the HTTP URL and no encoding is given, so
> >>>> ISO-8859-1 is provided as default (yuck!). While I know that some
> >>>> reporting-related modules have default values for input/output
> >>>> encoding, this contradicts our general approach of using the platform
> >>>> encoding when project.build.sourceEncoding is not given.
> >>>>
> >>>> In that special case, the behavior would be consistent if changed.
> >>>> Setting project.build.sourceEncoding to UTF-8 would solve the problem
> >>>> but is just a workaround. HTML resources carry their encoding with
> >>>> them, but ProjectInfoReportUtils treats everything as input streams
> >>>> (not helpful with XML/HTML). I would really like to avoid peeking
> >>>> with a pushback input stream.
> >>>>
> >>>> What is your opinion on this?
> >>>>
> >>>> I have two solutions in mind for the issue above:
> >>>>
> >>>> 1. Easy: remove ISO-8859-1, assume platform encoding if
> >>>>    project.build.sourceEncoding is not provided.
> >>>> 2. Complex: use an HTML parser (JSoup is awesome and
> >>>>    license-compatible [1]) to get correctly encoded content.
> >>>>    But how do you know that this URL really points to an HTML file
> >>>>    and not a license.txt? Inspect the content type?
> >>>>
> >>>> [1] http://apache.org/legal/resolved.html#category-a
> >>>>
> >>>> Michael

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@maven.apache.org
For additional commands, e-mail: dev-h...@maven.apache.org
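
For illustration only, here is a minimal sketch of the fallback chain
discussed in this thread: look for a charset in the Content-Type header,
then fall back to a goal parameter, then to a last-resort default. It is not
code from the plugin. The class name LicenseEncodingSketch, both method
names, and the idea of a single "configured encoding" parameter are
assumptions made up for this example, and the precedence order (header
before parameter) is only one possible choice. Sniffing an encoding declared
inside an HTML file (the JSoup option) is deliberately left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of any Maven plugin. */
public class LicenseEncodingSketch {

    /** Returns the charset named in a Content-Type value, if present and supported. */
    public static Optional<Charset> charsetFromContentType(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        // Parameters follow the media type, separated by ';',
        // e.g. "text/html; charset=UTF-8".
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // The value may be quoted: charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Optional.of(Charset.forName(name));
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: treat as "not given".
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    /**
     * Assumed precedence: header charset, then the user-configured encoding
     * (the hypothetical goal parameter), then the supplied fallback.
     */
    public static Charset resolve(String contentType, String configuredEncoding, Charset fallback) {
        Optional<Charset> fromHeader = charsetFromContentType(contentType);
        if (fromHeader.isPresent()) {
            return fromHeader.get();
        }
        if (configuredEncoding != null && !configuredEncoding.isEmpty()) {
            // Throws IllegalArgumentException on a bad name, so that a
            // misconfigured parameter fails loudly instead of being ignored.
            return Charset.forName(configuredEncoding);
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Header names the charset: it wins.
        System.out.println(resolve("text/html; charset=UTF-8", "ISO-8859-1", Charset.defaultCharset()));
        // Header has no charset (the case reported in this thread): parameter is used.
        System.out.println(resolve("text/html", "UTF-8", Charset.defaultCharset()));
        // Nothing known at all: last-resort fallback.
        System.out.println(resolve(null, null, StandardCharsets.ISO_8859_1));
    }
}
```

A single parameter like this still does not cover several licenses with
different encodings; that case would need a per-license setting or real
content detection.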