Since it is the encoding of a downloaded license, it has nothing to do with the
encoding of project sources: using ${project.build.sourceEncoding} is IMHO the
wrong algorithm (one that happens to give good results since a lot of people use
UTF-8).
Then I'd go for either a parameter on the goal, or JSoup, which does the magic
of detecting the effective content encoding.
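A minimal Java sketch of the resolution order suggested here, assuming a hypothetical goal parameter (passed in as `configured`) and the result of some content-based detection (passed in as `detected`). `LicenseEncodingResolver` and its methods are illustrative names, not part of the plugin. project.build.sourceEncoding is deliberately not an input.

```java
import java.nio.charset.Charset;

/**
 * Hypothetical helper: chooses the charset used to decode a downloaded
 * license. project.build.sourceEncoding is deliberately NOT an input, since
 * the license encoding is unrelated to the encoding of the project sources.
 */
class LicenseEncodingResolver {

    /**
     * @param configured value of a hypothetical goal parameter, may be null
     * @param detected   charset name detected from the content, may be null
     * @param fallback   charset to use when neither is usable
     */
    static Charset resolve(String configured, String detected, Charset fallback) {
        if (configured != null && !configured.trim().isEmpty()) {
            // An explicit user choice wins; an invalid name fails loudly here.
            return Charset.forName(configured.trim());
        }
        Charset fromContent = lookup(detected);
        return fromContent != null ? fromContent : fallback;
    }

    /** Returns null instead of throwing for missing, illegal or unsupported names. */
    private static Charset lookup(String name) {
        if (name == null || name.trim().isEmpty()) {
            return null;
        }
        try {
            return Charset.forName(name.trim());
        } catch (IllegalArgumentException e) {
            // covers IllegalCharsetNameException and UnsupportedCharsetException
            return null;
        }
    }
}
```

Which `fallback` to pass is left open in the sketch; the platform encoding (option 1 below) is one candidate.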
Regards,
Hervé
On Friday, 14 November 2014 10:37:22, Michael Osipov wrote:
> On 2014-11-14 at 04:02, Kristian Rosenvold wrote:
> > Isn't this normally handled by the Content-Type headers?
>
> No, for two reasons:
>
> 1. The current code does not inspect the content type.
> 2. The server does send text/html, but not the encoding used, which is not
> necessary because it is declared within the file itself.
>
> The only option would be to inspect the Content-Type header and make
> further assumptions.
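A minimal sketch of what inspecting the content type could look like, using only the JDK; `ContentTypeCharset` is a hypothetical helper, and the naive split on ';' ignores quoted semicolons. It returns null for a bare `text/html`, which is exactly the case described above, so a caller still needs another source for the encoding.

```java
import java.util.Locale;

/** Hypothetical helper: reads the charset parameter of a Content-Type value. */
class ContentTypeCharset {

    /**
     * Returns e.g. "UTF-8" for "text/html; charset=UTF-8", or null when the
     * header is missing or has no charset parameter (a bare "text/html").
     */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Parameters follow the media type, separated by ';'.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // The value may be a quoted string: charset="utf-8"
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            return value.isEmpty() ? null : value;
        }
        return null;
    }
}
```

With `java.net.URLConnection`, the header value comes from `getContentType()`. Note that `getContentEncoding()` returns the Content-Encoding header (e.g. gzip), not the charset.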
>
> Michael
>
> > On 2014-11-13 23:15 GMT+01:00, Michael Osipov wrote:
> >> Hi folks,
> >>
> >> I'd like to know if we have a general consensus on this:
> >>
> >> I am investigating MPIR-242 and have figured out the cause. The input
> >> stream is obtained from the HTTP URL and no encoding is given, so
> >> ISO-8859-1 is used as the default (yuck!). While I know that some
> >> reporting-related modules have default values for input/output encoding,
> >> this contradicts our general approach of using the platform encoding when
> >> project.build.sourceEncoding is not given.
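A small illustration of why the ISO-8859-1 default is a problem: decoding UTF-8 bytes as ISO-8859-1 never fails, because every byte maps to some character, so the mistake only shows up as garbled text. The class name is made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Latin1Mojibake {

    /** Encodes text as UTF-8 (as a server would send it), then decodes it as readAs. */
    static String roundTrip(String text, Charset readAs) {
        byte[] served = text.getBytes(StandardCharsets.UTF_8);
        // new String(byte[], Charset) replaces malformed input instead of throwing,
        // and ISO-8859-1 has no malformed input at all: every byte is a character.
        return new String(served, readAs);
    }
}
```

For example, `Hervé` read back as ISO-8859-1 becomes `HervÃ©`: the two UTF-8 bytes of `é` (0xC3 0xA9) turn into two Latin-1 characters.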
> >>
> >> In that special case, the behavior would become consistent if changed.
> >> Setting project.build.sourceEncoding to UTF-8 would solve the problem but
> >> is just a workaround. HTML resources carry their encoding with them, but
> >> ProjectInfoReportUtils treats everything as input streams (not helpful
> >> with XML/HTML). I would really like to avoid peeking with a pushback
> >> input stream.
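One way to honor the encoding an HTML resource carries without a pushback stream is to buffer the (small) response into a byte array and scan its beginning for a meta charset declaration. This is a regex sketch with hypothetical names, not a real HTML parser such as JSoup; it assumes an ASCII-compatible encoding, so UTF-16 pages and byte-order marks are not handled, and the 1024-byte window is an arbitrary choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: decodes HTML using the charset its meta tag declares. */
class HtmlMetaCharset {

    // Matches <meta charset="x"> as well as charset=x inside the content
    // attribute of <meta http-equiv="Content-Type" ...>.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9._:+-]+)",
            Pattern.CASE_INSENSITIVE);

    /** Scans the first limit bytes; returns the declared name or null. */
    static String sniff(byte[] html, int limit) {
        int n = Math.min(html.length, limit);
        // ISO-8859-1 maps each byte to one char, giving an ASCII-compatible view
        // of the markup whatever the real (ASCII-compatible) encoding is.
        String head = new String(html, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    /** Decodes already-buffered bytes with the declared charset, else the fallback. */
    static String decode(byte[] html, Charset fallback) {
        Charset cs = fallback;
        String declared = sniff(html, 1024);
        if (declared != null) {
            try {
                cs = Charset.forName(declared);
            } catch (IllegalArgumentException e) {
                // unknown or illegal name: keep the fallback
            }
        }
        return new String(html, cs);
    }

    /** Buffers the whole stream (Java 9+), so no pushback/peeking is needed. */
    static String read(InputStream in, Charset fallback) throws IOException {
        return decode(in.readAllBytes(), fallback);
    }
}
```

Buffering the whole body is only reasonable because license pages are small; for large resources a bounded prefix read would be needed.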
> >>
> >> What is your opinion on this?
> >>
> >> I have two solutions in mind for the issue above:
> >>
> >> 1. Easy: remove ISO-8859-1 and assume the platform encoding if
> >> project.build.sourceEncoding is not provided.
> >> 2. Complex: use an HTML parser (JSoup is awesome and license-compatible
> >> [1]) to get correctly decoded content.
> >> But how do you know that this URL really points to an HTML file and not
> >> a license.txt? Inspect the content type?
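The two options and the closing question could be combined roughly as follows: classify the resource by its Content-Type media type, fall back to the URL suffix when the header is missing or generic (an assumption not made in the thread), and apply option 1 to plain text. All names are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Locale;

/** Hypothetical helper: decides how a downloaded license should be decoded. */
class LicenseContentKind {

    enum Kind { HTML, PLAIN_TEXT }

    /** Lower-cased media type without parameters; null if the header is missing. */
    static String mediaType(String contentType) {
        if (contentType == null) {
            return null;
        }
        int semi = contentType.indexOf(';');
        String type = semi < 0 ? contentType : contentType.substring(0, semi);
        type = type.trim().toLowerCase(Locale.ROOT);
        return type.isEmpty() ? null : type;
    }

    /** Trusts the Content-Type first; otherwise guesses from the URL suffix. */
    static Kind classify(String contentType, String url) {
        String type = mediaType(contentType);
        if ("text/html".equals(type) || "application/xhtml+xml".equals(type)) {
            return Kind.HTML;
        }
        if (type != null && type.startsWith("text/")) {
            return Kind.PLAIN_TEXT;
        }
        // Missing or generic header (e.g. application/octet-stream): use the suffix.
        String lower = url.toLowerCase(Locale.ROOT);
        return (lower.endsWith(".html") || lower.endsWith(".htm")) ? Kind.HTML : Kind.PLAIN_TEXT;
    }

    /** Option 1: sourceEncoding if given, otherwise the platform encoding. */
    static Charset plainTextFallback(String sourceEncoding) {
        return sourceEncoding != null ? Charset.forName(sourceEncoding) : Charset.defaultCharset();
    }
}
```

HTML would then go through meta-charset detection or an HTML parser, while a license.txt served without a charset would use the option 1 fallback.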
> >>
> >> [1] http://apache.org/legal/resolved.html#category-a
> >>
> >> Michael
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]