Since it is the encoding of a downloaded license, it has nothing to do with the
encoding of project sources: using ${project.build.sourceEncoding} is the wrong
algorithm IMHO (it happens to give good results because a lot of people use
UTF-8).

So I'd go either for a parameter on the goal, or for JSoup, which does the magic
of detecting the effective content encoding.
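A minimal sketch of the parameter option, in plain Java: the downloaded license is decoded with an explicitly configured encoding that is independent of ${project.build.sourceEncoding}. The parameter name licenseEncoding and the UTF-8 fallback are assumptions for illustration, not existing plugin configuration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LicenseDecoding {
    /**
     * Decodes downloaded license bytes with an explicitly configured encoding.
     * licenseEncoding stands for a hypothetical goal parameter; it is deliberately
     * not tied to ${project.build.sourceEncoding}.
     */
    static String decodeLicense(byte[] raw, String licenseEncoding) {
        Charset charset = (licenseEncoding == null || licenseEncoding.trim().isEmpty())
                ? StandardCharsets.UTF_8 // assumed default when the parameter is unset
                : Charset.forName(licenseEncoding.trim());
        return new String(raw, charset);
    }
}
```

The point of the design is that whoever configures the report states the license encoding directly, instead of it being inferred from an unrelated build property.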

Regards,

Hervé

On Friday, 14 November 2014 10:37:22, Michael Osipov wrote:
> On 2014-11-14 at 04:02, Kristian Rosenvold wrote:
> > Isn't this normally handled by the Content-Type headers?
> 
> No, for two reasons:
> 
> 1. The current code does not inspect the content type.
> 2. The server does send text/html, but not the encoding used, which is not
> necessary because it is declared within the file itself.
> 
> The only option would be to inspect the Content-Type header and make
> further assumptions.
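Inspecting the header could look roughly like this sketch (plain JDK; the class name is hypothetical). The header value would come from something like URLConnection.getContentType(). As noted above, it only helps when the server actually sends a charset parameter: for a bare text/html it returns empty, so a fallback is still needed.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;
import java.util.Optional;

final class ContentTypeCharset {
    /**
     * Extracts the charset parameter from a Content-Type value such as
     * "text/html; charset=UTF-8". Returns empty when the header is missing,
     * has no charset parameter, or names a charset this JVM does not support.
     */
    static Optional<Charset> charsetOf(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        // Parameters follow the media type, separated by ';'.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (!name.equals("charset")) {
                continue;
            }
            // The value may be quoted, e.g. charset="utf-8".
            String value = param.substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                return Optional.of(Charset.forName(value));
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                return Optional.empty();
            }
        }
        return Optional.empty();
    }
}
```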
> 
> Michael
> 
> > 2014-11-13 23:15 GMT+01:00 Michael Osipov <micha...@apache.org>:
> >> Hi folks,
> >> 
> >> I'd like to know if we have a general consensus on this:
> >> 
> >> I am investigating MPIR-242 and figured out the cause. The input stream
> >> is obtained from the HTTP URL and no encoding is given, so ISO-8859-1 is
> >> used as the default (yuck!). While I know that some reporting-related
> >> modules have default values for input/output encoding, this contradicts
> >> our general approach of using the platform encoding when
> >> project.build.sourceEncoding is not given.
> >> 
> >> In this particular case, the behavior would become consistent if it were
> >> changed. Setting project.build.sourceEncoding to UTF-8 would solve the
> >> problem, but it is just a workaround. HTML resources carry their encoding
> >> with them, but ProjectInfoReportUtils treats everything as plain input
> >> streams (not helpful with XML/HTML). I would really like to avoid peeking
> >> with a pushback input stream.
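For illustration, reading the encoding that an HTML resource carries with it could be approximated without a pushback stream by buffering the (small) response and scanning only its first 1024 bytes for a UTF-8 BOM or a <meta> charset declaration. This is a simplified, hypothetical stand-in, not how JSoup actually detects encodings; it assumes an ASCII-compatible prefix and ignores UTF-16.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HtmlCharsetSniffer {
    // Matches charset=... inside a <meta> tag, covering both <meta charset="utf-8">
    // and <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the declared charset of an already buffered HTML document, if any. */
    static Optional<Charset> sniff(byte[] html) {
        // A UTF-8 byte order mark wins over any in-document declaration.
        if (html.length >= 3 && (html[0] & 0xFF) == 0xEF
                && (html[1] & 0xFF) == 0xBB && (html[2] & 0xFF) == 0xBF) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        // Decode the prefix as ISO-8859-1 so every byte maps to exactly one char.
        int n = Math.min(html.length, 1024);
        String prefix = new String(html, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(prefix);
        if (m.find()) {
            try {
                return Optional.of(Charset.forName(m.group(1)));
            } catch (IllegalArgumentException e) { // illegal or unsupported name
                return Optional.empty();
            }
        }
        return Optional.empty();
    }
}
```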
> >> 
> >> What is your opinion on this?
> >> 
> >> I have two solutions in mind for the issue above:
> >> 
> >> 1. Easy: remove ISO-8859-1 and assume the platform encoding if
> >> project.build.sourceEncoding is not provided.
> >> 2. Complex: use an HTML parser (JSoup is awesome and license-compatible
> >> [1]) to get correctly decoded content.
> >> But how do you know that this URL really points to an HTML file and not a
> >> license.txt? Inspect the content type?
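A rough sketch of how the two cases could be told apart, assuming the Content-Type media type is trusted when present and the URL extension is used as a last-resort guess (both assumptions), with option 1 as the fallback charset for non-HTML content. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Locale;

final class LicenseFetchStrategy {
    enum Kind { HTML, PLAIN_TEXT }

    /** Classifies a resource by the media type of its Content-Type header (parameters ignored). */
    static Kind classify(String contentType, String url) {
        if (contentType != null) {
            String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            if (mediaType.equals("text/html") || mediaType.equals("application/xhtml+xml")) {
                return Kind.HTML;
            }
            if (!mediaType.isEmpty()) {
                return Kind.PLAIN_TEXT;
            }
        }
        // No usable header: guess from the URL's extension (an assumption, not robust).
        String path = url.toLowerCase(Locale.ROOT);
        return (path.endsWith(".html") || path.endsWith(".htm")) ? Kind.HTML : Kind.PLAIN_TEXT;
    }

    /** Option 1 above: the platform encoding unless project.build.sourceEncoding is set. */
    static Charset fallbackCharset(String sourceEncoding) {
        return (sourceEncoding == null || sourceEncoding.isEmpty())
                ? Charset.defaultCharset()
                : Charset.forName(sourceEncoding);
    }
}
```

HTML would then go to a parser such as JSoup, while plain text would be decoded with the header charset or, failing that, fallbackCharset().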
> >> 
> >> [1] http://apache.org/legal/resolved.html#category-a
> >> 
> >> Michael
> >> 
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@maven.apache.org
> >> For additional commands, e-mail: dev-h...@maven.apache.org
> > 