On 2014-11-14 at 17:47, Hervé BOUTEMY wrote:
Since it is the encoding of a downloaded license, it has nothing to do with
the encoding of the project sources: using ${project.build.sourceEncoding} is
IMHO the wrong algorithm (it just happens to give good results because a lot
of people use UTF-8).

Then I'd go either for a parameter on the goal, or for JSoup, which does the
magic of detecting the effective content encoding.

While this seems sound, what if the resource is plain text and no encoding can be deduced?

A parameter won't help if there are several licenses that use different encodings.
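To make the plain-text question concrete, here is a minimal JDK-only sketch of the kind of detection an HTML parser performs: check for a byte order mark, then look for a <meta> charset declaration in the first bytes, and otherwise fall back. The class CharsetSniffer, the method detectCharset and the 1024-byte window are assumptions for illustration; this is not JSoup's code. Buffering the whole (small) license in a byte array also sidesteps the pushback-stream peeking mentioned in the quoted message.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of maven-project-info-reports-plugin or JSoup.
public class CharsetSniffer {

    // Matches <meta charset="..."> as well as the charset parameter of
    // <meta http-equiv="Content-Type" content="text/html; charset=...">
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    static Charset detectCharset(byte[] content, Charset fallback) {
        // 1. A byte order mark wins (the UTF-16 decoder reads the BOM for byte order)
        if (startsWith(content, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(content, 0xFE, 0xFF) || startsWith(content, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16;
        }

        // 2. Decode the first 1024 bytes as ISO-8859-1 (one char per byte,
        //    never fails) only to search for an ASCII <meta> declaration
        int len = Math.min(content.length, 1024);
        String head = new String(content, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        if (m.find()) {
            try {
                return Charset.forName(m.group(1));
            } catch (IllegalArgumentException e) {
                // illegal or unsupported charset name: use the fallback
            }
        }

        // 3. Plain text without a BOM: nothing can be deduced
        return fallback;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] html = "<meta charset=\"utf-8\">".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detectCharset(html, StandardCharsets.ISO_8859_1)); // prints UTF-8
    }
}
```

For plain text without a BOM, everything hinges on the fallback, which could be the goal parameter Hervé suggests; as noted, a single parameter still cannot cover several licenses in different encodings.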

On Friday, 14 November 2014 10:37:22, Michael Osipov wrote:
On 2014-11-14 at 04:02, Kristian Rosenvold wrote:
Isn't this normally handled by the Content-Type header?

No, for two reasons:

1. The current code does not inspect the content type.
2. The server does send text/html, but not the encoding used, which is not
necessary because the encoding is declared within the file itself.

The only option would be to inspect the Content-Type header and make
further assumptions.
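If the content type were inspected, the raw header is available from the JDK's URLConnection.getContentType() (it returns null when the header is absent). The parsing helper below is a hypothetical minimal sketch, not existing plugin code; since, as described above, the server sends text/html without a charset, the fallback is what matters in practice.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration only.
public class ContentTypeCharset {

    /**
     * Returns the charset named in a Content-Type value such as
     * "text/html; charset=UTF-8", or fallback if the value is null, has no
     * charset parameter, or names an illegal or unsupported charset.
     */
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by ';'
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // The value may be quoted, e.g. charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    if (Charset.isSupported(name)) {
                        return Charset.forName(name);
                    }
                } catch (IllegalArgumentException e) {
                    // illegal charset name: fall through to the fallback
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // e.g. the value of connection.getContentType() for a license URL
        System.out.println(charsetFromContentType("text/html; charset=UTF-8",
                StandardCharsets.ISO_8859_1)); // prints UTF-8
        System.out.println(charsetFromContentType("text/html",
                StandardCharsets.UTF_8)); // prints UTF-8 (fallback)
    }
}
```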

Michael

On 2014-11-13 23:15 GMT+01:00, Michael Osipov wrote:
Hi folks,

I'd like to know if we have a general consensus on this:

I am investigating MPIR-242 and have figured out the cause. The input
stream is obtained from the HTTP URL and no encoding is given, so
ISO-8859-1 is used as the default (yuck!). While I know that some
reporting-related modules have default values for input/output encoding,
this contradicts our general approach of using the platform encoding when
project.build.sourceEncoding is not given.

In this particular case, changing it would make the behavior consistent.
Setting project.build.sourceEncoding to UTF-8 would solve the problem, but
it is just a workaround. HTML resources carry their encoding with them, but
ProjectInfoReportUtils treats everything as a plain input stream (not
helpful with XML/HTML). I would really like to avoid peeking at the content
with a pushback input stream.

What is your opinion on this?

I have two solutions in mind for the issue above:

1. Easy: remove ISO-8859-1 and assume the platform encoding if
project.build.sourceEncoding is not provided.
2. Complex: use an HTML parser (JSoup is awesome and license-compatible
[1]) to get correctly decoded content.
But how do you know that this URL really points to an HTML file and not to
a license.txt? Inspect the content type?

[1] http://apache.org/legal/resolved.html#category-a
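Option 1 could look roughly like the following minimal sketch. LicenseReader and readLicense are hypothetical names, not the real ProjectInfoReportUtils API; the stream is assumed to be the one opened from the license URL. The main method also shows the MPIR-242 symptom: decoding the two UTF-8 bytes of an e-acute as ISO-8859-1 yields two characters instead of one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating option 1 only.
public class LicenseReader {

    /**
     * Reads the whole stream as text, using sourceEncoding when it is set and
     * the platform default charset otherwise, instead of a hard-coded
     * ISO-8859-1.
     */
    static String readLicense(InputStream in, String sourceEncoding) {
        Charset charset = (sourceEncoding == null || sourceEncoding.isEmpty())
                ? Charset.defaultCharset()
                : Charset.forName(sourceEncoding);
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8); // e-acute, 2 bytes in UTF-8
        // Old behavior: ISO-8859-1 maps each byte to its own char (mojibake)
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1).length()); // prints 2
        // Explicit UTF-8 decodes the two bytes back to a single char
        System.out.println(readLicense(new ByteArrayInputStream(utf8), "UTF-8").length()); // prints 1
    }
}
```

Note that this only moves the guess from ISO-8859-1 to the platform encoding; it does not make a downloaded license decode correctly when its real encoding differs.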

Michael

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
