Taking this together, one might argue to have UTF-8 the default, not
ISO-8859-1.

In general, I completely agree with your preference for Unicode and
fail-fast behavior. If I had been involved when the Maven story started, I
would have proposed UTF-8 as the default value, no doubt.

As for today, I tried to stay consistent with existing behavior. The
Maven Site Plugin was already using Latin-1 as the default value for
inputEncoding and outputEncoding, so I proposed the same for other plugins,
too. Indeed, one of the patches (MJAVADOC-165) has just been released, so
two plugins already teach users this default value. Therefore I fear it
might be too late to introduce another default. If the community believes
the change is worth the confusion it will cause for users, I'll be the
first one running the other way round ;-)

It should be checked whether plugins really die for invalid UTF-8
sequences, and what the output looks like.

That's a good point. It appears we need to do some extra homework here: The
simplistic use of InputStreamReader and OutputStreamWriter will silently
replace malformed or unmappable sequences with a default character ('?',
see also [0]). I guess we could nicely hide the required implementation
behind the existing methods in Reader-/WriterFactory from plexus-utils.
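
To make the fail-fast idea concrete, here is a minimal sketch of a reader
that reports invalid input instead of replacing it. It uses only the JDK
(CharsetDecoder with CodingErrorAction.REPORT). The class name StrictReaders
and the helpers strictReader/readAll are made up for illustration; this is
not what plexus-utils does today, just one way it might be done.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of plexus-utils.
public class StrictReaders {

    // Returns a Reader that throws on malformed or unmappable input
    // instead of silently substituting a replacement character.
    static Reader strictReader(InputStream in, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new InputStreamReader(in, decoder);
    }

    // Reads the whole stream through the strict reader.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = strictReader(in, charset)) {
            char[] buf = new char[1024];
            for (int n; (n = reader.read(buf)) != -1; ) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "café" encoded as ISO-8859-1; the trailing 0xE9 is not valid UTF-8.
        byte[] latin1 = { 'c', 'a', 'f', (byte) 0xE9 };
        try {
            readAll(new ByteArrayInputStream(latin1), StandardCharsets.UTF_8);
            System.out.println("accepted");
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```

The same decoder configuration on a CharsetEncoder, passed to
OutputStreamWriter, should give the equivalent behavior on the output side.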

Note that ASCII-only sources will compile cleanly no matter the default
encoding

Most of the time, but UTF-16 and EBCDIC do not even have ASCII in common.
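
A quick way to check this claim is to compare the bytes an ASCII-only
string produces under different charsets. The sketch below is only an
illustration; the class name AsciiCompat and the method encodesLikeAscii
are made up, and the EBCDIC charset (IBM037) is only tested if the running
JRE happens to ship it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class for illustration only.
public class AsciiCompat {

    // True if the charset encodes the given text to the same bytes as US-ASCII.
    static boolean encodesLikeAscii(String text, Charset charset) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(text.getBytes(charset), ascii);
    }

    public static void main(String[] args) {
        String source = "public class Foo {}"; // ASCII-only source snippet
        Charset[] guaranteed = {
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE,
        };
        for (Charset cs : guaranteed) {
            System.out.println(cs.name() + ": " + encodesLikeAscii(source, cs));
        }
        // EBCDIC code pages are optional; assume nothing about availability.
        if (Charset.isSupported("IBM037")) {
            Charset ebcdic = Charset.forName("IBM037");
            System.out.println(ebcdic.name() + ": " + encodesLikeAscii(source, ebcdic));
        }
    }
}
```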


Benjamin


[0] http://java.sun.com/javase/6/docs/api/java/io/OutputStreamWriter.html

