Benjamin Bentmann wrote:
> In general, I completely agree with your preference for Unicode and fail-fast
> behavior. If I had been involved when the Maven story started, I would have
> proposed UTF-8 as the default value, no doubt.

> As for today, I tried to keep consistency with existing behavior. The
> Maven Site Plugin was already using Latin-1 as the default value for
> inputEncoding and outputEncoding, so I proposed this for other plugins,
> too. Indeed, one of the patches (MJAVADOC-165) has just been released, so
> two plugins already teach users this default value. Therefore I fear it
> might be too late to introduce another default value. If the community
> believes this change is worth the confusion it causes users, I'll be the
> first one to switch sides ;-)

I see your point. Worth another vote? Or should this switch be postponed to 2.1, keeping the defaults consistent across minor version upgrades at the cost of giving the Latin-1 defaults more time to become established?

Given the fail-fast nature of the UTF-8 default, we won't have to worry about the switch going unnoticed: most non-ASCII Latin-1 text is not valid UTF-8, so decoding it fails instead of silently producing garbage. Developers moving from a version that defaults to Latin-1 to one that defaults to UTF-8 will notice the change immediately, and those developing in a heterogeneous environment can simply override the super-POM with their own default.

So while I agree that changing the default, either now or in the future, is ugly, it is not taboo, and I believe it is worth the gain.
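The asymmetry behind the fail-fast argument can be checked with a short Java sketch that uses only java.nio.charset from the JDK. A strict UTF-8 decoder (CodingErrorAction.REPORT) rejects typical non-ASCII Latin-1 bytes. ISO-8859-1, by contrast, maps every byte to a character and therefore never fails. The class and method names are illustrative only. This is not code from any Maven plugin.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class FailFastDemo {

    // Decodes with REPORT instead of the lenient REPLACE behavior, so bad
    // input raises a CharacterCodingException instead of being substituted.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // True if the bytes are valid text in the given charset.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            decodeStrict(bytes, charset);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // German "Gruesse" spelled with u-umlaut and sharp s (both non-ASCII).
        String text = "Gr\u00fc\u00dfe";
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // A Latin-1 file read with a UTF-8 default: rejected, so it gets noticed.
        System.out.println("Latin-1 bytes as UTF-8: " + decodesCleanly(latin1, StandardCharsets.UTF_8));
        // A UTF-8 file read with a Latin-1 default: accepted, but garbled.
        System.out.println("UTF-8 bytes as Latin-1: " + decodesCleanly(utf8, StandardCharsets.ISO_8859_1));
    }
}
```

Running main should print false for the Latin-1 bytes decoded as UTF-8 and true for the UTF-8 bytes decoded as Latin-1. In other words, a wrong UTF-8 default fails loudly, while a wrong Latin-1 default only garbles the text.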

> That's a good point. It appears we need to do some extra homework here: the
> simplistic use of InputStreamReader and OutputStreamWriter will silently
> replace malformed or unmappable input with a substitute character (U+FFFD
> when reading, '?' when writing; see also [0]). I guess we could nicely hide
> the required implementation behind the existing methods of
> Reader-/WriterFactory from plexus-utils.
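For conversions done in our own code, the JDK alone can produce a strict reader or writer. You pass InputStreamReader/OutputStreamWriter a CharsetDecoder/CharsetEncoder configured with CodingErrorAction.REPORT. The sketch below assumes nothing about plexus-utils: whether ReaderFactory/WriterFactory already offer such an option is not assumed here. The helpers newStrictReader/newStrictWriter are hypothetical stand-ins for what such factory methods might look like.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictStreams {

    // Hypothetical factory: a Reader that throws a CharacterCodingException on
    // malformed/unmappable bytes instead of silently substituting U+FFFD.
    static Reader newStrictReader(InputStream in, Charset charset) {
        return new InputStreamReader(in, charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT));
    }

    // Hypothetical factory: a Writer that throws a CharacterCodingException for
    // characters the charset cannot represent instead of writing '?'.
    static Writer newStrictWriter(OutputStream out, Charset charset) {
        return new OutputStreamWriter(out, charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT));
    }

    // Reads the whole stream into a String and closes the reader.
    static String readAll(Reader reader) throws IOException {
        try (Reader r = reader) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            for (int n; (n = r.read(buf)) != -1; ) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Latin-1 bytes for "Gr", u-umlaut, sharp s, "e": not valid UTF-8.
        byte[] latin1 = {'G', 'r', (byte) 0xFC, (byte) 0xDF, 'e'};

        // Lenient (plain InputStreamReader): the bad bytes silently become U+FFFD.
        String lenient = readAll(new InputStreamReader(
                new ByteArrayInputStream(latin1), StandardCharsets.UTF_8));
        System.out.println("lenient read contains U+FFFD: " + (lenient.indexOf('\uFFFD') >= 0));

        // Strict: the same bytes make read() throw.
        try {
            readAll(newStrictReader(new ByteArrayInputStream(latin1), StandardCharsets.UTF_8));
        } catch (CharacterCodingException e) {
            System.out.println("strict read failed: " + e);
        }

        // Lenient write: the euro sign is not in Latin-1 and becomes '?'.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.ISO_8859_1)) {
            w.write("\u20ac");
        }
        System.out.println("lenient write: " + new String(out.toByteArray(), StandardCharsets.ISO_8859_1));

        // Strict write: the same character makes write() throw.
        try (Writer w = newStrictWriter(new ByteArrayOutputStream(), StandardCharsets.ISO_8859_1)) {
            w.write("\u20ac");
        } catch (CharacterCodingException e) {
            System.out.println("strict write failed: " + e);
        }
    }
}
```

The design choice is to keep java.io streams and only swap in a decoder or encoder with a different error action. Code that already wraps streams through a factory would then need to change in one place.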

That works for plugins that do the conversion in code under our control. Plugins that rely on external libraries or tools might be harder to handle, since the decoding happens outside our code.

>> Note that ASCII-only sources will compile cleanly no matter the default
>> encoding.

> Most of the time, but UTF-16 and EBCDIC are not even ASCII-compatible.
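The point about UTF-16 and EBCDIC comes down to ASCII compatibility. ASCII-only text is byte-for-byte identical in UTF-8 and ISO-8859-1, but not in UTF-16 or EBCDIC. A small JDK-only check illustrates this. IBM037 is used here as a representative EBCDIC code page and is assumed, not guaranteed, to be installed, so the sketch only tries it if Charset.isSupported reports it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AsciiCompat {

    // True if the (ASCII-only) text encodes to the same bytes as in US-ASCII,
    // i.e. the charset is ASCII-compatible at least for this text.
    static boolean sameBytesAsAscii(String asciiText, Charset charset) {
        byte[] ascii = asciiText.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(ascii, asciiText.getBytes(charset));
    }

    public static void main(String[] args) {
        String source = "class Hello {}";
        List<Charset> charsets = new ArrayList<>(Arrays.asList(
                StandardCharsets.UTF_8,
                StandardCharsets.ISO_8859_1,
                StandardCharsets.UTF_16));
        // EBCDIC (IBM037) ships with most JDKs but is not guaranteed to exist.
        if (Charset.isSupported("IBM037")) {
            charsets.add(Charset.forName("IBM037"));
        }
        for (Charset cs : charsets) {
            System.out.println(cs.name() + " ASCII-compatible: " + sameBytesAsAscii(source, cs));
        }
    }
}
```

Under these assumptions it should report true for UTF-8 and ISO-8859-1 and false for UTF-16 and, if present, IBM037. Java's UTF-16 encoder writes a byte-order mark and two bytes per character, and EBCDIC assigns different byte values to the same letters.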

I was thinking about the default of the default, i.e. the value to be set in the super-POM. We certainly won't choose UTF-16 or EBCDIC for this global default, and as files encoded in UTF-16 or EBCDIC don't count as ASCII-only, my

 Martin
