Benjamin Bentmann wrote:
In general, I completely agree with your preference for Unicode and fail-fast behavior. If I had been involved when the Maven story started, I would have proposed UTF-8 as the default value, no doubt.
As for today, I tried to consider consistency with existing behavior. The Maven Site Plugin was already using Latin-1 as the default value for inputEncoding and outputEncoding, and so I proposed this for other plugins, too. Indeed, one of the patches (MJAVADOC-165) was just released, so two plugins already teach users this default value. Therefore I fear it might be too late to introduce another default value. If the community believes this change is worth the confusion caused to users, I'm the first one running the other way round ;-)
I see your point. Worth another vote? Or should this switch be postponed to 2.1, trading consistency in minor version upgrades for a longer time for these Latin1 defaults to be established?
Given the fail-fast nature of the UTF-8 default, we won't have to worry about the switch going unnoticed. Developers switching from a version defaulting to Latin1 to one defaulting to UTF-8 will notice the change immediately, and for development in a heterogeneous environment they can simply override the super-POM with their own default.
So while I agree that a change in default either now or in the future is ugly, it is not taboo, and I believe it is worth the gain.
That's a good point. It appears we need to do some extra homework here: The simplistic use of InputStreamReader and OutputStreamWriter will silently convert unmappable byte sequences to a default character ('?', see also [0]). I guess we could nicely hide the required implementation by means of the existing methods in Reader-/WriterFactory from plexus-utils.
That works for plugins doing the conversion in code under our control. Other plugins that use external libraries or tools might be more difficult.
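For the code under our control, a minimal sketch of such fail-fast decoding might look like the following. It assumes a helper named newStrictReader, which is hypothetical and not an existing plexus-utils method; it only relies on the standard java.nio.charset API. Note that when decoding, the JDK substitutes U+FFFD by default, while '?' is the usual substitute when encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictReaderSketch {

    // Hypothetical helper (not part of plexus-utils): a Reader that reports
    // malformed or unmappable input instead of silently replacing it.
    static Reader newStrictReader(InputStream in, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new InputStreamReader(in, decoder);
    }

    // Drains a Reader into a String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Latin-1 bytes: 0xE9 followed by '!' is not a valid UTF-8 sequence.
        byte[] latin1 = "caf\u00e9!".getBytes(StandardCharsets.ISO_8859_1);

        // Default behavior: the bad byte is silently replaced.
        String lenient = readAll(new InputStreamReader(
                new ByteArrayInputStream(latin1), StandardCharsets.UTF_8));
        System.out.println("lenient contains U+FFFD: " + (lenient.indexOf('\uFFFD') >= 0));

        // Strict behavior: the same input fails fast.
        try {
            readAll(newStrictReader(new ByteArrayInputStream(latin1), StandardCharsets.UTF_8));
            System.out.println("strict: decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("strict: rejected input (" + e + ")");
        }
    }
}
```

The same idea applies on the writing side with a CharsetEncoder passed to OutputStreamWriter. It does not help plugins that delegate to external tools.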
Note that ASCII-only sources will compile cleanly no matter the default encoding.
Most of the time, but UTF-16 or EBCDIC do not even have ASCII in common.
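To illustrate the ASCII-compatibility point, here is a small sketch that compares the bytes of an ASCII-only snippet under a few charsets. The class and method names are made up for this example, and EBCDIC is left out because those charsets are not guaranteed to be present in every JRE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiCompatSketch {

    // True if the charset encodes the text to exactly the bytes US-ASCII would produce.
    static boolean sameBytesAsAscii(String text, Charset charset) {
        return Arrays.equals(
                text.getBytes(StandardCharsets.US_ASCII),
                text.getBytes(charset));
    }

    public static void main(String[] args) {
        String source = "class A { int x = 1; }"; // ASCII-only content

        Charset[] candidates = {
            StandardCharsets.UTF_8,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_16BE
        };
        for (Charset cs : candidates) {
            System.out.println(cs.name() + " ASCII-compatible here: "
                    + sameBytesAsAscii(source, cs));
        }
    }
}
```

UTF-8 and Latin-1 yield the same bytes as ASCII for this input, while UTF-16 does not, since it uses two bytes per character.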
I was thinking about the default of the default, i.e. the value to be set in the super-POM. We certainly won't choose UTF-16 or EBCDIC for this global default, and as files encoded in UTF-16 or EBCDIC don't count as ASCII-only, my
Martin