All sounds fine. Just wanted you to keep the bigger picture in
mind.
Please do the work on a branch and go through the rigor of Brian's
example and make sure it works before you merge it into something we
would release to users. This is not an insignificant change.
On 9-Apr-08, at 10:36 AM, Benjamin Bentmann wrote:
Make sure you consider the case where you have people developing
the same code base all over the world, and the possible reasons
for falling back to the platform default encoding. Consider a team
spread across the US, Russia, and China: what do they normally
do?
This international spread of developers is in particular the case we
have in mind. I mean, how should such a team (say the Maven
community) deliver reliable build output if not all developers have
agreed to use the same file encoding for the sources? Say the US
devs would have ASCII as default encoding, the Europeans Latin-1 and
the Asians Big5 for our nice potpourri. Even if all have agreed to
use English for coding, you still might encounter non-ASCII
characters that get messed up, e.g. in javadoc comments that carry
the name of the contributor/committer. Other developers might
experience build failures because of the encoding mismatch; at best,
other people's names are disfigured, which is rather impolite.
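To make the failure mode concrete, here is a minimal Java sketch of
what happens when one developer writes a file in one encoding and
another reads it in a different one. The class name, the method name
and the committer name are made up for illustration; the non-ASCII
letters are written as \u escapes so the example itself does not
depend on the encoding it is compiled with.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Encode the text with the writer's charset, then decode the same
    // bytes with the reader's charset, as happens when two machines
    // disagree about the file encoding.
    static String writeThenRead(String text, Charset writer, Charset reader) {
        byte[] bytes = text.getBytes(writer);
        return new String(bytes, reader);
    }

    public static void main(String[] args) {
        // Hypothetical committer name containing o-umlaut and u-umlaut.
        String author = "J\u00f6rg M\u00fcller";

        // Written as UTF-8, read as Latin-1: each umlaut turns into two
        // unrelated Latin-1 characters.
        String garbled = writeThenRead(author,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(author.length() + " chars -> " + garbled.length() + " chars");

        // Plain ASCII text survives the same mismatch unchanged.
        String ascii = writeThenRead("Joe Miller",
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("ASCII unchanged: " + ascii.equals("Joe Miller"));
    }
}
```

The same bytes are on disk in both cases; only the reader's assumption
about the encoding differs.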
The Eclipse folks had a similar problem [0]. The solution: Lock the
encoding down for the entire project.
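As a rough sketch of what "locking the encoding down" means at the
code level: every read and write names one project-wide charset
instead of relying on the platform default. The constant
PROJECT_ENCODING and the class and method names are hypothetical
stand-ins, not anything from the proposal or from Maven itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FixedEncodingIo {

    // Hypothetical single place where the project declares its encoding.
    static final Charset PROJECT_ENCODING = StandardCharsets.UTF_8;

    // Write text using the project encoding, never the platform default.
    static void write(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(PROJECT_ENCODING));
    }

    // Read text using the project encoding, never the platform default.
    static String read(Path file) throws IOException {
        return new String(Files.readAllBytes(file), PROJECT_ENCODING);
    }

    // Write the text to a temporary file and read it back again.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("encoding-demo", ".txt");
            try {
                write(tmp, text);
                return read(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "J\u00f6rg"; // non-ASCII letter written as an escape
        System.out.println("platform default: " + Charset.defaultCharset());
        System.out.println("round trip intact: " + roundTrip(text).equals(text));
    }
}
```

With this shape the result no longer depends on what
Charset.defaultCharset() happens to return on a given machine.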
Is it possible to specify an encoding in one place that doesn't
work somewhere else?
Yes, in theory you can have one user specify an encoding that
another user's JVM does not support. As the class javadoc for
Charset [1] states, only a few encodings - including Latin-1 and
UTF-8 - are required to be supported, although the reference
implementation from Sun supports quite a few more encodings [2].
However, I don't consider this a practical concern. Given that
support for UTF-8 is mandatory, there exists an encoding that can
handle nearly any character people would like to enter and Java can
handle. Hence there exists a solution that works for everyone on the
team.
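For anyone who wants to check this on their own JVM, a small sketch
using Charset.isSupported follows. The class and method names are
made up for the example; the six names in REQUIRED are the ones the
Charset javadoc lists as standard charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.ArrayList;
import java.util.List;

class CharsetSupportCheck {

    // Charsets that the Charset javadoc requires every Java platform
    // to support.
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // Return the subset of the given charset names that this JVM
    // cannot handle.
    static List<String> unsupported(String... names) {
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            boolean supported;
            try {
                supported = Charset.isSupported(name);
            } catch (IllegalCharsetNameException e) {
                supported = false; // not even a syntactically legal name
            }
            if (!supported) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("required but missing: " + unsupported(REQUIRED));
        // Optional encodings; the result depends on the JVM in use.
        System.out.println("optional but missing: "
                + unsupported("Big5", "windows-1251", "KOI8-R"));
    }
}
```

A build could run such a check against the encoding the project
declares and fail early with a clear message instead of producing
garbled output.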
I am fortunate in that I've never seen an encoding problem in Maven
personally. In your proposal you talk about aligning the encoding
value, but my question is: in what cases have you found the default
encoding not working? You don't talk about that at all in the
proposal.
Well, choose your favorite from a search for "encoding" on all Maven
2 projects in JIRA ;-)
- http://jira.codehaus.org/browse/MNG-2932
- http://jira.codehaus.org/browse/MANTTASKS-14
- http://jira.codehaus.org/browse/MTAGLIST-27
- http://jira.codehaus.org/browse/MRELEASE-302
- http://jira.codehaus.org/browse/DOXIA-103
- http://jira.codehaus.org/browse/MCHANGES-71
- (about 300 more hits)
ASCII is quite safe, but anything which requires more than those 7
bits just needs special care.
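The 7-bit point can be illustrated with a short Java sketch (class
and method names are hypothetical): text restricted to ASCII yields
the same bytes in Latin-1 and UTF-8, while a single accented letter
is enough to make the two encodings diverge.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SevenBitCheck {

    // True if every character fits into 7 bits, i.e. is plain ASCII.
    static boolean isPureAscii(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    // True if the text is stored as identical bytes in Latin-1 and UTF-8.
    static boolean sameBytesInLatin1AndUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(latin1, utf8);
    }

    public static void main(String[] args) {
        String plain = "@author Joe Miller";
        String accented = "@author Jos\u00e9 Mu\u00f1oz"; // made-up name

        System.out.println(isPureAscii(plain) + " / " + sameBytesInLatin1AndUtf8(plain));
        System.out.println(isPureAscii(accented) + " / " + sameBytesInLatin1AndUtf8(accented));
    }
}
```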
Do you know what happens with all the tools that people use? Like
checking into all SCMs, and what happens when people check out onto
their systems, editors, IDEs. I'm merely suggesting that there
might be a reason most things fall back to the default encoding on
the system: it's generally been a hard thing to corral.
In principle you're right, most of the tools are intended for usage
with the platform's encoding. This seems to include the popular
diff/patch tools used by many SCMs, which do not really support
different encodings [3] (yet another historic design flaw, next to
the two-digit year format).
Also, the SCMs themselves do not seem to care about (file content)
encoding yet; I have found proposals for Subversion [5] and Bazaar
[4] but that's it. However, as far as I can tell, SCMs that do not
know about file encoding also do not perform any conversions on the
file content but simply assume a simple byte-to-char mapping like
ASCII when doing EOL normalization or keyword substitution.
As for editors and IDEs: even this tiny thing "Notepad" from Windows
supports UTF-8 nowadays, and I wouldn't call that an editor. Does
anybody know of a popular editor/IDE that calls itself mature but
does not allow configuring the file encoding?
Benjamin
[0] https://bugs.eclipse.org/bugs/show_bug.cgi?id=132898
[1] http://java.sun.com/javase/6/docs/api/java/nio/charset/Charset.html
[2] http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html
[3] http://www.gnu.org/software/diffutils/manual/html_mono/diff.html#Internationalization
[4] http://bazaar-vcs.org/UnicodeSupport?action=show&redirect=EncodingSupport#head-43c0111da063796da433179faaf8d065bda5c42e
[5] http://svn.haxx.se/dev/archive-2006-03/1182.shtml
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Thanks,
Jason
----------------------------------------------------------
Jason van Zyl
Founder, Apache Maven
jason at sonatype dot com
----------------------------------------------------------
the course of true love never did run smooth ...
-- Shakespeare