All sounds fine. Just wanted you to keep the bigger picture in mind.

Please do the work on a branch and go through the rigor of Brian's example and make sure it works before you merge it into something we would release to users. This is not an insignificant change.

On 9-Apr-08, at 10:36 AM, Benjamin Bentmann wrote:
Make sure you consider the case where you have people developing the same code base all over the world, and the possible reasoning of falling back to platform default encoding. Consider the team spread across the US, Russia, and China and what do they do normally?

This international spread of developers is in particular the case we have in mind. I mean, how should such a team (say the Maven community) deliver reliable build output if not all developers have agreed to use the same file encoding for the sources? Say the US devs would have ASCII as default encoding, the Europeans Latin-1 and the Asians Big5 for our nice potpourri. Even if all have agreed to use English for coding, you still might encounter non-ASCII characters that get messed up, e.g. in javadoc comments that carry the name of the contributor/committer. Other developers might experience build failures because of the encoding mismatch; at best, other people's names are disfigured, which is rather impolite.
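To make the mismatch concrete, here is a minimal sketch of what happens when a source file saved as UTF-8 is read by a build that assumes Latin-1. The class name, method name and the sample contributor name are made up for illustration; only the two charsets come from the discussion above.

```java
import java.nio.charset.Charset;

public class EncodingMismatch {

    // Both charsets are required to be available on every JVM.
    private static final Charset UTF_8 = Charset.forName("UTF-8");
    private static final Charset LATIN_1 = Charset.forName("ISO-8859-1");

    /**
     * Hypothetical helper: encodes the text the way a UTF-8 editor would
     * save it, then decodes the bytes the way a Latin-1 build would read it.
     */
    static String writtenAsUtf8ReadAsLatin1(String text) {
        byte[] bytesOnDisk = text.getBytes(UTF_8);
        return new String(bytesOnDisk, LATIN_1);
    }

    public static void main(String[] args) {
        // "J\u00f6rg" is "Jörg"; the escape keeps this file itself pure ASCII.
        String contributor = "J\u00f6rg";
        String misread = writtenAsUtf8ReadAsLatin1(contributor);

        // The two-byte UTF-8 sequence for 'ö' turns into two Latin-1 characters.
        System.out.println(contributor.length()); // 4
        System.out.println(misread.length());     // 5

        // Pure ASCII text survives, because both encodings agree on those bytes.
        System.out.println(writtenAsUtf8ReadAsLatin1("Joerg")); // Joerg
    }
}
```

The ASCII-only name comes through unchanged, which is why such a team may not notice the problem until the first non-ASCII character shows up.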

The Eclipse folks had a similar problem [0]. The solution: Lock the encoding down for the entire project.

Is it possible to specify an encoding in one place that doesn't work somewhere else?

Yes, in theory you can have one user specify an encoding that another user's JVM does not support. As the class javadoc for Charset [1] states, only a few encodings - including Latin-1 and UTF-8 - are required to be supported, although the reference implementation from Sun supports many more encodings [2]. However, I don't consider this a practical concern. Given that support for UTF-8 is mandatory, there exists an encoding that can handle nearly any character people would like to enter and Java can handle. Hence there exists a solution that works for everyone on the team.
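For anyone who wants to check this on their own JVM, a small sketch using the Charset API [1] follows. The class and method names are hypothetical, and Big5 is only probed, not assumed to be present, since it is not among the mandatory charsets.

```java
import java.nio.charset.Charset;

public class CharsetCheck {

    /** Hypothetical helper: true if this JVM can resolve the charset name. */
    static boolean isUsable(String charsetName) {
        try {
            return Charset.isSupported(charsetName);
        } catch (IllegalArgumentException e) {
            // Thrown for a null or syntactically illegal charset name.
            return false;
        }
    }

    /** Hypothetical helper: true if the charset can represent every character of the text. */
    static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Mandatory on every Java platform implementation.
        System.out.println(isUsable("UTF-8"));      // true
        System.out.println(isUsable("ISO-8859-1")); // true

        // Optional: the result depends on the JVM and its installed charsets.
        System.out.println(isUsable("Big5"));

        // "J\u00f6rg \u4e2d\u6587" mixes a Latin-1 letter with two CJK characters.
        String mixed = "J\u00f6rg \u4e2d\u6587";
        System.out.println(canEncode("UTF-8", mixed));      // true
        System.out.println(canEncode("ISO-8859-1", mixed)); // false
    }
}
```

A build could run a check like isUsable on the configured encoding up front and fail with a clear message instead of producing garbled output later; that is a design suggestion, not something the proposal prescribes.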

I am fortunate in that I've never seen an encoding problem in Maven personally. In your proposal you talk about aligning the encoding value, but my question is: in what cases have you found the default encoding not working? You don't talk about that at all in the proposal.

Well, choose your favorite from a search for "encoding" on all Maven 2 projects in JIRA ;-)
- http://jira.codehaus.org/browse/MNG-2932
- http://jira.codehaus.org/browse/MANTTASKS-14
- http://jira.codehaus.org/browse/MTAGLIST-27
- http://jira.codehaus.org/browse/MRELEASE-302
- http://jira.codehaus.org/browse/DOXIA-103
- http://jira.codehaus.org/browse/MCHANGES-71
- (about 300 more hits)

ASCII is quite safe, but anything which requires more than those 7 bits just needs special care.

Do you know what happens with all the tools that people use? Like checking into all SCMs, and what happens when people check out onto their system, editors, IDEs. I'm merely suggesting that there might be a reason most things fall back to the default encoding on the system: it's generally been a hard thing to corral.

In principle you're right, most of the tools are intended for usage with the platform's encoding. This seems to include the popular diff/patch tools used by many SCMs, which do not really support different encodings [3] (yet another historic design flaw, next to the two-digit year format).

Also, the SCMs themselves do not seem to care about (file content) encoding yet; I have found proposals for Subversion [5] and Bazaar [4], but that's it. However, as far as I can tell, SCMs that do not know about file encoding also do not perform any conversion of the file content; they simply assume a simple byte-to-char mapping like ASCII when doing EOL normalization or keyword substitution.

As for editors and IDEs: even that tiny thing from Windows, Notepad, supports UTF-8 nowadays, and I wouldn't call that an editor. Does anybody know of a popular editor/IDE that calls itself mature but does not allow configuring the file encoding?


Benjamin


[0] https://bugs.eclipse.org/bugs/show_bug.cgi?id=132898
[1] http://java.sun.com/javase/6/docs/api/java/nio/charset/Charset.html
[2] http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html
[3] http://www.gnu.org/software/diffutils/manual/html_mono/diff.html#Internationalization
[4] http://bazaar-vcs.org/UnicodeSupport?action=show&redirect=EncodingSupport#head-43c0111da063796da433179faaf8d065bda5c42e
[5] http://svn.haxx.se/dev/archive-2006-03/1182.shtml

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Thanks,

Jason

----------------------------------------------------------
Jason van Zyl
Founder,  Apache Maven
jason at sonatype dot com
----------------------------------------------------------

the course of true love never did run smooth ...

-- Shakespeare


