Re: [VOTE] POM Element for Source File Encoding

Hervé BOUTEMY Wed, 09 Apr 2008 14:21:16 -0700

On Wednesday 09 April 2008, Jason van Zyl wrote:
> All sounds fine. Just wanted you to keep the bigger picture in
> mind.
>
> Please do the work on a branch and go through the rigor of Brian's
> example and make sure it works before you merge it into something we
> would release to users. This is not an insignificant change.
I created http://svn.apache.org/viewvc/maven/sandbox/branches/MNG-2216/
with branches of the javadoc and jxr plugins to test the change, plus a
sample use case.

Isn't that sufficient?

Hervé

>
> On 9-Apr-08, at 10:36 AM, Benjamin Bentmann wrote:
> >> Make sure you consider the case where you have people developing
> >> the same code base all over the world, and the possible reasoning
> >> of falling back to platform default encoding. Consider the team
> >> spread across the US, Russia, and China and what do they do
> >> normally?
> >
> > This international spread of developers is in particular the case we
> > have in mind. I mean, how should such a team (say the Maven
> > community) deliver reliable build output if not all developers have
> > agreed to use the same file encoding for the sources? Say the US
> > devs would have ASCII as default encoding, the Europeans Latin-1 and
> > the Asians Big5 for our nice potpourri. Even if all have agreed to
> > use English for coding, you still might encounter non-ASCII
> > characters that get messed up, e.g. in javadoc comments that carry
> > the name of the contributor/committer. Other developers might
> > experience build failures because of the encoding mismatch; at
> > best, other people's names are disfigured, which is rather impolite.
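
A minimal sketch of the disfigured-name case described above, assuming one
developer saves a file as UTF-8 and another reads it as Latin-1. The class
and method names are made up for illustration, and StandardCharsets needs
Java 7 or later. The name is written with a Unicode escape so that the
example itself does not depend on the encoding of its source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingMismatchDemo {
    // Encode the text with the writer's charset, then decode the same
    // bytes with the reader's charset, as happens when two developers
    // disagree about the file encoding.
    static String roundTrip(String text, Charset writer, Charset reader) {
        byte[] bytes = text.getBytes(writer);
        return new String(bytes, reader);
    }

    public static void main(String[] args) {
        String author = "Herv\u00e9"; // "Hervé"

        // Same charset on both sides: the name survives.
        System.out.println(
            roundTrip(author, StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // Written as UTF-8, read as Latin-1: the two UTF-8 bytes of the
        // accented letter are decoded as two separate characters.
        System.out.println(
            roundTrip(author, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

The second call yields "HervÃ©" (six characters instead of five), which is
the kind of damage a project-wide encoding setting is meant to prevent.
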
> >
> > The Eclipse folks had a similar problem [0]. The solution: Lock the
> > encoding down for the entire project.
> >
> >> Is it possible to specify an encoding in one place that doesn't
> >> work somewhere else?
> >
> > Yes, in theory one user can specify an encoding that another
> > user's JVM does not support. As the class javadoc for Charset [1]
> > states, only a few encodings - including Latin-1 and UTF-8 - are
> > required to be supported, although the reference implementation
> > from Sun supports quite a few more [2]. However, I don't consider
> > this a practical concern. Given that support for UTF-8 is
> > mandatory, there exists an encoding that can handle nearly any
> > character people would like to enter and Java can handle. Hence
> > there exists a solution that works for everyone on the team.
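
A small sketch of how one might check this on a given JVM, using only
java.nio.charset.Charset. The class name, the helper and the list of
charset names are illustrative assumptions and not part of the proposal.
UTF-8, ISO-8859-1 and US-ASCII are among the charsets every Java platform
must support; Big5 is optional, so its result depends on the JVM.

```java
import java.nio.charset.Charset;

final class CharsetSupportCheck {
    // Returns true if the running JVM supports the named charset.
    // Charset.isSupported throws IllegalCharsetNameException (a subclass
    // of IllegalArgumentException) for syntactically illegal names, so
    // those are reported as unsupported here.
    static boolean isUsable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"UTF-8", "ISO-8859-1", "US-ASCII", "Big5"};
        for (String name : candidates) {
            System.out.println(name + " supported: " + isUsable(name));
        }
        // The fallback a build uses when no encoding is configured.
        System.out.println("platform default: " + Charset.defaultCharset());
    }
}
```

Running this on each team member's machine shows whether the encoding the
team wants to lock down is available everywhere, and how the platform
defaults differ.
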
> >
> >> I am fortunate in that I've never seen an encoding problem in Maven
> >> personally. In your proposal you talk about aligning the encoding
> >> value, but my question is: in what cases have you found the default
> >> encoding not working? You don't talk about that at all in the
> >> proposal.
> >
> > Well, choose your favorite from a search for "encoding" on all Maven
> > 2 projects in JIRA ;-)
> > - http://jira.codehaus.org/browse/MNG-2932
> > - http://jira.codehaus.org/browse/MANTTASKS-14
> > - http://jira.codehaus.org/browse/MTAGLIST-27
> > - http://jira.codehaus.org/browse/MRELEASE-302
> > - http://jira.codehaus.org/browse/DOXIA-103
> > - http://jira.codehaus.org/browse/MCHANGES-71
> > - (about 300 more hits)
> >
> > ASCII is quite safe, but anything which requires more than those 7
> > bits just needs special care.
> >
> >> Do you know what happens with all the tools that people use? Like
> >> checking into all SCMs, and what happens when people check out on
> >> to their system, editors, IDEs. I'm merely suggesting that there
> >> might be a reason most things fall back to the default encoding on
> >> the system: because it's generally been a hard thing to corral.
> >
> > In principle you're right, most of the tools are intended for usage
> > with the platform's encoding. This seems to include the popular
> > diff/patch tools used by many SCMs; they do not really support
> > different encodings [3] (yet another historic design flaw, next to
> > the two-digit year format).
> >
> > Also, the SCMs themselves seem not to care about (file content)
> > encoding yet. I have found proposals for Subversion [5] and Bazaar
> > [4] but that's it. However, as far as I can tell, since they do not
> > know about file encoding, SCMs also do not perform any conversions
> > on the file content but simply assume a simple byte-to-char mapping
> > like ASCII when doing EOL normalization or keyword substitution.
> >
> > As for editors and IDEs: Even this tiny thing "Notepad" from Windows
> > supports UTF-8 nowadays and I wouldn't call that an editor. Does
> > anybody know of a popular editor/IDE that calls itself mature but
> > does not allow configuring the file encoding?
> >
> >
> > Benjamin
> >
> >
> > [0] https://bugs.eclipse.org/bugs/show_bug.cgi?id=132898
> > [1] http://java.sun.com/javase/6/docs/api/java/nio/charset/Charset.html
> > [2] http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html
> > [3] http://www.gnu.org/software/diffutils/manual/html_mono/diff.html#Internationalization
> > [4] http://bazaar-vcs.org/UnicodeSupport?action=show&redirect=EncodingSupport#head-43c0111da063796da433179faaf8d065bda5c42e
> > [5] http://svn.haxx.se/dev/archive-2006-03/1182.shtml
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
>
> Thanks,
>
> Jason
>
> ----------------------------------------------------------
> Jason van Zyl
> Founder,  Apache Maven
> jason at sonatype dot com
> ----------------------------------------------------------
>
> the course of true love never did run smooth ...
>
> -- Shakespeare
>
>
>
>


