Re: [VOTE] POM Element for Source File Encoding

2008-04-12 Thread Hervé BOUTEMY
 I created http://svn.apache.org/viewvc/maven/sandbox/branches/MNG-2216/
 with javadoc and jxr plugins branches to test the change, and sample use
 case.
No reaction: I suppose this is lazy consensus :)

I'll start merging into the plugins' trunks tomorrow.

regards

Hervé

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: [VOTE] POM Element for Source File Encoding

2008-04-12 Thread Brian E. Fox
All the work is being put on a branch, right? That was where I saw the discussion 
with Jason going.




Re: [VOTE] POM Element for Source File Encoding

2008-04-12 Thread Hervé BOUTEMY
On Saturday 12 April 2008, Brian E. Fox wrote:
 All the work is being put on a branch, right? That was where I saw the
 discussion with Jason going.
I did the work on 2 plugins in a branch:
- jxr: http://svn.apache.org/viewvc?rev=645260&view=rev
- javadoc: http://svn.apache.org/viewvc?rev=645262&view=rev
As you can see, the change to the plugins themselves is really tiny: it's mostly 
about convention, very little about code.

Sample use is in the branch too, to let Maven developers see the concrete 
positive impact on users:

- currently, every plugin has to be configured separately (the pom is bigger), 
each one having its own parameter name for encoding (confusing): 
http://svn.apache.org/viewvc/maven/sandbox/branches/MNG-2216/before/pom.xml?view=markup

- after the plugin change, there is one property that every plugin uses as a 
default value, hiding the fact that the parameter name is different for each 
plugin:
http://svn.apache.org/viewvc/maven/sandbox/branches/MNG-2216/after/pom.xml?view=markup
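
The convention in the second item can be sketched in a few lines of Java: a plugin keeps its own parameter, but when that parameter is unset it falls back to one shared project property, and only then to a built-in default. This is only an illustration under stated assumptions. The class, the method, and the property name below are made up for the sketch (the thread does not spell out the property name), and the ISO-8859-1 fallback reflects the default discussed elsewhere in this thread.

```java
import java.util.Properties;

// Hypothetical helper, not code from the javadoc or jxr plugin branches.
class EffectiveEncoding {

    // Assumed name of the shared property; not stated in this thread.
    static final String SHARED_PROPERTY = "project.build.sourceEncoding";

    // Fallback when nothing is configured (the Latin-1 default under discussion).
    static final String FALLBACK = "ISO-8859-1";

    // Resolution order: plugin-specific parameter, shared property, fallback.
    static String resolve(String pluginParameter, Properties projectProperties) {
        if (pluginParameter != null && !pluginParameter.isEmpty()) {
            return pluginParameter;
        }
        String shared = projectProperties.getProperty(SHARED_PROPERTY);
        if (shared != null && !shared.isEmpty()) {
            return shared;
        }
        return FALLBACK;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(resolve(null, props));   // nothing configured
        props.setProperty(SHARED_PROPERTY, "UTF-8");
        System.out.println(resolve(null, props));   // shared property wins
        System.out.println(resolve("Big5", props)); // explicit plugin setting wins
    }
}
```

With this shape, a pom only needs to set the shared property once, and each plugin's differently named parameter stays available as an override.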

There is still exactly the same work to be done on at least 7 other Apache 
plugins and 4 Codehaus ones. The change on some plugins will represent more 
code, since they don't even support an encoding parameter yet, but the 
proposal on which we need to agree is about the convention to unify the 
parameter's value. IMHO the actual work on 2 plugins shows everything.

I think it is sufficient to adopt, or reject, or transform, any aspect of the 
proposal:
http://docs.codehaus.org/display/MAVENUSER/POM+Element+for+Source+File+Encoding

any objection?

Hervé






Re: [VOTE] POM Element for Source File Encoding

2008-04-09 Thread Martin von Gagern

Benjamin Bentmann wrote:
You could of course write an encoding detection plugin which could 
examine the code and set the required property accordingly.


Personally, I don't see the use case for that. If there are really users 
out there that don't know what file encoding they are using when writing up
their sources, they are most probably happy with the proposed default value
of Latin-1. Alternatively, this encoding detection plugin could be as 
simple as printing out the Java system property ${file.encoding} which obviously
worked well enough for the user.


${file.encoding} will only work if the file originated on the same machine.

I am thinking of semi-automatic conversion of inhomogeneous code into Maven. 
E.g. some teacher collects homework from his students as a bunch of zip 
files containing only source, has a script to turn each into a maven 
project, and a master project interacting with them, like letting them 
compete with one another or whatever. In this case one might wish to 
automatically detect the encoding of every module, especially in locales 
with several commonly used encodings, so that string literals in these 
classes are handled correctly without the students even knowing what an 
encoding is.


But that's a corner case, so I guess we should stop discussion about the 
use of such a program here, until someone actually requires it.


Greetings,
 Martin





Re: [VOTE] POM Element for Source File Encoding

2008-04-09 Thread Martin von Gagern

Paul Benedict wrote:

Just a proposal: Maven could loosen its parsing rules when it detects
versions greater than it is configured to accept.

Forward compatibility would be nice.


For anyone seriously interested in interoperability, I suggest a look 
at http://www.w3.org/2005/05/xsd-versioning-resources.html , especially 
the use cases, which illustrate several issues quite well.


 Martin





Re: [VOTE] POM Element for Source File Encoding

2008-04-09 Thread Martin von Gagern

Benjamin Bentmann wrote:
With regard to user errors, my general 
suggestion is to fail the build. This unforgiving attitude should not be 
that unfamiliar to users: It has been chosen for a popular format like 
XML which is also employed by Maven for a few files.


The problems depend on the encodings: If one feeds Latin-1 into a UTF-8 
decoder, one will most likely encounter invalid byte sequences, making the 
decoder fail. That's my favorite case as it clearly shows the user 
something is wrong and needs his attention. The other case is worse 
because it is more subtle: Feeding UTF-8 into a Latin-1 decoder will pass but 
produce output that only a human can recognize as garbage by closely 
analyzing the few non-ASCII characters.
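
The two cases can be reproduced with the JDK's java.nio.charset API alone. The following is a minimal sketch, not code from any Maven plugin; the class and method names are made up for illustration. It decodes strictly, so malformed input raises an exception rather than being replaced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Maven or any plugin.
class DecoderMismatchDemo {

    // Decodes strictly: malformed or unmappable input raises an exception
    // instead of being replaced silently.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws Exception {
        String text = "caf\u00e9"; // e with acute accent, escaped so this file stays ASCII
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // Case 1: Latin-1 bytes fed into a strict UTF-8 decoder fail loudly.
        try {
            decodeStrict(latin1, StandardCharsets.UTF_8);
            System.out.println("unexpected: decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("UTF-8 decoder rejected Latin-1 input: " + e);
        }

        // Case 2: UTF-8 bytes fed into a Latin-1 decoder pass, but the
        // accented letter comes out as two unrelated characters.
        String garbled = decodeStrict(utf8, StandardCharsets.ISO_8859_1);
        System.out.println("same text after round trip? " + garbled.equals(text));
    }
}
```

Note that the second case cannot be detected by the decoder at all, because every byte value is a valid Latin-1 character.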


Taking this together, one might argue to have UTF-8 the default, not 
ISO-8859-1.


Almost anything that passes UTF-8 encoding constraints will be indeed 
UTF-8, as non-ASCII files that are not UTF-8 will almost certainly 
contain sequences not valid in UTF-8. So if a user fails to specify the 
encoding he uses, and if this encoding isn't UTF-8, then things will 
break for him. This has two advantages:


1. fail-fast behaviour. If there is a misconfiguration, the maven run 
will die, and the developer can fix the issue. You don't have to wait 
for some other developer complaining about garbled strings or a user 
complaining about a broken website until you can fix things.


2. promote Unicode. While there are a lot of encodings out there for 
historic reasons, most of them suffer severe drawbacks in an 
international software project, because they either can't express all 
needed characters, or they are not common outside a small region. So 
while Taiwanese developers might be happy to develop an English/Chinese 
project in Big5, prospective American contributors might not get their 
editor to load files as Big5. UTF-8, on the other hand, is used 
worldwide and provides the whole Unicode range.
For new projects, I guess UTF-8 would be a reasonable best practice, and 
making this best practice the default in maven might promote it.


Of course this conflicts with previous discussions about Latin1 ensuring 
that any project can get compiled, as it has no invalid byte sequences. 
The choice is whether, in the absence of configuration,


A) you want your compile to succeed all the time, possibly generating 
the wrong results, or


B) you want your build to fail in case of a misconfiguration (including 
missing configuration), but ensure correct results if it does not fail.


If I understood him correctly, Jason voted for A). I took his request 
for non-dying builds as a requirement and pointed out that this is 
possible with Latin1. Now that I think about it, I believe I would 
rather want B), as I'm all for failfast deterministic behaviour.


It should be checked whether plugins really die for invalid UTF-8 
sequences, and what the output looks like. If possible, plugins should 
point out that a misconfiguration of the encoding in the pom (either the 
plugin configuration or the proposed global configuration property) is 
possibly the cause of the error, if it's not a developer using another 
encoding.


Note that ASCII-only sources will compile cleanly no matter the default 
encoding, so all projects that don't need to worry about encoding won't 
be forced to do so. Only international projects where encoding is 
relevant will force their developers to either follow best practices or 
explicitly state their policy.


Greetings,
 Martin





Re: [VOTE] POM Element for Source File Encoding

2008-04-09 Thread Benjamin Bentmann

Taking this together, one might argue to have UTF-8 the default, not
ISO-8859-1.


In general, I completely agree with your preference for Unicode and fail-fast
behavior. If I had been involved when the Maven story started, I would have
proposed UTF-8 as the default value, no doubt.

As for today, I tried to consider consistency with existing behavior. The
Maven Site Plugin was already using Latin-1 as the default value for
inputEncoding and outputEncoding and so I proposed this for other plugins,
too. Indeed, one of the patches (MJAVADOC-165) was just released such that
already two plugins teach users this default value. Therefore I fear it
might be too late to introduce another default value. If the community
believes this change is worth the confusion caused on users, I'm the first
one running the other way round ;-)


It should be checked whether plugins really die for invalid UTF-8
sequences, and what the output looks like.


That's a good point. It appears we need to do some extra homework here: The
simplistic use of InputStreamReader and OutputStreamWriter will silently
convert unmappable byte sequences to a default character ('?', see also
[0]). I guess we could nicely hide the required implementation by means of
the existing methods in Reader-/WriterFactory from plexus-utils.
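
As a rough sketch of what that extra homework could look like on the reading side, the JDK allows an InputStreamReader to be built from a CharsetDecoder that reports errors instead of replacing them. The class and method names below are hypothetical and this is not the plexus-utils API. When decoding, the JDK's silent substitute is U+FFFD; '?' is what the encoder side typically writes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical factory, shown only to contrast lenient and strict readers.
class StrictReaders {

    // Default behaviour: malformed input is silently replaced.
    static Reader lenient(InputStream in, Charset charset) {
        return new InputStreamReader(in, charset);
    }

    // Fail-fast behaviour: malformed input makes read() throw an IOException.
    static Reader strict(InputStream in, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new InputStreamReader(in, decoder);
    }

    // Reads everything from the reader, closing it afterwards.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[1024];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "naive" with a diaeresis on the i, encoded as Latin-1 but read as UTF-8.
        byte[] latin1 = "na\u00efve".getBytes(StandardCharsets.ISO_8859_1);

        String replaced = readAll(lenient(new ByteArrayInputStream(latin1), StandardCharsets.UTF_8));
        System.out.println("lenient result contains U+FFFD: " + replaced.contains("\uFFFD"));

        try {
            readAll(strict(new ByteArrayInputStream(latin1), StandardCharsets.UTF_8));
        } catch (UncheckedIOException e) {
            System.out.println("strict reader failed: " + e.getCause());
        }
    }
}
```

This only covers code that creates its own readers; tools invoked by a plugin would need their own handling.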


Note that ASCII-only sources will compile cleanly no matter the default
encoding


Most of the time, but UTF-16 and EBCDIC do not even have ASCII in common.
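
A small Java check illustrates the point for UTF-16; EBCDIC is left out because its charsets are not guaranteed to be present in every JVM. The class and method names are made up for this sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: compares encoded bytes against plain US-ASCII.
class AsciiCompatibility {

    // True if the charset encodes the given ASCII-only text to exactly
    // the same bytes as US-ASCII does.
    static boolean encodesLikeAscii(String asciiText, Charset charset) {
        byte[] ascii = asciiText.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(ascii, asciiText.getBytes(charset));
    }

    public static void main(String[] args) {
        String source = "class A {}";
        Charset[] candidates = {
            StandardCharsets.UTF_8,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_16BE
        };
        for (Charset charset : candidates) {
            System.out.println(charset.name() + " matches ASCII bytes: "
                    + encodesLikeAscii(source, charset));
        }
    }
}
```

UTF-8 and ISO-8859-1 agree with ASCII byte for byte, while UTF-16 uses two bytes per character, so an ASCII-only file is only safe among the ASCII-compatible encodings.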


Benjamin


[0] http://java.sun.com/javase/6/docs/api/java/io/OutputStreamWriter.html





RE: [VOTE] POM Element for Source File Encoding

2008-04-09 Thread Brian E. Fox

As for today, I tried to consider consistency with existing behavior. The
Maven Site Plugin was already using Latin-1 as the default value for
inputEncoding and outputEncoding and so I proposed this for other plugins,
too. Indeed, one of the patches (MJAVADOC-165) was just released such that
already two plugins teach users this default value. Therefore I fear it
might be too late to introduce another default value. If the community
believes this change is worth the confusion caused on users, I'm the first
one running the other way round ;-)

Don't break existing builds. No regressions. ;-)






Re: [VOTE] POM Element for Source File Encoding

2008-04-09 Thread Martin von Gagern

Benjamin Bentmann wrote:
In general, I completely agree with your preference for Unicode and fail-fast
behavior. If I had been involved when the Maven story started, I would have
proposed UTF-8 as the default value, no doubt.

As for today, I tried to consider consistency with existing behavior. The
Maven Site Plugin was already using Latin-1 as the default value for
inputEncoding and outputEncoding and so I proposed this for other plugins,
too. Indeed, one of the patches (MJAVADOC-165) was just released such that
already two plugins teach users this default value. Therefore I fear it
might be too late to introduce another default value. If the community
believes this change is worth the confusion caused on users, I'm the first
one running the other way round ;-)


I see your point. Worth another vote? Or should this switch be postponed 
to 2.1, trading consistency in minor version upgrades for a longer time 
for these Latin1 defaults to be established?


Given the failfast nature of the UTF-8 default, we won't have to worry 
about the switch going unnoticed. Developers switching from a version 
defaulting to Latin1 to UTF-8 will notice the change immediately, and 
for development in a heterogeneous environment they can simply override 
the super-POM with their own default.


So while I agree that a change in default either now or in the future is 
ugly, it is not taboo, and I believe worth the gain.



That's a good point. It appears we need to do some extra homework here: The
simplistic use of InputStreamReader and OutputStreamWriter will silently
convert unmappable byte sequences to a default character ('?', see also
[0]). I guess we could nicely hide the required implementation by means of
the existing methods in Reader-/WriterFactory from plexus-utils.


That works for plugins doing the conversion in code under our control. 
Other plugins that use external libraries or tools might be more difficult.



Note that ASCII-only sources will compile cleanly no matter the default
encoding


Most of the time, but UTF-16 and EBCDIC do not even have ASCII in common.


I was thinking about the default of the default, i.e. the value to be 
set in the super-POM. We certainly won't choose UTF-16 or EBCDIC for 
this global default, and as files encoded in UTF-16 or EBCDIC don't 
count as ASCII-only, my


 Martin





Re: [VOTE] POM Element for Source File Encoding

2008-04-09 Thread Jason van Zyl
All sounds fine. Just wanted you to keep the bigger picture in  
mind.


Please do the work on a branch and go through the rigor of Brian's  
example and make sure it works before you merge it into something we  
would release to users. This is not an insignificant change.



Thanks,

Jason

--
Jason van Zyl
Founder,  Apache Maven
jason at sonatype 

Re: [VOTE] POM Element for Source File Encoding

2008-04-09 Thread Benjamin Bentmann
Make sure you consider the case where you have people developing the  same 
code base all over the world, and the possible reasoning of  falling back 
to platform default encoding. Consider the team spread  across the US, 
Russia, and China and what do they do normally?


This international spread of developers is in particular the case we have in 
mind. I mean, how should such a team (say the Maven community) deliver 
reliable build output if not all developers have agreed to use the same file 
encoding for the sources? Say the US devs would have ASCII as default 
encoding, the Europeans Latin-1 and the Asians Big5 for our nice potpourri. 
Even if all have agreed to use English for coding, you still might encounter 
Non-ASCII characters that get messed up, e.g. in javadoc comments that carry 
the name of the contributor/committer. Other developers might experience 
build failures because of an encoding mismatch; at best, other people's names 
are disfigured, which is rather impolite.


The Eclipse folks had a similar problem [0]. The solution: Lock the encoding 
down for the entire project.


Is it possible to specify an encoding in one place that doesn't work 
somewhere else?


Yes, in theory you can have one user specify an encoding that another user's 
JVM does not support. As the class javadoc about Charset [1] states, only a 
few encodings - including Latin-1 and UTF-8 - are required to be supported, 
although the reference implementation from Sun supports quite a few more 
encodings [2]. However, I don't consider this a practical concern. Given that 
support for UTF-8 is mandatory, there exists an encoding that can handle 
nearly any character people would like to enter and Java can handle. Hence 
there exists a solution that works for everyone on the team.
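
Whether a given JVM knows an encoding can be checked up front with Charset.isSupported. The following is a minimal sketch with a made-up class name; which optional charsets (Big5, for example) are reported as available depends on the JVM it runs on.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper for checking an encoding name before using it.
class CharsetAvailability {

    // True if this JVM can handle the named charset; syntactically
    // illegal names are treated as unavailable rather than as an error.
    static boolean isAvailable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The first three are among the charsets every Java platform must support;
        // the availability of Big5 depends on the JVM.
        String[] names = {"US-ASCII", "ISO-8859-1", "UTF-8", "Big5"};
        for (String name : names) {
            System.out.println(name + " available: " + isAvailable(name));
        }
    }
}
```

A build could run such a check early and fail with a clear message instead of a late UnsupportedEncodingException.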


I am fortunate in that I've never seen an encoding problem in Maven 
personally. In your proposal you talk about aligning the encoding  value 
but my question in what cases have you found the default  encoding not 
working as you don't talk about that at all in the  proposal.


Well, choose your favorite from a search for encoding on all Maven 2 
projects in JIRA ;-)

- http://jira.codehaus.org/browse/MNG-2932
- http://jira.codehaus.org/browse/MANTTASKS-14
- http://jira.codehaus.org/browse/MTAGLIST-27
- http://jira.codehaus.org/browse/MRELEASE-302
- http://jira.codehaus.org/browse/DOXIA-103
- http://jira.codehaus.org/browse/MCHANGES-71
- (about 300 more hits)

ASCII is quite safe, but anything which requires more than those 7 bits just 
needs special care.


Do you know what happens with all the tools that people use? Like 
checking into all SCMs, and what happens when people check out onto their 
system, editors, IDEs. I'm merely suggesting that there might be a reason 
most things fall back to the default encoding on the system, because it's 
generally been a hard thing to corral.


In principle you're right, most of the tools are intended for usage with the 
platform's encoding. This seems to include the popular diff/patch tools used 
by many SCMs; they do not really support different encodings [3] (yet 
another historic design flaw, next to the two-digit year format).


Also, the SCMs themselves do not seem to care about (file content) encoding 
yet; I have found proposals for Subversion [5] and Bazaar [4] but that's it. 
However, as far as I can tell, since they do not know about file encoding, 
SCMs also do not perform any conversions on the file content but simply assume 
a simple byte-to-char mapping like ASCII when doing EOL normalization or 
keyword substitution.


As for editors and IDEs: Even tiny Notepad from Windows supports UTF-8 
nowadays, and I wouldn't call that an editor. Does anybody know of a popular 
editor/IDE that calls itself mature but does not allow configuring the file 
encoding?



Benjamin


[0] https://bugs.eclipse.org/bugs/show_bug.cgi?id=132898
[1] http://java.sun.com/javase/6/docs/api/java/nio/charset/Charset.html
[2] http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html
[3] http://www.gnu.org/software/diffutils/manual/html_mono/diff.html#Internationalization
[4] http://bazaar-vcs.org/UnicodeSupport?action=show&redirect=EncodingSupport#head-43c0111da063796da433179faaf8d065bda5c42e
[5] http://svn.haxx.se/dev/archive-2006-03/1182.shtml






Re: [VOTE] POM Element for Source File Encoding

2008-04-09 Thread Benjamin Bentmann

I see your point. Worth another vote? Or should this switch be postponed
to 2.1, trading consistency in minor version upgrades for a longer time
for these Latin1 defaults to be established?
[...]
So while I agree that a change in default either now or in the future is
ugly, it is not taboo, and I believe worth the gain.


Latin-1 being the default value was part of our proposal and not many people
complained about that nor changed their previous votes. So I believe another
vote won't deliver a different outcome.

Besides, Brian's honorable efforts to ban regressions are a good argument to
stay on the route already taken with Latin-1. It might not be the best
default value, but it's only a one-liner to change it.


Benjamin





Re: [VOTE] POM Element for Source File Encoding

2008-04-09 Thread Milos Kleint
On Wed, Apr 9, 2008 at 7:36 PM, Benjamin Bentmann
[EMAIL PROTECTED] wrote:

  Make sure you consider the case where you have people developing the  same
 code base all over the world, and the possible reasoning of  falling back to
 platform default encoding. Consider the team spread  across the US, Russia,
 and China and what do they do normally?
 

  This international spread of developers is in particular the case we have
 in mind. I mean, how should such a team (say the Maven community) deliver
 reliable build output if not all developers have agreed to use the same file
 encoding for the sources? Say the US devs would have ASCII as default
 encoding, the Europeans Latin-1 and the Asians Big5 for our nice potpourri.
 Even if all have agreed to use English for coding, you still might encounter
 Non-ASCII characters that get messed up, e.g. in javadoc comments that carry
 the name of the contributor/committer. Other developers might experience
 build failures because of encoding mismatch, at best other people's names
 are disfigured which is rather impolite.

  The Eclipse folks had a similar problem [0]. The solution: Lock the
 encoding down for the entire project.

Just for the record, netbeans.org projects all use UTF-8. We have devs
in the US, the Czech Republic, Russia and elsewhere. NetBeans allows setting
the default encoding per project; for Maven projects I currently look up how
maven-compiler-plugin is configured. If no configuration is in place, I
fall back to the platform encoding.

Encoding differs not only across countries but also across
platforms. While most Linux distributions use UTF-8, on Windows you get a
different encoding depending on which localized version you buy, I think.
The Eastern European set is different from the Western European one. My Mac
falls back to something called MacRoman as its default encoding.
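
To see what a particular machine falls back to, the platform default can be queried from Java. This is a small sketch with a made-up class name; the values it prints depend on the operating system, locale, and JDK it runs on.

```java
import java.nio.charset.Charset;

// Hypothetical demo class: reports the encoding this JVM uses by default.
class PlatformEncoding {

    // Describes the default charset and the file.encoding system property.
    static String describe() {
        return "default charset: " + Charset.defaultCharset().name()
                + ", file.encoding: " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this on each developer's machine is a quick way to see how far the defaults in a team actually diverge.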

Milos







Re: [VOTE] POM Element for Source File Encoding

2008-04-09 Thread Hervé BOUTEMY
On Wednesday 09 April 2008, Benjamin Bentmann wrote:
  I see your point. Worth another vote? Or should this switch be postponed
  to 2.1, trading consistency in minor version upgrades for a longer time
  for these Latin1 defaults to be established?
  [...]
  So while I agree that a change in default either now or in the future is
  ugly, it is not taboo, and I believe worth the gain.

 Latin-1 being the default value was part of our proposal and not many
 people complained about that nor changed their previous votes. So I believe
 another vote won't deliver a different outcome.

 Besides, Brian's honorable efforts to ban regressions are a good argument
 to keep the already started route with Latin-1. It might not be the best
 default value, but it's only a one liner to change it.
I have one argument in favor of ISO-8859-1 as default: it's the default 
encoding of properties files, as defined by the JDK's java.util.Properties class.
Once Maven requires JDK 1.5+, we'll be able to switch to XML properties files, 
and then UTF-8 as default will be no problem...
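
The java.util.Properties behaviour is easy to demonstrate: loading from a byte stream always decodes as ISO-8859-1, whatever the file was written in. The sketch below uses made-up class and method names. For contrast it also loads through a Reader with an explicit charset, an overload available since Java 6 that is not mentioned in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class for the fixed encoding of .properties files.
class PropertiesEncodingDemo {

    // Properties.load(InputStream) always decodes the bytes as ISO-8859-1.
    static String loadFromStream(byte[] data, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    // Properties.load(Reader) lets the caller choose the charset.
    static String loadFromReader(byte[] data, Charset charset, String key) {
        Properties props = new Properties();
        try {
            props.load(new InputStreamReader(new ByteArrayInputStream(data), charset));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String expected = "Herv\u00e9"; // escaped so this file stays ASCII
        byte[] utf8File = ("name=" + expected).getBytes(StandardCharsets.UTF_8);

        System.out.println("stream load intact: "
                + expected.equals(loadFromStream(utf8File, "name")));
        System.out.println("reader load intact: "
                + expected.equals(loadFromReader(utf8File, StandardCharsets.UTF_8, "name")));
    }
}
```

The stream variant garbles the accented letter of a UTF-8 file, which is why a UTF-8 default sits awkwardly next to plain properties files.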



 Benjamin





Re: [VOTE] POM Element for Source File Encoding

2008-04-09 Thread Hervé BOUTEMY
On Wednesday 09 April 2008, Jason van Zyl wrote:
 All sounds fine. Just wanted you to think about the bigger picture in
 mind.

 Please do the work on a branch and go through the rigor of Brian's
 example and make sure it works before you merge it into something we
 would release to users. This is not an insignificant change.
I created http://svn.apache.org/viewvc/maven/sandbox/branches/MNG-2216/ 
with javadoc and jxr plugins branches to test the change, and sample use 
case.

Isn't that sufficient?

Hervé


 On 9-Apr-08, at 10:36 AM, Benjamin Bentmann wrote:
  Make sure you consider the case where you have people developing
  the  same code base all over the world, and the possible reasoning
  of  falling back to platform default encoding. Consider the team
  spread  across the US, Russia, and China and what do they do
  normally?
 
  This international spread of developers is in particular the case we
  have in mind. I mean, how should such a team (say the Maven
  community) deliver reliable build output if not all developers have
  agreed to use the same file encoding for the sources? Say the US
  devs would have ASCII as default encoding, the Europeans Latin-1 and
  the Asians Big5 for our nice potpourri. Even if all have agreed to
  use English for coding, you still might encounter Non-ASCII
  characters that get messed up, e.g. in javadoc comments that carry
  the name of the contributor/committer. Other developers might
  experience build failures because of encoding mismatch, at best
  other people's names are disfigured which is rather impolite.
 
  The Eclipse folks had a similar problem [0]. The solution: Lock the
  encoding down for the entire project.
 
  Is it possible to specify an encoding in one place that doesn't
  work somewhere else?
 
  Yes, in theory you can have one user specify an encoding that
  another user's JVM does not support. As the class javadoc about
  Charset [1] states, only a few encodings - including Latin-1 and
  UTF-8 - are required to be supported, although the reference
  implementation from Sun supports quite a few more encodings [2]. However,
  I don't consider this a practical concern. Given that support for
  UTF-8 is mandatory, there exists an encoding that can handle nearly
  any character people would like to enter and Java can handle. Hence
  there exists a solution that works for everyone on the team.
 
  I am fortunate in that I've never seen an encoding problem in Maven
  personally. In your proposal you talk about aligning the encoding
  value but my question in what cases have you found the default
  encoding not working as you don't talk about that at all in the
  proposal.
 
  Well, choose your favorite from a search for encoding on all Maven
  2 projects in JIRA ;-)
  - http://jira.codehaus.org/browse/MNG-2932
  - http://jira.codehaus.org/browse/MANTTASKS-14
  - http://jira.codehaus.org/browse/MTAGLIST-27
  - http://jira.codehaus.org/browse/MRELEASE-302
  - http://jira.codehaus.org/browse/DOXIA-103
  - http://jira.codehaus.org/browse/MCHANGES-71
  - (about 300 more hits)
 
  ASCII is quite safe, but anything which requires more than those 7
  bits just needs special care.
 
  Do you know what happens with all the tools that people use? Like
  checking into all SCMs, and what happens when people check out on
  to their system, editors, IDEs. I'm merely suggesting that there
  might be a reason most things fall back to the default encoding on
  the system: because it's generally been a hard thing to corral.
 
  In principle you're right, most of the tools are intended for usage
  with the platform's encoding. This seems to include the popular diff/
  patch tools used by many SCMs; they do not really support
  different encodings [3] (yet another historic design flaw, next to
  the two-digit year format).
 
  Also, the SCMs themselves seem not to care about (file content)
  encoding yet, I have found proposals for Subversion [5] and Bazaar
  [4] but that's it. However, as far as I can tell, not knowing about
  file encoding SCMs also do not perform any conversions on the file
  content but simply assume a simple byte-to-char mapping like ASCII
  when doing EOL normalization or keyword substitution.
 
  As for editors and IDEs: Even this tiny thing Notepad from Windows
  supports UTF-8 nowadays and I wouldn't call that an editor. Does
  anybody know about a popular editor/IDE that calls itself mature but
  does not allow configuring the file encoding?
 
 
  Benjamin
 
 
  [0] https://bugs.eclipse.org/bugs/show_bug.cgi?id=132898
  [1] http://java.sun.com/javase/6/docs/api/java/nio/charset/Charset.html
  [2] http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html
  [3] http://www.gnu.org/software/diffutils/manual/html_mono/diff.html#Internationalization
  [4] http://bazaar-vcs.org/UnicodeSupport?action=show&redirect=EncodingSupport#head-43c0111da063796da433179faaf8d065bda5c42e
  [5] http://svn.haxx.se/dev/archive-2006-03/1182.shtml
 
  

Re: [VOTE] POM Element for Source File Encoding

2008-04-08 Thread Benjamin Bentmann

Paul Benedict wrote:

My only concern is that the encoding kind of assumes one kind of source
file.


We are well aware that different kind of text files may use different
encodings. A simple example is using UTF-8 for Java source files and Latin-1
for properties files.

However, the primary goal of the proposal is to replace the default encoding
defined by the JVM (platform-dependent) with a value defined by the POM
(platform-independent).

Hence, we started off with a single default value. The emphasis lies on
*default*, i.e. the proposed POM property/element is not intended as the
final means to configure the employed file encoding throughout the entire
project. It is just a value plugins can use to initialize their
configuration in case the user did not explicitly specify an encoding.


I am never in a position to have multiple encodings on my projects


And I would argue that quite a few people follow the same approach.
Otherwise I can hardly understand why users have not already complained that
some plugins don't provide an encoding parameter at all yet. Besides, not
every IDE allows users to configure different file encodings in a single
project, so this really seems to be the major use case.


but I suppose if you're compiling many different types of sources, people
would want to tie the source to the extension type.


A file extension is just one method to distinguish files, another one is
context of use. I believe that having the possibility to configure file
encoding on a per plugin basis is good enough to capture different types of
files.

If really someday the need to setup encodings per file extension arises, we 
can think more closely about that. But even then, I wouldn't like to write 
something like this in my POM to lock down the encoding for every file 
extension that might hang around in the project:

 <fileEncodings>
   <fileEncoding>
     <extensions>txt,java,groovy,aj,bsh,apt,...</extensions>
     <name>UTF-8</name>
   </fileEncoding>
 </fileEncodings>
I would want to have a single default value to catch the major case and this 
default value should in no case depend on my JVM. So I'm back on 
${project.build.sourceEncoding}.



Benjamin





Re: [VOTE] POM Element for Source File Encoding

2008-04-08 Thread Benjamin Bentmann

Jason van Zyl wrote:

Would being able to detect the encoding help with making this less
complicated. Something JChardet?


I'm not really sure what you meant to say. JChardet is a library that
performs a best *guess* on file encoding by peeking at a byte stream. We
don't want to base our builds on heuristics, do we?



Benjamin 






Re: [VOTE] POM Element for Source File Encoding

2008-04-08 Thread Benjamin Bentmann

Hervé Boutemy wrote:

this one is more tricky, even if the change in pom.xml is a simple
addition of
an element... Don't really know how to handle this without breaking things
for Maven 2.0 when an artifact with this addition is deployed to a
repository.


Handling POM additions is a more general concern and not really the point of
our proposal. For Maven 2.0.x, adding a normal property
 <properties>
   <project.build.sourceEncoding>...</project.build.sourceEncoding>
 </properties>
to the super POM won't hurt the model validation for 4.0.0. For now, the
simple question to answer is: will the element be named as proposed? Once
we get consensus about this name, we can continue to patch the plugins to
use this property for the parameters, knowing that it will be
forward-compatible with Maven 2.1.

For Maven 2.1, a new model version will be introduced. Users that choose to
employ this version will always experience build failures with Maven 2.0.x
due to the failed model validation. Again, this is nothing specific to our
proposal about sourceEncoding. We just added another element to list of
required POM additions:
- custom profile activators
- site directory
- plugin management for reporting
- ...


The only risk is that the property chosen,
${project.build.sourceEncoding},
makes users think of a new element <project><build><sourceEncoding> in the
pom


Yes, we will have to properly document this just like for the new import
scope.


Benjamin





Re: [VOTE] POM Element for Source File Encoding

2008-04-08 Thread Jason van Zyl


On 8-Apr-08, at 1:09 AM, Benjamin Bentmann wrote:

Jason van Zyl wrote:

Would being able to detect the encoding help with making this less
complicated. Something JChardet?


I'm not really sure what you meant to say. JChardet is a library
that performs a best *guess* on file encoding by peeking at a byte
stream. We don't want to base our builds on heuristics, do we?




If it's right most of the time, and it saves the user from having to  
know or worry about it then yes I would use it.




Benjamin




Thanks,

Jason

--
Jason van Zyl
Founder,  Apache Maven
jason at sonatype dot com
--

We all have problems. How we deal with them is a measure of our worth.

-- Unknown 








Re: [VOTE] POM Element for Source File Encoding

2008-04-08 Thread Benjamin Bentmann

Jason van Zyl wrote:

If it's right most of the time, and it saves the user from having to  know
or worry about it then yes I would use it.


Could you elaborate on this a little more? Say we start easy and have a build
with just about 100 Java source files. Do you suggest to peek at each of
them before passing them to a tool like javac, or just a subset, and how
should this subset be determined? What should be done when the charset
detection reports different encodings for the set of files to process? Will
the charset detection happen over and over again for each plugin (javac,
javadoc, jxr)? What do you consider "most of the time"? Telling the various
ISO-8859 families apart is not really easy. My impression is that usage of
JChardet will significantly increase code complexity without giving me a
solid build.

Also, I believe it's a bad idea to free users from worrying about the
encoding. This would be similar to the doubtful magic the JRE provides with
its default encoding: It encourages developers to ignore the encoding issue,
leading to platform-dependent behavior. Platform-dependent Java code is a
bad practice and Maven, as far as I heard, aims at promoting best practices.
File encoding is a parameter affecting your build output just like the
source/target settings used for the compiler and hence should be explicitly
controlled.

As we talk about it: What is the agreed file encoding for the Maven sources
(MNGSITE-46)?


Benjamin





Re: [VOTE] POM Element for Source File Encoding

2008-04-08 Thread Milos Kleint
+1 on Benjamin's objections to detection.
It will slow down the build (possibly significantly) while providing
little added value.

Milos

On Tue, Apr 8, 2008 at 8:27 PM, Benjamin Bentmann
[EMAIL PROTECTED] wrote:
 Jason van Zyl wrote:

  If it's right most of the time, and it saves the user from having to  know
  or worry about it then yes I would use it.
 

  Could you elaborate this a little more. Say we start easy and have a build
  with just about 100 Java source files. Do you suggest to peek at each of
  them before passing them to a tool like javac or just a subset and how
  should this subset be determined? What should be done when the charset
  detection reports different encodings for the set of files to process? Will
  the charset detection happen over and over again for each plugin (javac,
  javadoc, jxr)? What do you consider most of time, telling the various
  ISO-8859 families apart is not really easy. My impression is that usage of
  JChardet will significantly increase code complexity without giving me a
  solid build.

  Also, I believe it's a bad idea to free users from worrying about the
  encoding. This would be similar to the doubtful magic the JRE provides with
  its default encoding: It encourages developers to ignore the encoding
 issue,
  leading to platform-dependent behavior. Platform-dependent Java code is a
  bad practice and Maven, as far as I heard, aims at promoting best
 practices.
  File encoding is a parameter affecting your build output just like the
  source/target settings used for the compiler and hence should be explicitly
  controlled.

  As we talk about it: What is the agreed file encoding for the Maven sources
  (MNGSITE-46)?




  Benjamin








Re: [VOTE] POM Element for Source File Encoding

2008-04-08 Thread Jason van Zyl


On 8-Apr-08, at 11:27 AM, Benjamin Bentmann wrote:

Jason van Zyl wrote:
If it's right most of the time, and it saves the user from having  
to  know

or worry about it then yes I would use it.


Could you elaborate this a little more. Say we start easy and have a  
build
with just about 100 Java source files. Do you suggest to peek at  
each of

them before passing them to a tool like javac or just a subset and how
should this subset be determined?


It would be reasonable to assume the detection could be based on a
subset. For an organization on one project you could reasonably assume
the same encoding. That would not be the case in an open source
project as tools would vary.



What should be done when the charset
detection reports different encodings for the set of files to process?


What happens when the encoding is different than what is stated? Same
problem really, in how to deal with the actual versus declared.



Will
the charset detection happen over and over again for each plugin  
(javac,
javadoc, jxr)? What do you consider most of time, telling the  
various
ISO-8859 families apart is not really easy. My impression is that  
usage of
JChardet will significantly increase code complexity without giving  
me a

solid build.


That would depend on what kinds of problems can arise if things are  
not consistent.





Also, I believe it's a bad idea to free users from worrying about the
encoding.


You have to deal with the very real possibility that no one is going to set
it or know what it is, and that people will report issues related to encoding
even if the whole system works.


I'm all for literal and declarative. In practice this does not happen  
all the time. I also didn't say use one over the other, but the  
detection may help in cases where it's not stated. The JChardet  
library was created for a reason, and this looks like one of them.


For the system you are proposing there would be touch points at which
you would look for encoding parameters. If those values are not stated
you will need a strategy to detect them, or you will never be able to
support any encoding alignment in older versions of Maven without the
encoding parameterization.



This would be similar to the doubtful magic the JRE provides with
its default encoding: It encourages developers to ignore the  
encoding issue,
leading to platform-dependent behavior. Platform-dependent Java code  
is a
bad practice and Maven, as far as I heard, aims at promoting best  
practices.


Of course it is, but that doesn't negate the fact that people don't
necessarily follow best practices. But you are


1) going to need to deal with versions of Maven that don't support  
this encoding parameterization, and

2) you're going to have to deal with the case where it's stated wrong

We should know combinations of encoding parameter that will work  
together and if they aren't stated, or stated wrong it's better to  
provide some fallback instead of just dying.




File encoding is a parameter affecting your build output just like the
source/target settings used for the compiler and hence should be  
explicitly

controlled.



Absolutely, but look at all the questions on the mailing list that  
expect many of these things to just be detected. People using Java 1.5  
just expect you to be able to compile 1.5 code. That's not the case.  
Users in this case expect the right thing to happen.
I'm willing to bet you if you asked the average user about encoding,  
they would have no clue and wonder why it wasn't detected.


It was a suggestion based on experience of typical users.




As we talk about it: What is the agreed file encoding for the Maven  
sources

(MNGSITE-46)?


Benjamin





Thanks,

Jason

--
Jason van Zyl
Founder,  Apache Maven
jason at sonatype dot com
--

We all have problems. How we deal with them is a measure of our worth.

-- Unknown 








Re: [VOTE] POM Element for Source File Encoding

2008-04-08 Thread Jason van Zyl


On 8-Apr-08, at 11:11 AM, Milos Kleint wrote:

+1 on Benjamin's objections to detection.
It will slow down the build (possibly significantly) while providing
little added value.


Possibly, but you're guessing.

Obviously checking the encoding on every file would be unwise. Trying  
to detect where it's not provided (mistakes), or can't be provided  
(not supported as an option in the model) you're going to have to do  
something. So what are you going to do in those cases?





Milos

On Tue, Apr 8, 2008 at 8:27 PM, Benjamin Bentmann
[EMAIL PROTECTED] wrote:

Jason van Zyl wrote:

If it's right most of the time, and it saves the user from having  
to  know

or worry about it then yes I would use it.



Could you elaborate this a little more. Say we start easy and have  
a build
with just about 100 Java source files. Do you suggest to peek at  
each of
them before passing them to a tool like javac or just a subset and  
how
should this subset be determined? What should be done when the  
charset
detection reports different encodings for the set of files to  
process? Will
the charset detection happen over and over again for each plugin  
(javac,
javadoc, jxr)? What do you consider most of time, telling the  
various
ISO-8859 families apart is not really easy. My impression is that  
usage of
JChardet will significantly increase code complexity without giving  
me a

solid build.

Also, I believe it's a bad idea to free users from worrying about the
encoding. This would be similar to the doubtful magic the JRE  
provides with

its default encoding: It encourages developers to ignore the encoding
issue,
leading to platform-dependent behavior. Platform-dependent Java  
code is a

bad practice and Maven, as far as I heard, aims at promoting best
practices.
File encoding is a parameter affecting your build output just like  
the
source/target settings used for the compiler and hence should be  
explicitly

controlled.

As we talk about it: What is the agreed file encoding for the Maven  
sources

(MNGSITE-46)?




Benjamin









Thanks,

Jason

--
Jason van Zyl
Founder,  Apache Maven
jason at sonatype dot com
--

happiness is like a butterfly: the more you chase it, the more it will
elude you, but if you turn your attention to other things, it will come
and sit softly on your shoulder ...

-- Thoreau 








Re: [VOTE] POM Element for Source File Encoding

2008-04-08 Thread Martin von Gagern

+1 for the original proposal, if a newcomer like me is allowed to vote.

The concept with the property, which can be set with the properties 
until the model is updated, and which can be the default expression for 
affected plugins, is simply elegant.


Jason van Zyl wrote:
It would be reasonable to assume the detection could be based on a
subset. For an organization on one project you could reasonably assume
the same encoding. That would not be the case in an open source project
as tools would vary.


Suppose you have a huge source tree, mostly English ASCII, but somewhere
in there is a single degree sign, '\u00b0'. How would you detect
it, short of scanning every ASCII file until you hit that one?


I support concerns here that the cost of encoding detection may in many 
cases be prohibitively high. Maven runs too slow as it is, imho. You 
could of course write an encoding detection plugin which could examine 
the code and set the required property accordingly. But enabling that by 
default feels bad to me.


What happens when the encoding is different than what is stated? Same
problem really, in how to deal with the actual versus declared.


Up to the plugins, I guess, as it is now. No change there, only a 
central place to set defaults for all plugins. Of course you could write 
an encoding checking plugin which ensures that your sources are valid in 
the specified encoding.



My impression is that usage of
JChardet will significantly increase code complexity without giving me a
solid build.


That would depend on what kinds of problems can arise if things are not 
consistent.


There are three possible cases:
1. code agrees with setting => all right
2. code disagrees with setting, but is still valid under specified
encoding => Mojibake
3. code is invalid under specified encoding => exception or unmappable
character symbol, depending on context. Exception may be handled by plugin.


By specifying ISO-8859-1 as default input encoding, there are no 
unmappable characters, avoiding case 3. All input should be readable, 
though the output generated from this might not look as expected.
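
The three cases can be reproduced with the standard java.nio.charset API. This is a minimal sketch under assumptions: the class and method names are hypothetical, and it uses a strict decoder (CodingErrorAction.REPORT), whereas a given plugin may instead substitute a replacement character in case 3.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Maven.
public class DeclaredVersusActual {

    /** Decodes bytes strictly: malformed or unmappable input raises an exception. */
    static String decodeStrict(byte[] bytes, Charset declared) throws CharacterCodingException {
        return declared.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        String text = "30\u00b0"; // "30" followed by a degree sign
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);

        // Case 1: declared encoding matches the actual one.
        System.out.println(decodeStrict(utf8, StandardCharsets.UTF_8).equals(text));

        // Case 2: UTF-8 bytes read as ISO-8859-1 decode without error, but wrongly.
        System.out.println(decodeStrict(utf8, StandardCharsets.ISO_8859_1).equals(text));

        // Case 3: a lone 0xB0 byte is not valid UTF-8, so strict decoding fails.
        try {
            decodeStrict(latin1, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("invalid under declared encoding: " + e);
        }
    }
}
```

Because every byte value maps to a character in ISO-8859-1, decoding with it never reaches case 3, which is the property the paragraph above relies on.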


It should be noted that plugins that generate code to be used by other 
plugins should have their output encoding default to the general input 
encoding, so that there are no breaks in the chain.


As Jason writes about consistency, I guess the danger of inconsistent 
input handling, as different plugins might be configured to read it 
using different charsets, is exactly the kind of inconsistency to be 
addressed by this proposal, so I'd expect more consistency after it has 
been implemented, not less.


Greetings,
 Martin von Gagern






Re: [VOTE] POM Element for Source File Encoding

2008-04-08 Thread Hervé BOUTEMY
Le mardi 08 avril 2008, Paul Benedict a écrit :
 In Commons Validator, we updated the DTD even in point releases. I don't
 see the harm in doing the same here. After all, if the POM is 4.0.0, why
 not create a 4.0.1? It sounds like Maven 2.1 will have a 4.1 version.

 Paul
because if you use 4.0.1 for your project, and upload your component to a 
repository, everybody depending on your component will need to support 4.0.1 
or they'll get a failure parsing a 4.0.1 pom with their Maven runtime 
supporting only 4.0.0 pom

to support a 4.1 version, I imagine there will be some trick to implement to 
upload simultaneously the original 4.1 pom version to the repository and a 
generated 4.0.0 for compatibility with Maven 2.0.x

Hervé


 On Mon, Apr 7, 2008 at 6:03 PM, Jason van Zyl [EMAIL PROTECTED] wrote:
  On 7-Apr-08, at 3:58 PM, Jason van Zyl wrote:
   Would being able to detect the encoding help with making this less
   complicated. Something JChardet?
 
  Sorry, something like JChardet:
 
  http://jchardet.sourceforge.net/
 
   On 7-Apr-08, at 2:31 PM, Hervé BOUTEMY wrote:
Le dimanche 06 avril 2008, Jason van Zyl a écrit :
 I specifically meant the core changes, but I would still
 recommending
 what Milos did which was to create branches for a few of the
 affected
 plugins to try it all together.
   
ok, I created
http://svn.apache.org/viewvc/maven/sandbox/branches/MNG-2216/
with javadoc and jxr plugins branches to test the change, and sample
use
case.
   
 Most certainly to test new elements in
   
 the POM you need to use a branch because we still don't have a
 strategy for dealing with model changes.
   
this one is more tricky, even if the change in pom.xml is a simple
addition of
an element... Don't really know how to handle this without breaking
things
for Maven 2.0 when an artifact with this addition is deployed to a
repository.
   
 If plugins can be changed, used with the existing versions of Maven
   
 with no disruption then do it in-situ.
   
No problem here, no disruption, as proven by the test.
The only risk is that the property chosen,
${project.build.sourceEncoding},
makes user think to a new element <project><build><sourceEncoding> in
the
pom, but we still don't know how we will implement it: we bet on a
solution
we don't have currently.
   
Hervé
   
  
   Thanks,
  
   Jason
  
   --
   Jason van Zyl
   Founder,  Apache Maven
   jason at sonatype dot com
   --
  
   A man enjoys his work when he understands the whole and when he
   is responsible for the quality of the whole
  
   -- Christopher Alexander, A Pattern Language
  
  
  
 
  Thanks,
 
  Jason
 
  --
  Jason van Zyl
  Founder,  Apache Maven
  jason at sonatype dot com
  --
 
  Simplex sigillum veri. (Simplicity is the seal of truth.)
 
 
 
 
 






Re: [VOTE] POM Element for Source File Encoding

2008-04-08 Thread Hervé BOUTEMY
Le mardi 08 avril 2008, Martin von Gagern a écrit :
 +1 for the original proposal, if a newcomer like me is allowed to vote.

 The concept with the property, which can be set with the properties
 until the model is updated, and which can be the default expression for
 affected plugins, is simply elegant.
+1

 I support concerns here that the cost of encoding detection may in many
 cases be prohibitively high. Maven runs too slow as it is, imho. You
 could of course write an encoding detection plugin which could examine
 the code and set the required property accordingly. But enabling that by
 default feels bad to me.
+1
Encoding detection, i.e. guessing the encoding, is unreliable by nature.
It is acceptable in a browser, where:
- encoding can change on every page
- a user looks at the rendered characters, sees a problem easily and fixes the
value by simply trying another value and seeing if it is better

But embedded in Maven, where encoding is not so volatile and the consequences
of a bad guess will be more subtle (for example, the compiled classes will
be run and display bad output), I find it a really bad idea.

 It should be noted that plugins that generate code to be used by other
 plugins should have their output encoding default to the general input
 encoding, so that there are no breaks in the chain.
it's noted in the proposal, in the list of affected plugins (modello, for 
example, which generates Java source code)

 As Jason writes about consistency, I guess the danger of inconsistent
 input handling, as different plugins might be configured to read it
 using different charsets, is exactly the kind of inconsistency to be
 addressed by this proposal, so I'd expect more consistency after it has
 been implemented, not less.
+1

Until now, few people cared about encoding for non-XML sources, and it
worked: yes, that's the magic of platform encoding (the drawback is
reproducibility).

IMHO, the best hint to help a user choose his encoding when the default ISO-8859-1
isn't a good value for him is displaying the platform encoding (in mvn -v
output for example): it's easy, reliable, and corresponds to the value he
would have got before the change.

Hervé




Re: [VOTE] POM Element for Source File Encoding

2008-04-08 Thread Benjamin Bentmann

Martin von Gagern wrote:

if a newcomer like me is allowed to vote.


The more people participate in a discussion, the more likely is the result
to match public consensus rather than individual's preferences.


Suppose you have a huge source tree, mostly English ASCII, but somewhere
in there is a single degree sign, '\u00b0'. How would you detect
it, short of scanning every ASCII file until you hit that one?


Exactly, if the automatic guessing should have any chance to deliver the
proper result, it's doomed to scan all the files and this is additional I/O.
Please remember, I/O is one of the most expensive operations in terms of
time, in particular with a Maven build being quite sequential.
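
The degree-sign scenario can be made concrete with a small sketch. It is not how JChardet works; it only shows, with a hypothetical helper, that sampling a prefix of a mostly-ASCII file reports pure ASCII while a full scan is needed to find the one byte that matters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Maven or JChardet.
public class AsciiPrefixDemo {

    /** Returns the index of the first byte outside 7-bit ASCII within the limit, or -1 if none. */
    static int firstNonAscii(byte[] content, int limit) {
        int end = Math.min(limit, content.length);
        for (int i = 0; i < end; i++) {
            if ((content[i] & 0x80) != 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // A mostly-ASCII "source file" whose only non-ASCII character is near the end.
        StringBuilder source = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            source.append("int x").append(i).append(" = ").append(i).append(";\n");
        }
        source.append("// boiling point: 100\u00b0C\n");
        byte[] bytes = source.toString().getBytes(StandardCharsets.ISO_8859_1);

        // Sampling only the first 4 KiB sees nothing but ASCII.
        System.out.println(firstNonAscii(bytes, 4096));
        // A full scan is required to find the degree sign.
        System.out.println(firstNonAscii(bytes, bytes.length));
    }
}
```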


You could of course write an encoding detection plugin which could examine
the code and set the required property accordingly.


Personally, I don't see the use case for that. If there are really users out
there that don't know what file encoding they are using when writing up
their sources, they are most probably happy with the proposed default value
of Latin-1. Alternatively, this encoding detection plugin could be as simple
as printing out the Java system property ${file.encoding} which obviously
worked well enough for the user.
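
Such a "detection" step could be as small as the following sketch (the class name is hypothetical). It only reports what the JVM would have used anyway; note that on recent JDKs (18 and later) the default charset is UTF-8 regardless of platform, which was not the case when this thread was written.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not an existing Maven plugin.
public class PlatformEncoding {

    /** Returns the name of the charset the JVM uses when no encoding is given explicitly. */
    static String platformEncoding() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + platformEncoding());
    }
}
```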

For those users that know about file encoding, it won't be a problem to
specify this in the POM. In particular, those users will not fail to specify
the right encoding, unlike a dumb machine which merely tests whether a
particular byte stream obeys the syntax rules of an encoding.


Benjamin





Re: [VOTE] POM Element for Source File Encoding

2008-04-08 Thread Benjamin Bentmann

Jason van Zyl wrote:

Possibly, but you're guessing.


Guessing about how much slower it will be, yes; guessing that it will be
slower, no. Additional work, additional time. Wouldn't you agree? Then the
question becomes: is it worth taking on this overhead, or how much benefit do
you expect from the encoding guess over the simple default value?



Obviously checking the encoding on every file would be unwise.


As Martin nicely illustrated, you would have to do exactly this. Otherwise,
you could simply shortcut the detection to ASCII because that's what you see
most of the time. The characters that require the proper encoding are in the
minority. My passion for this proposal is not about "works most of the
time"; I would like to see "works always".


Trying to detect where it's not provided (mistakes)


We proposed to set a default value in the super POM such that the encoding
will always be specified. To handle Maven 2.0.9-, we further proposed that
each plugin consistently falls back to this agreed default value in case it
doesn't get a value from the POM. Is there a case I am missing?
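
The fallback order described here can be sketched as follows. This is an illustration only: EncodingResolver and its resolve method are hypothetical and not Maven API; the property name and the ISO-8859-1 default are the ones discussed in this thread.

```java
import java.util.Properties;

// Hypothetical helper, not part of any Maven plugin.
public class EncodingResolver {

    /** Default agreed on in this thread when nothing else is specified. */
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    /** Property name proposed in this thread. */
    static final String PROPERTY = "project.build.sourceEncoding";

    /** Picks the plugin's own parameter first, then the POM property, then the fixed default. */
    static String resolve(String pluginParameter, Properties projectProperties) {
        if (pluginParameter != null && !pluginParameter.trim().isEmpty()) {
            return pluginParameter.trim();
        }
        String fromPom = projectProperties.getProperty(PROPERTY);
        if (fromPom != null && !fromPom.trim().isEmpty()) {
            return fromPom.trim();
        }
        // Never fall back to the JVM's platform encoding.
        return DEFAULT_ENCODING;
    }

    public static void main(String[] args) {
        Properties pom = new Properties();
        System.out.println(resolve(null, pom));          // nothing set anywhere
        pom.setProperty(PROPERTY, "UTF-8");
        System.out.println(resolve(null, pom));          // value from the POM
        System.out.println(resolve("US-ASCII", pom));    // explicit plugin parameter wins
    }
}
```

The point of the design is that the result never depends on the machine running the build.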


or can't be provided (not supported as an option in the model) you're
going to have to do something. So what are you going to do in those cases?


I am not sure what you mean when referring to the "model". Are you referring to
a plugin that is currently not aware of the encoding issue, i.e. simply uses
the JVM's default value and does not provide a configuration parameter to
the user? For this case, we should simply fix this plugin and release a new
version of it to deliver consistently high quality software.


Benjamin





Re: [VOTE] POM Element for Source File Encoding

2008-04-08 Thread Paul Benedict
Herve,

Just a proposal: Maven could loosen its parsing rules when it detects
versions greater than it is configured to accept. This can't be without
limits, of course, perhaps in the range of a single point release: 4.0 <=
4.0.x < 4.1. But perhaps within the 4.0.x series, it would accept undeclared
elements instead of strict parsing against the XSD. So if a 4.0.0 parser is
given a 4.0.1 POM, it must at least match 4.0.0 but also accepts undeclared
elements.
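
As a rough illustration of the rule sketched above (this is not anything Maven
implements; the class ModelVersionPolicy, its Mode enum and the method names
are made up for this example, and treating older series as strictly parsed is
my own assumption), the decision could look like this in Java:

```java
/** Hypothetical policy: how a parser built for one model version treats another. */
class ModelVersionPolicy {

    enum Mode { STRICT, LENIENT, REJECT }

    /** Splits "4.0.1" into {4, 0, 1}; missing components count as 0. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] numbers = new int[3];
        for (int i = 0; i < numbers.length && i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    /** Decides how a parser supporting 'supported' should read a POM declaring 'found'. */
    static Mode modeFor(String supported, String found) {
        int[] s = parse(supported);
        int[] f = parse(found);
        boolean sameSeries = s[0] == f[0] && s[1] == f[1];
        if (sameSeries) {
            // Same major.minor: a newer patch level may carry undeclared elements.
            return f[2] > s[2] ? Mode.LENIENT : Mode.STRICT;
        }
        boolean older = f[0] < s[0] || (f[0] == s[0] && f[1] < s[1]);
        // Assumption: older series are fully known, newer series are out of range.
        return older ? Mode.STRICT : Mode.REJECT;
    }

    public static void main(String[] args) {
        System.out.println(modeFor("4.0.0", "4.0.0"));
        System.out.println(modeFor("4.0.0", "4.0.1"));
        System.out.println(modeFor("4.0.0", "4.1.0"));
    }
}
```

With a 4.0.0 parser, a 4.0.1 POM falls into the lenient case, while a 4.1.0 POM
is still refused.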

Forward compatibility would be nice.

Paul

On Tue, Apr 8, 2008 at 4:16 PM, Hervé BOUTEMY [EMAIL PROTECTED] wrote:

 On Tuesday 08 April 2008, Paul Benedict wrote:
  In Commons Validator, we updated the DTD even in point releases. I don't
  see the harm in doing the same here. After all, if the POM is 4.0.0, why
  not create a 4.0.1? It sounds like Maven 2.1 will have a 4.1 version.
 
  Paul



 because if you use 4.0.1 for your project, and upload your component to a
 repository, everybody depending on your component will need to support
 4.0.1 or they'll get a failure parsing a 4.0.1 pom with their Maven runtime
 supporting only 4.0.0 pom

 to support a 4.1 version, I imagine there will be some trick to implement
 to upload simultaneously the original 4.1 pom version to the repository and
 a generated 4.0.0 for compatibility with Maven 2.0.x

 Hervé




Re: [VOTE] POM Element for Source File Encoding

2008-04-08 Thread Benjamin Bentmann

Jason van Zyl wrote:
What happens when the encoding is different than what is stated? Same
problem really, in how to deal with the actual versus declared.


If the declared encoding does not match the actual one, I simply call this
a user error. Either he explicitly set the wrong value or forgot to
override the default value. With regard to user errors, my general
suggestion is to fail the build. This unforgiving attitude should not be
that unfamiliar to users: it has been chosen for a popular format like XML,
which is also employed by Maven for a few files.


That would depend on what kinds of problems can arise if things are not
consistent.


The problems depend on the encodings: if one feeds Latin-1 into a UTF-8
decoder, it will most likely encounter invalid byte sequences, making the
decoder fail. That's my favorite case as it clearly shows the user that
something is wrong and needs his attention. The other case is worse because
it is more subtle: feeding UTF-8 into a Latin-1 decoder will pass but
produces output that only a human can recognize as garbage, by closely
analyzing the few non-ASCII characters.
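
The two cases above can be reproduced with a few lines of Java. This is only a
sketch for illustration: the class and method names are invented here, and it
uses java.nio.charset.StandardCharsets, which requires Java 7 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    /** Returns true if the bytes obey the syntax rules of the given charset. */
    static boolean isValid(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "Herv\u00e9"; // ends with e-acute

        // Case 1: Latin-1 bytes read as UTF-8. The lone byte 0xE9 is malformed.
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("Latin-1 bytes valid as UTF-8? "
                + isValid(latin1Bytes, StandardCharsets.UTF_8));

        // Case 2: UTF-8 bytes read as Latin-1. No error, but two wrong characters.
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 bytes valid as Latin-1? "
                + isValid(utf8Bytes, StandardCharsets.ISO_8859_1));
        System.out.println("Decoded as Latin-1: "
                + new String(utf8Bytes, StandardCharsets.ISO_8859_1));
    }
}
```

The second case is the subtle one: the validity check passes, yet the decoded
string has six characters instead of five.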


You have to deal with the very real possibility that no one is going to set
it, or know what it is, and that people will report issues related to
encoding even if the whole system works.


I don't think that lack of knowledge is a state that should be supported.
Java is an international platform, designed for platform-independence (more
or less). If developers don't know about file encoding, they are likely
producing bad code. Therefore, I find it easy to say: have users report issues
about encoding and let's tell them how to do it properly, i.e. teach them
another best practice. Then, maybe some day, we won't ever face programs
that were written without file encoding in mind ;-)


For the system you are proposing there would be touch points at which you
would look for encoding parameters. If those values are not stated you
will need a strategy to detect or you will never be able to support any
encoding alignment in older versions of Maven without the encoding
parameterization.


Hm, maybe we talk a lot just because we didn't illustrate our proposal
properly: a key point is that there will *always* be a specific encoding
value. The proposal expects all affected plugins to fall back to Latin-1 (or
whatever, just a fixed value) if they don't get an explicit setting from the
POM. I.e. once a user employs a particular version of a plugin, he can
immediately tell which encoding it will use to process text files. In other
words, he can immediately tell whether the plugin will behave correctly. In
contrast, if we followed your suggestion of encoding guessing, the user
would have to try out the plugin and verify that it guessed correctly. The
encoding parameterization is primarily a task for the individual plugins and
not bound to a Maven version. Having a dedicated POM property/element is
just sugar, not a requirement. The important aspect is unification of
encoding handling in the plugins.
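
The fallback rule described above could be sketched like this.
SourceEncodingResolver and its method are hypothetical names, not taken from
any existing plugin, and the fixed default simply mirrors the value under
discussion:

```java
import java.nio.charset.Charset;

/** Hypothetical helper a plugin might use to pick the encoding for text files. */
class SourceEncodingResolver {

    /** The agreed fixed default; never the platform encoding. */
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    /**
     * Returns the charset to use. A missing or blank setting falls back to the
     * fixed default. An unknown name makes Charset.forName throw an
     * IllegalArgumentException subclass, which a plugin could turn into a
     * build failure.
     */
    static Charset resolve(String configuredEncoding) {
        if (configuredEncoding == null || configuredEncoding.trim().isEmpty()) {
            return Charset.forName(DEFAULT_ENCODING);
        }
        return Charset.forName(configuredEncoding.trim());
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));    // no value from the POM
        System.out.println(resolve("UTF-8")); // explicit value from the POM
    }
}
```

The point of the sketch is that the result never depends on the machine
running the build: it is either what the POM says or the one documented
default.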


Of course it is, but that doesn't negate the fact that people don't
necessarily follow best practices.


That's right. But I believe we have to distinguish bad practice from mistake.
What people call good practice might be controversial, but stating that a
Latin-1 encoded file should be read using UTF-8 is in general just wrong and
leaves no room for discussion. Hence I believe that Maven has every right to
fail the build and report an error if a user does not properly set up the
file encoding, forcing users to fix the error.


Absolutely, but look at all the questions on the mailing list that expect
many of these things to just be detected.


I don't want to upset those users, but I believe that not every request is
justified, and a request can be rejected if the rejection is properly backed
by a reasonable argument. Until somebody shows me a feasible and *reliable*
algorithm to tell ISO-8859-1 and ISO-8859-15 apart, I don't want the dumb
machine to start guessing. I, and I hope all the other users, aim for a
correct build, and if the machine cannot derive the required parameters, it
is the user's duty to specify the proper values. Besides, this is nothing
that really hurts much: add the line to your POM and be fine for the rest of
your life.
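
To make the ISO-8859-1 versus ISO-8859-15 point concrete, here is a small
sketch. The class name is invented for the example, and it assumes the JRE
ships the ISO-8859-15 charset, which the Java specification does not
guarantee although common JDKs include it.

```java
import java.nio.charset.Charset;

class Latin1VersusLatin9 {

    /** Decodes the bytes with the named charset. */
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The same four bytes, valid in both encodings.
        byte[] price = { '1', '0', ' ', (byte) 0xA4 };

        String asLatin1 = decode(price, "ISO-8859-1");  // last char is U+00A4, currency sign
        String asLatin9 = decode(price, "ISO-8859-15"); // last char is U+20AC, euro sign

        System.out.println(Integer.toHexString(asLatin1.charAt(3)));
        System.out.println(Integer.toHexString(asLatin9.charAt(3)));
    }
}
```

Both encodings assign a character to every byte value, so no syntax check can
reject either reading; only someone who knows what the text is supposed to say
can pick the right one.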



Benjamin 






Re: [VOTE] POM Element for Source File Encoding

2008-04-08 Thread Benjamin Bentmann
IMHO, the best hint for a user to choose his encoding, when the default
ISO-8859-1 isn't a good value for him, is displaying the platform encoding
(in mvn -v output, for example): it's easy, reliable, and corresponds to the
value he would have got before the change


+1, just created MNG-3509 for this.
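
For reference, the value in question can be read from within Java as shown
below. This is just a sketch with an invented class name and does not claim to
be how MNG-3509 would be implemented. Note also that newer Java versions
(18 and later) default to UTF-8 regardless of the platform.

```java
import java.nio.charset.Charset;

class PlatformEncodingInfo {

    /** Builds a one-line description of the JVM's default encoding. */
    static String describe() {
        return "Default charset: " + Charset.defaultCharset().name()
                + " (file.encoding=" + System.getProperty("file.encoding") + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```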


Benjamin 






Re: [VOTE] POM Element for Source File Encoding

2008-04-08 Thread Jason van Zyl


On 8-Apr-08, at 4:09 PM, Benjamin Bentmann wrote:

Jason van Zyl wrote:
What happens when the encoding is different than what is stated?
Same problem really, in how to deal with the actual versus declared.


If the declared encoding does not match the actual one, I simply
call this a user error.


Make sure you consider the case where you have people developing the
same code base all over the world, and the possible reasoning behind
falling back to the platform default encoding. Consider a team spread
across the US, Russia, and China: what do they normally do?


Is it possible to specify an encoding in one place that doesn't work
somewhere else?


I am fortunate in that I've never seen an encoding problem in Maven
personally. In your proposal you talk about aligning the encoding
value, but my question is: in what cases have you found the default
encoding not working? You don't talk about that at all in the
proposal.


Do you know what happens with all the tools that people use? Like
checking into all SCMs, and what happens when people check out onto
their system, editors, IDEs. I'm merely suggesting that there might be
a reason most things fall back to the default encoding on the system:
it's generally been a hard thing to corral.



Re: [VOTE] POM Element for Source File Encoding

2008-04-07 Thread Benjamin Bentmann

Please clarify the proposal. When you say source files, you mean things
like Java files not POM files?


Yes, "source file" is meant to refer to a plain text file that does not carry
an encoding declaration or similar, as XML does. XML is fine: it's ugly to
parse but provides the user with means to specify the file encoding used. Our
proposal is about all the other text files that rely on external
configuration to convey the file encoding used. As such, the proposal is
not about POM, FML, XDOC or whatever XML file you can imagine.


Benjamin





Re: [VOTE] POM Element for Source File Encoding

2008-04-07 Thread Benjamin Bentmann

I'd like to know if this could also be achieved via toolchains.


As Hervé already tried to explain, these two proposals do not have much in
common. To my understanding, the toolchain proposal aims at providing a
facade on a user's development kit (native tools, boot class path, etc.)
such that projects can be built using a specific JDK regardless of the JRE
running Maven. I don't see any relation between
a) the selection of a native tool from a user's system
b) the configuration of file encoding for project source files

Indeed, I consider these two orthogonal concerns. Each of the combinations

          | JRE 1.4 | JRE 1.5 | JRE 1.6 | ...
 ---------+---------+---------+---------+-----
 UTF-8    |    X    |    X    |    X    |
 Latin-1  |    X    |    X    |    X    |
 ...      |    X    |    X    |    X    |

represents a valid use case for a project configuration.

What both proposals share is the intention to address these tasks via a
*central* configuration in the POM, i.e. configure target JRE and file
encoding once, not repeatedly for each plugin.

If you feel that toolchains and file encoding fit nicely together and don't
violate separation of concerns, please sketch your thoughts.


Benjamin





Re: [VOTE] POM Element for Source File Encoding

2008-04-07 Thread Paul Benedict
My only concern is that the encoding kind of assumes one kind of source
file. I have never been in the position of having multiple encodings in my
projects, but I suppose if you're compiling many different types of sources,
people would want to tie the source encoding to the extension type.

Paul




Re: [VOTE] POM Element for Source File Encoding

2008-04-07 Thread VELO
+1




Re: [VOTE] POM Element for Source File Encoding

2008-04-07 Thread Asgeir S. Nilsen
2008/4/5, Hervé BOUTEMY [EMAIL PROTECTED]:
 Hi,

  Since the discussion on the list about Maven and encoding 2 weeks ago,
  Benjamin and I worked on a proposal to have:
  1. a central point of configuration of sources encoding, to be used by each
  and every plugin,
  2. a default value set to ISO-8859-1 (instead of platform encoding) to have
  build reproducibility by default

Out of curiosity, why would you go for 8859-1 and not UTF-8 or
US-ASCII?  I would think it would be safer to either support any
extended character or no extended characters, and not something
halfway there?

Asgeir


Re: [VOTE] POM Element for Source File Encoding

2008-04-07 Thread Hervé BOUTEMY
On Monday 07 April 2008, Asgeir S. Nilsen wrote:
 2008/4/5, Hervé BOUTEMY [EMAIL PROTECTED]:
  Hi,
 
   Since the discussion on the list about Maven and encoding 2 weeks ago,
   Benjamin and I worked on a proposal to have:
   1. a central point of configuration of sources encoding, to be used by
  each and every plugin,
   2. a default value set to ISO-8859-1 (instead of platform encoding) to
  have build reproducibility by default

 Out of curiosity, why would you go for 8859-1 and not UTF-8 or
 US-ASCII?  I would think it would be safer to either support any
 extended character or no extended characters, and not something
 halfway there?

 Asgeir
US-ASCII: why limit to ASCII only when ISO-8859-1 is a superset?
UTF-8: seems attractive at first thought, but:
- there are already plugins having ISO-8859-1 as default value
- you can have invalid byte combinations for UTF-8, causing failures

ISO-8859-1 seems the best compromise.
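
A small sketch of why ISO-8859-1 never fails on input: every byte value maps
to exactly one character and back, which is not true of US-ASCII or UTF-8. The
class and method names are invented for this illustration, and it needs Java 7
or later for StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {

    /** Returns an array holding every possible byte value once. */
    static byte[] allByteValues() {
        byte[] bytes = new byte[256];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) i;
        }
        return bytes;
    }

    /** True if decoding and re-encoding with the charset reproduces the bytes. */
    static boolean roundTrips(byte[] bytes, Charset charset) {
        String decoded = new String(bytes, charset);
        return Arrays.equals(bytes, decoded.getBytes(charset));
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println("ISO-8859-1: " + roundTrips(all, StandardCharsets.ISO_8859_1));
        System.out.println("US-ASCII:   " + roundTrips(all, StandardCharsets.US_ASCII));
        System.out.println("UTF-8:      " + roundTrips(all, StandardCharsets.UTF_8));
    }
}
```

Only the ISO-8859-1 line reports true. Of course this only shows that the
decoder cannot fail, not that the decoded characters are the ones the author
intended.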

Hervé




Re: [VOTE] POM Element for Source File Encoding

2008-04-07 Thread Hervé BOUTEMY
On Sunday 06 April 2008, Jason van Zyl wrote:
 I specifically meant the core changes, but I would still recommend
 what Milos did, which was to create branches for a few of the affected
 plugins to try it all together.
OK, I created http://svn.apache.org/viewvc/maven/sandbox/branches/MNG-2216/
with javadoc and jxr plugin branches to test the change, plus a sample use
case.

 Most certainly to test new elements in
 the POM you need to use a branch because we still don't have a
 strategy for dealing with model changes.
This one is trickier, even if the change in pom.xml is a simple addition of
an element... I don't really know how to handle this without breaking things
for Maven 2.0 when an artifact with this addition is deployed to a
repository.

 If plugins can be changed, used with the existing versions of Maven
 with no disruption then do it in-situ.
No problem here, no disruption, as proven by the test.
The only risk is that the property chosen, ${project.build.sourceEncoding},
makes users expect a new element <project><build><sourceEncoding> in the
POM, but we still don't know how we will implement it: we are betting on a
solution we don't currently have.

Hervé




Re: [VOTE] POM Element for Source File Encoding

2008-04-07 Thread Jason van Zyl
Would being able to detect the encoding help with making this less
complicated? Something JChardet?



Thanks,

Jason

--
Jason van Zyl
Founder,  Apache Maven
jason at sonatype dot com
--

A man enjoys his work when he understands the whole and when he
is responsible for the quality of the whole

-- Christopher Alexander, A Pattern Language 








Re: [VOTE] POM Element for Source File Encoding

2008-04-07 Thread Jason van Zyl


On 7-Apr-08, at 3:58 PM, Jason van Zyl wrote:
Would being able to detect the encoding help with making this less  
complicated. Something JChardet?




Sorry, something like JChardet:

http://jchardet.sourceforge.net/



Thanks,

Jason

--
Jason van Zyl
Founder,  Apache Maven
jason at sonatype dot com
--

Simplex sigillum veri. (Simplicity is the seal of truth.)







Re: [VOTE] POM Element for Source File Encoding

2008-04-07 Thread Paul Benedict
In Commons Validator, we updated the DTD even in point releases. I don't see
the harm in doing the same here. After all, if the POM is 4.0.0, why not
create a 4.0.1? It sounds like Maven 2 will have a 4.1 version.

Paul





Re: [VOTE] POM Element for Source File Encoding

2008-04-06 Thread Dennis Lundberg

+1






--
Dennis Lundberg




Re: [VOTE] POM Element for Source File Encoding

2008-04-06 Thread Paul Benedict
Please clarify the proposal. When you say source files, you mean things
like Java files not POM files?

Paul





Re: [VOTE] POM Element for Source File Encoding

2008-04-06 Thread Paul Benedict
+1 .. I'd like to know if this could also be achieved via toolchains.

 
 



[VOTE] POM Element for Source File Encoding

2008-04-05 Thread Hervé BOUTEMY
Hi,

Since the discussion on the list about Maven and encoding 2 weeks ago, 
Benjamin and I worked on a proposal to have:
1. a central point of configuration of sources encoding, to be used by each 
and every plugin,
2. a default value set to ISO-8859-1 (instead of platform encoding) to have 
build reproducibility by default

The full proposal is here:
http://docs.codehaus.org/display/MAVENUSER/POM+Element+for+Source+File+Encoding

As you'll see, we've already found 8 Apache plugins to change, and 4 Codehaus 
ones. Before starting the code modifications, we need everybody to agree on 
the proposal (and complete it if you know other places to change).

The vote will be open for 72 hours.

[ ] +1
[ ] +0
[ ] -1

Here is my +1

Regards,

Hervé




Re: [VOTE] POM Element for Source File Encoding

2008-04-05 Thread nicolas de loof
+1

Is there any overlap with the toolchain proposal?

Nico







Re: [VOTE] POM Element for Source File Encoding

2008-04-05 Thread Benjamin Bentmann

+1


Benjamin






Re: [VOTE] POM Element for Source File Encoding

2008-04-05 Thread Tomasz Pik
On Sat, Apr 5, 2008 at 7:20 PM, Hervé BOUTEMY [EMAIL PROTECTED] wrote:

[...]

  The full proposal is here:
  
 http://docs.codehaus.org/display/MAVENUSER/POM+Element+for+Source+File+Encoding

Non-binding +1

Regards,
Tomek




Re: [VOTE] POM Element for Source File Encoding

2008-04-05 Thread Jason van Zyl
You don't need a 72 hour vote, I would try it in a branch first and
then get people to look at it.

It's a good idea, just don't do it on trunk directly so that we have
the before and after to compare.





Thanks,

Jason

--
Jason van Zyl
Founder,  Apache Maven
jason at sonatype dot com
--

A party which is not afraid of letting culture,
business, and welfare go to ruin completely can
be omnipotent for a while.

-- Jakob Burckhardt 








Re: [VOTE] POM Element for Source File Encoding

2008-04-05 Thread Hervé BOUTEMY
On Saturday 05 April 2008, nicolas de loof wrote:
 +1

 Is there any overlap with the tool chain proposal ?
As I understand the tool chain proposal, there is no overlap at all.
The tool chain provides a central place to configure tools for each
developer's environment (for example, where javac 1.5 is installed).

Source file encoding is not tied to a developer's environment. It is precisely
the contrary: it has to be configured in the project and in the project only
(hence the problem with the default value being the platform encoding, which
implicitly depends on the developer's environment).
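
To illustrate the point about platform encoding, here is a minimal sketch (not taken from the thread or from any Maven plugin). It decodes the same source bytes with two explicitly named charsets and shows that the resulting text differs, which is what happens implicitly when a build falls back to each developer's platform default. The class name `PlatformEncodingDemo` and the method `decode` are hypothetical and exist only for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PlatformEncodingDemo {

    // Decode source bytes with an explicitly chosen charset,
    // instead of relying on the platform default.
    public static String decode(byte[] sourceBytes, Charset charset) {
        return new String(sourceBytes, charset);
    }

    public static void main(String[] args) {
        // The character U+00E9 (e with acute accent) as stored in a
        // UTF-8 encoded source file: two bytes on disk.
        byte[] sourceBytes = {(byte) 0xC3, (byte) 0xA9};

        String asUtf8 = decode(sourceBytes, StandardCharsets.UTF_8);
        String asLatin1 = decode(sourceBytes, StandardCharsets.ISO_8859_1);

        // Same bytes, different text: one character versus two.
        System.out.println("UTF-8 length:      " + asUtf8.length());   // prints 1
        System.out.println("ISO-8859-1 length: " + asLatin1.length()); // prints 2

        // A tool that does not name a charset uses this value, which
        // varies from one developer machine to another.
        System.out.println("Platform default:  " + Charset.defaultCharset());
    }
}
```

The last line is the only one whose output depends on the machine running it, which is the kind of hidden environment dependency the proposal tries to remove.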





Re: [VOTE] POM Element for Source File Encoding

2008-04-05 Thread Benjamin Bentmann

Jason van Zyl wrote:

You don't need a 72 hour vote, I would try it in a branch first and then
get people to look at it.


Just wondering: If I were to file JIRA issues for each affected plugin to request
a) adding an encoding parameter if not already existent
b) making this parameter default to Latin-1
would we start branches on the plugins for each of these issues?

I mean this proposal is not about a revolutionary new feature, it's merely
the attempt to create a guideline for consistent encoding handling in the
various source processing plugins. More precisely, we're seeking consensus
that
a) the core team will eventually introduce a new POM element for this in
   Maven 2.1, named project.build.sourceEncoding or whatever we agree upon
b) in the meantime, Maven 2.0.x will define an equally named property for
   this in its super POM
c) it's OK to have Latin-1 as default encoding rather than the platform
   encoding
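
A minimal sketch of how a plugin might honor points (b) and (c), assuming the property ends up being called `project.build.sourceEncoding` (the name is still open for discussion above) and that project properties are available as a plain `java.util.Properties` object. The class `SourceEncodingResolver` and its `resolve` method are hypothetical names for illustration and are not part of any Maven API.

```java
import java.nio.charset.Charset;
import java.util.Properties;

public class SourceEncodingResolver {

    // Property name as mentioned in the thread; the final name was not yet agreed.
    static final String PROPERTY = "project.build.sourceEncoding";

    // Default proposed in the thread: Latin-1 rather than the platform encoding.
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    // Return the charset configured for the project, or the fixed default
    // when the property is missing or blank. The platform default charset
    // is never consulted, so the result is the same on every machine.
    public static Charset resolve(Properties projectProperties) {
        String name = projectProperties.getProperty(PROPERTY);
        if (name == null || name.trim().isEmpty()) {
            name = DEFAULT_ENCODING;
        }
        return Charset.forName(name.trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(resolve(props).name()); // prints ISO-8859-1

        props.setProperty(PROPERTY, "UTF-8");
        System.out.println(resolve(props).name()); // prints UTF-8
    }
}
```

Every plugin calling the same lookup would read one shared setting instead of its own differently named parameter, which is the consistency the guideline aims for.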

Also, this is not going to be a code change that plops out one day as a huge
merge back into trunk. Rather, it's an incremental process where the
required improvements to plugin X can be made independently of the
development on plugin Y.

For example, MPLUGIN-101 and MINVOKER-30 already have patches for this topic
pending. Is it really expected to open a branch, apply the patches to the
branch and merge back (the same day) instead of applying them directly to
trunk? Am I underestimating this?


Benjamin





Re: [VOTE] POM Element for Source File Encoding

2008-04-05 Thread Jason van Zyl


On 5-Apr-08, at 3:13 PM, Benjamin Bentmann wrote:

 Jason van Zyl wrote:
  You don't need a 72 hour vote, I would try it in a branch first and then
  get people to look at it.

 Just wondering: If I were to file JIRA issues for each affected plugin to
 request
 a) adding an encoding parameter if not already existent
 b) making this parameter default to Latin-1
 would we start branches on the plugins for each of these issues?

 I mean this proposal is not about a revolutionary new feature, it's merely
 the attempt to create a guideline for consistent encoding handling in the
 various source processing plugins. More precisely, we're seeking consensus
 that
 a) the core team will eventually introduce a new POM element for this in
    Maven 2.1, named project.build.sourceEncoding or whatever we agree upon

I specifically meant the core changes, but I would still recommend
what Milos did, which was to create branches for a few of the affected
plugins to try it all together. Most certainly, to test new elements in
the POM you need to use a branch because we still don't have a
strategy for dealing with model changes.

If plugins can be changed and used with the existing versions of Maven
with no disruption, then do it in situ.







Thanks,

Jason

--
Jason van Zyl
Founder,  Apache Maven
jason at sonatype dot com
--

the course of true love never did run smooth ...

-- Shakespeare 




