Re: [VOTE] POM Element for Source File Encoding

Jason van Zyl Tue, 08 Apr 2008 16:34:49 -0700


On 8-Apr-08, at 4:09 PM, Benjamin Bentmann wrote:

Jason van Zyl wrote:
What happens when the encoding is different from what is stated? Same problem really, in how to deal with the actual versus declared.
If the declared encoding does not match the actual one, I simply call this a user error.

Make sure you consider the case where you have people developing the same code base all over the world, and the possible reasoning for falling back to the platform default encoding. Consider a team spread across the US, Russia, and China: what do they normally do?

Is it possible to specify an encoding in one place that doesn't work somewhere else?

I am fortunate in that I've never personally seen an encoding problem in Maven. In your proposal you talk about aligning the encoding value, but my question is: in what cases have you found the default encoding not working? You don't talk about that at all in the proposal.

Do you know what happens with all the tools that people use? Like checking into all SCMs, and what happens when people check out onto their system, editors, IDEs. I'm merely suggesting that there might be a reason most things fall back to the default encoding on the system: it's generally been a hard thing to corral.

Either he explicitly set the wrong value or forgot to overwrite the default value. With regard to user errors, my general suggestion is to fail the build. This unforgiving attitude should not be that unfamiliar to users: it has been chosen for a popular format like XML, which is also employed by Maven for a few files.
That would depend on what kinds of problems can arise if things are not consistent.
The problems depend on the encodings: If one feeds Latin-1 into a UTF-8 decoder, one will most likely encounter invalid byte sequences, making the decoder fail. That's my favorite case as it clearly shows the user that something is wrong and needs his attention. The other case is worse because it is more subtle: Feeding UTF-8 into a Latin-1 decoder will pass but produce output that only a human can recognize as garbage by closely analyzing the few non-ASCII characters.
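The two cases above can be illustrated with a minimal Java sketch. The class name `EncodingMismatchDemo`, the helper `decodeStrict`, and the sample text are made up for illustration and are not part of the proposal. The helper uses a `CharsetDecoder` configured to report errors, because `new String(bytes, charset)` would silently substitute replacement characters instead of failing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Hypothetical helper: decode strictly, so that malformed or unmappable
    // input raises an exception instead of being silently replaced.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws Exception {
        String text = "Gr\u00fc\u00dfe"; // sample text with two non-ASCII characters

        // Case 1: Latin-1 bytes fed into a UTF-8 decoder. The byte 0xFC can
        // never occur in well-formed UTF-8, so the strict decoder throws.
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        try {
            decodeStrict(latin1Bytes, StandardCharsets.UTF_8);
            System.out.println("unexpected: decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("Latin-1 read as UTF-8 fails: " + e);
        }

        // Case 2: UTF-8 bytes fed into a Latin-1 decoder. Every byte value is
        // a valid Latin-1 character, so decoding succeeds but the text is wrong.
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        String garbled = decodeStrict(utf8Bytes, StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 read as Latin-1 passes: " + garbled);
        System.out.println("matches original? " + garbled.equals(text));
    }
}
```

Only the first case gives a tool something it can detect and turn into a build failure; the second needs a human to look at the output.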
You have to deal with the very real possibility that no one is going to set it, or know what it is, and that they will report issues related to encoding even if the whole system works.
I don't think that lack of knowledge is a state that should be supported. Java is an international platform, designed for platform independence (more or less). If developers don't know about file encoding, they are likely producing bad code. Therefore, I find it easy to say: Have users report issues about encoding and let's tell them how to do it properly, i.e. teach them another best practice. Then, maybe some day, we won't ever face programs that were written without file encoding in mind ;-)
For the system you are proposing there would be touch points at which you would look for encoding parameters. If those values are not stated you will need a strategy to detect them, or you will never be able to support any encoding alignment in older versions of Maven without the encoding parameterization.
Hm, maybe we talk a lot just because we didn't illustrate our proposal properly: A key point is that there will *always* be a specific encoding value. The proposal expects all affected plugins to fall back to Latin-1 (or whatever, just a fixed value) if they don't get an explicit setting from the POM. I.e. once a user employs a particular version of a plugin, he can immediately tell which encoding it will use to process text files. In other words, he can immediately tell whether the plugin will behave correctly. In contrast, if we followed your suggestion with encoding guessing, the user would have to try out the plugin and verify that it guessed correctly. The encoding parameterization is primarily a task for the individual plugins and not bound to a Maven version. Having a dedicated POM property/element is just sugar, not a requirement. The important aspect is unification of encoding handling in the plugins.
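The fallback rule described here might look roughly like the following Java sketch. The class `EncodingFallback`, the method `resolveEncoding`, and the constant `FIXED_DEFAULT` are hypothetical names chosen for illustration, not an existing Maven or plugin API; Latin-1 is used as the fixed value only because it is the example given above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingFallback {

    // Assumption: a fixed, platform-independent default. The proposal only
    // requires that it is fixed; Latin-1 is the example value mentioned.
    static final Charset FIXED_DEFAULT = StandardCharsets.ISO_8859_1;

    // Hypothetical plugin-side helper: use the explicitly configured encoding
    // if there is one, otherwise the fixed default. The platform default
    // (Charset.defaultCharset()) is deliberately never consulted.
    static Charset resolveEncoding(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return FIXED_DEFAULT;
        }
        // Charset.forName throws for illegal or unsupported names, which a
        // plugin could surface as a build failure.
        return Charset.forName(configured.trim());
    }

    public static void main(String[] args) {
        System.out.println(resolveEncoding(null));
        System.out.println(resolveEncoding("UTF-8"));
    }
}
```

Because the result depends only on the configured value, the same project resolves to the same encoding on every developer's machine, which is the predictability argued for above.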
Of course it is, but that doesn't negate the fact that people don't necessarily follow best practices.
That's right. But I believe we have to distinguish between bad practice and mistake. What people call good practice might be controversial, but stating that a Latin-1 encoded file should be read using UTF-8 is in general just wrong and leaves no room for discussion. Hence I believe that Maven has every right to fail the build and report an error if a user does not properly set up the file encoding, forcing users to fix the error.
Absolutely, but look at all the questions on the mailing list that expect many of these things to just be detected.
I don't want to upset those users, but I believe that not every request is justified, and a request can be rejected if the rejection is properly backed by a reasonable argument. Until somebody shows me a feasible and *reliable* algorithm to tell ISO-8859-1 and ISO-8859-15 apart, I don't want the dumb machine to start guessing. I, and I hope all the other users, aim for a correct build, and if the machine cannot derive the required parameters, it is the user's duty to specify the proper values. Besides, this is nothing that really hurts much: add the line to your POM and be fine for the rest of your life.
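To see why guessing between these two encodings is hard, here is a small Java sketch. The class name `Latin1VersusLatin9` and the sample bytes are invented for illustration, and the sketch assumes the runtime ships the `ISO-8859-15` charset. ISO-8859-1 and ISO-8859-15 are both single-byte encodings, and the byte 0xA4 is valid in each of them, so no decoder error can ever reveal which one was meant.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1VersusLatin9 {

    // Decode the same bytes under a given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Assumption: ISO-8859-15 is available in this JVM.
        Charset latin9 = Charset.forName("ISO-8859-15");

        // Illustrative bytes: the digit 5, a space, and the byte 0xA4.
        byte[] price = { '5', ' ', (byte) 0xA4 };

        String asLatin1 = decode(price, StandardCharsets.ISO_8859_1);
        String asLatin9 = decode(price, latin9);

        // 0xA4 is the currency sign U+00A4 in ISO-8859-1 and the euro sign
        // U+20AC in ISO-8859-15. Both decodings succeed without any error.
        System.out.printf("ISO-8859-1:  U+%04X%n", (int) asLatin1.charAt(2));
        System.out.printf("ISO-8859-15: U+%04X%n", (int) asLatin9.charAt(2));
    }
}
```

Both results are legal text; only someone who knows what the file was supposed to contain can say which decoding is right.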
Benjamin

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Thanks,

Jason

----------------------------------------------------------
Jason van Zyl
Founder,  Apache Maven
jason at sonatype dot com
----------------------------------------------------------

Simplex sigillum veri. (Simplicity is the seal of truth.)



