On 8-Apr-08, at 4:09 PM, Benjamin Bentmann wrote:
Jason van Zyl wrote:
What happens when the encoding is different than what is stated? Same problem really, in how to deal with the actual versus declared.

If the declared encoding does not match the actual one, I simply call this a user error.

Make sure you consider the case where you have people developing the same code base all over the world, and the possible reasoning behind falling back to the platform default encoding. Consider a team spread across the US, Russia, and China: what do they normally do?

Is it possible to specify an encoding in one place that doesn't work somewhere else?

I am fortunate in that I've never seen an encoding problem in Maven personally. In your proposal you talk about aligning the encoding value, but my question is: in what cases have you found the default encoding not working? You don't talk about that at all in the proposal.

Do you know what happens with all the tools that people use? Like checking into all SCMs, and what happens when people check out onto their systems, editors, IDEs. I'm merely suggesting that there might be a reason most things fall back to the default encoding on the system: it's generally been a hard thing to corral.

Either he explicitly set the wrong value or forgot to override the default value. With regard to user errors, my general suggestion is to fail the build. This unforgiving attitude should not be that unfamiliar to users: it has been chosen for a popular format like XML, which Maven also employs for a few files.

That would depend on what kinds of problems can arise if things are not consistent.

The problems depend on the encodings: If one feeds Latin-1 into a UTF-8 decoder, one will most likely encounter invalid byte sequences, making the decoder fail. That's my favorite case, as it clearly shows the user that something is wrong and needs his attention. The other case is worse because it is more subtle: Feeding UTF-8 into a Latin-1 decoder will pass but produces output that only a human can recognize as garbage, by closely analyzing the few non-ASCII characters.
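
To make the two cases concrete, here is a minimal Java sketch. The class name DecoderMismatchDemo and the helper decodeStrict are made up for illustration; they are not part of Maven or of the proposal. It assumes a strict decoder, i.e. one configured to report malformed input rather than silently replace it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Maven.
final class DecoderMismatchDemo {

    // Decode strictly: malformed or unmappable input raises an exception
    // instead of being replaced with a substitution character.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws Exception {
        String text = "caf\u00e9"; // "cafe" with an acute accent on the e

        // Case 1: Latin-1 bytes fed into a UTF-8 decoder. The lone byte 0xE9
        // is not a valid UTF-8 sequence, so the strict decoder fails.
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        try {
            decodeStrict(latin1Bytes, StandardCharsets.UTF_8);
            System.out.println("decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("UTF-8 decoder rejected the Latin-1 input: " + e);
        }

        // Case 2: UTF-8 bytes fed into a Latin-1 decoder. Every byte maps to
        // some character, so decoding succeeds, but the two bytes 0xC3 0xA9
        // come out as two separate characters instead of one accented e.
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        String garbled = decodeStrict(utf8Bytes, StandardCharsets.ISO_8859_1);
        System.out.println("length " + garbled.length() + " instead of " + text.length());
    }
}
```

Note that the convenience constructors such as new String(bytes, charset) replace malformed input instead of reporting it, so the first case only fails loudly when the decoder is configured as above.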

You have to deal with the very real possibility that no one is going to set it, that people will not know what it is, and that they will report issues related to encoding even if the whole system works.

I don't think that lack of knowledge is a state that should be supported. Java is an international platform, designed for platform-independence (more or less). If developers don't know about file encoding, they are likely producing bad code. Therefore, I find it easy to say: Have users report issues about encoding and let's tell them how to do it properly, i.e. teach them another best practice. Then, maybe some day, we won't ever face programs that were written without file encoding in mind ;-)

For the system you are proposing there would be touch points at which you would look for encoding parameters. If those values are not stated, you will need a detection strategy, or you will never be able to support any encoding alignment in older versions of Maven that lack the encoding parameterization.

Hm, maybe we talk a lot just because we didn't illustrate our proposal properly: A key point is that there will *always* be a specific encoding value. The proposal expects all affected plugins to fall back to Latin-1 (or whatever, just a fixed value) if they don't get an explicit setting from the POM. I.e. once a user employs a particular version of a plugin, he can immediately tell which encoding it will use to process text files. In other words, he can immediately tell whether the plugin will behave correctly. In contrast, if we followed your suggestion with encoding guessing, the user would have to try out the plugin and verify that it guessed correctly. The encoding parameterization is primarily a task for the individual plugins and not bound to a Maven version. Having a dedicated POM property/element is just sugar, not a requirement. The important aspect is unification of encoding handling in the plugins.
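
As a rough sketch of that lookup rule in Java: the class EncodingResolver, the method resolve, and the choice of Latin-1 as the constant are assumptions for illustration only, not an existing Maven or plugin API. The point is that the platform default is never consulted, so the result is the same on every machine.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any Maven API.
final class EncodingResolver {

    // Assumed fixed fallback; the proposal only requires *some* fixed value.
    static final Charset FALLBACK = StandardCharsets.ISO_8859_1;

    // Returns the explicitly configured charset, or the fixed fallback when
    // nothing was configured. Never looks at Charset.defaultCharset().
    static Charset resolve(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return FALLBACK;
        }
        // Charset.forName throws an IllegalArgumentException subclass for an
        // illegal or unsupported name, so a bad setting fails loudly.
        return Charset.forName(configured.trim());
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));    // no setting: fixed fallback
        System.out.println(resolve("UTF-8")); // explicit setting wins
    }
}
```

A plugin following this pattern would pass its configured parameter value (possibly null) to resolve and use the returned charset for every reader and writer it opens.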

Of course it is, but that doesn't negate the fact that people don't necessarily follow best practices.

That's right. But I believe we have to distinguish between bad practice and mistake. What people call good practice might be controversial, but stating that a Latin-1 encoded file should be read using UTF-8 is in general just wrong and leaves no room for discussion. Hence I believe that Maven has every right to fail the build and report an error if a user does not properly set up the file encoding, forcing users to fix the error.

Absolutely, but look at all the questions on the mailing list that expect many of these things to just be detected.

I don't want to upset those users, but I believe that not every request is justified, and a request can be rejected as long as the rejection is backed by a reasonable argument. Until somebody shows me a feasible and *reliable* algo to tell ISO-8859-1 and ISO-8859-15 apart, I don't want the dumb machine to start guessing. I, and I hope all the other users, aim for a correct build, and if the machine cannot derive the required parameters, it is the user's duty to specify the proper values. Besides, this is nothing that really hurts much: add the line to your POM and be fine for the rest of your life.
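
To illustrate why ISO-8859-1 and ISO-8859-15 cannot be told apart by inspecting bytes, here is a small Java sketch. The class Latin1VersusLatin9 and its decode helper are made up for this example, and it assumes the running JVM ships the ISO-8859-15 charset, which the Java specification does not guarantee. The single byte 0xA4 is valid in both encodings but stands for a different character in each.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of Maven.
final class Latin1VersusLatin9 {

    // Decode the bytes with the named charset. Charset.forName throws if the
    // JVM does not provide that charset (an assumption for ISO-8859-15).
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] bytes = { (byte) 0xA4 };

        String asLatin1 = decode(bytes, "ISO-8859-1");  // U+00A4, generic currency sign
        String asLatin9 = decode(bytes, "ISO-8859-15"); // U+20AC, euro sign

        // Both decodings succeed without any error; only the meaning differs.
        System.out.printf("ISO-8859-1 : U+%04X%n", (int) asLatin1.charAt(0));
        System.out.printf("ISO-8859-15: U+%04X%n", (int) asLatin9.charAt(0));
    }
}
```

Since every byte value is a valid character in both encodings, no decoder error can ever reveal which one was meant; only knowledge of the intended text can.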


Benjamin

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Thanks,

Jason

----------------------------------------------------------
Jason van Zyl
Founder,  Apache Maven
jason at sonatype dot com
----------------------------------------------------------

Simplex sigillum veri. (Simplicity is the seal of truth.)



