On 8-Apr-08, at 4:09 PM, Benjamin Bentmann wrote:
Jason van Zyl wrote:
What happens when the encoding is different than what is stated?
Same problem really, in how to deal with the actual versus declared.
If the declared encoding does not match the actual one, I simply
call this a user error.
Make sure you consider the case where you have people developing the
same code base all over the world, and the possible reasoning of
falling back to platform default encoding. Consider the team spread
across the US, Russia, and China and what do they do normally?
Is it possible to specify an encoding in one place that doesn't work
somewhere else?
I am fortunate in that I've never seen an encoding problem in Maven
personally. In your proposal you talk about aligning the encoding
value, but my question is: in what cases have you found the default
encoding not working? You don't talk about that at all in the
proposal.
Do you know what happens with all the tools that people use? Like
checking into all SCMs, and what happens when people check out onto
their systems, editors, IDEs. I'm merely suggesting that there might be
a reason most things fall back to the default encoding on the system
because it's generally been a hard thing to corral.
Either he explicitly set the wrong value or forgot to overwrite the
default value. With regard to user errors, my general suggestion is
to fail the build. This unforgiving attitude should not be that
unfamiliar to users: It has been chosen for a popular format like XML
which is also employed by Maven for a few files.
That would depend on what kinds of problems can arise if things
are not consistent.
The problems depend on the encodings: If one feeds Latin-1 into a
UTF-8 decoder, one will most likely encounter invalid byte sequences,
making the decoder fail. That's my favorite case as it clearly shows
the user something is wrong and needs his attention. The other case
is worse because it is more subtle: Feeding UTF-8 into a Latin-1
decoder will pass but produces output that only a human can recognize
as garbage by closely analyzing the few non-ASCII characters.
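The two cases can be reproduced with a few lines of Java. This is only a minimal sketch: the class and method names (EncodingMismatchDemo, isValid) are made up for illustration and are not part of Maven or any plugin. It uses a strict decoder, because the convenience constructor new String(bytes, charset) replaces malformed input instead of reporting it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Hypothetical helper: returns true if the bytes decode without error
    // under the given charset. The decoder is configured to REPORT problems
    // rather than silently substitute a replacement character.
    static boolean isValid(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "Mueller" with u-umlaut, written as an escape so this source file
        // itself does not depend on any particular file encoding.
        String text = "M\u00fcller";

        // Case 1: Latin-1 bytes fed into a UTF-8 decoder. The single byte
        // 0xFC is not a valid UTF-8 sequence, so strict decoding fails.
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("Latin-1 bytes valid as UTF-8?  "
                + isValid(latin1Bytes, StandardCharsets.UTF_8));

        // Case 2: UTF-8 bytes fed into a Latin-1 decoder. Every byte maps
        // to some Latin-1 character, so decoding "succeeds" with garbage.
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 bytes valid as Latin-1?  "
                + isValid(utf8Bytes, StandardCharsets.ISO_8859_1));
        System.out.println("Decoded as Latin-1: "
                + new String(utf8Bytes, StandardCharsets.ISO_8859_1));
    }
}
```

In the second case the u-umlaut comes out as two characters (U+00C3 followed by U+00BC), which no decoder will flag as an error.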
You have to deal with the very real possibility that no one is going
to set it or know what it is, and that people will report issues
related to encoding even if the whole system works.
I don't think that lack of knowledge is a state that should be
supported. Java is an international platform, designed for platform-
independence (more or less). If developers don't know about file
encoding, they are likely producing bad code. Therefore, I am
comfortable saying: Have users report issues about encoding and let's tell them
how to do it properly, i.e. teach them another best practice. Then,
maybe some day, we won't ever face programs that were written
without file encoding in mind ;-)
For the system you are proposing there would be touch points at
which you would look for encoding parameters. If those values are
not stated you will need a strategy to detect them, or you will never be
able to support any encoding alignment in older versions of Maven
without the encoding parameterization.
Hm, maybe we talk a lot just because we didn't illustrate our
proposal properly: A key point is that there will *always* be a
specific encoding value. The proposal expects all affected plugins
to fall back to Latin-1 (or whatever, just a fixed value) if they
don't get an explicit setting from the POM. I.e. once a user employs
a particular version of a plugin, he can immediately tell which
encoding it will use to process text files. In other words, he can
immediately tell whether the plugin will behave correctly. In
contrast, if we followed your suggestion with encoding guessing, the
user would have to try out the plugin and verify that it guessed
correctly. The encoding parameterization is primarily a task for the
individual plugins and not bound to a Maven version. Having a
dedicated POM property/element is just sugar, not a requirement. The
important aspect is unification of encoding handling in the plugins.
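The "always a specific value" rule described above could look roughly like this inside a plugin. It is a sketch under assumptions: EncodingResolver, resolve and FALLBACK are hypothetical names, not an existing Maven API, and Latin-1 is used as the fixed fallback only because it is the value mentioned above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingResolver {

    // Assumption: a fixed fallback that is identical on every machine.
    // Latin-1 is just the value floated in this thread.
    static final Charset FALLBACK = StandardCharsets.ISO_8859_1;

    // Hypothetical helper: 'configured' stands for whatever encoding
    // string the plugin received from the POM, or null if none was set.
    static Charset resolve(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return FALLBACK;
        }
        // Charset.forName throws an unchecked exception for illegal or
        // unsupported names, so a wrong setting fails fast instead of
        // being silently replaced by a guess.
        return Charset.forName(configured.trim());
    }

    public static void main(String[] args) {
        System.out.println("no setting -> " + resolve(null));
        System.out.println("explicit   -> " + resolve("UTF-8"));
        // Shown only for comparison; resolve() never consults it.
        System.out.println("platform   -> " + Charset.defaultCharset());
    }
}
```

The point of the design is that the result depends only on the plugin version and the POM, never on the machine the build runs on.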
Of course it is, but that doesn't negate the fact that people don't
necessarily follow best practices.
That's right. But I believe we have to distinguish bad practice and
mistake. What people call good practice might be controversial, but
stating that a Latin-1 encoded file should be read using UTF-8 is in
general just wrong and leaves no room for discussion. Hence I
believe that Maven has every right to fail the build and report an
error if a user does not properly set up the file encoding, forcing
users to fix the error.
Absolutely, but look at all the questions on the mailing list that
expect many of these things to just be detected.
I don't want to upset those users but I believe that not every
request is justified and can be rejected if only properly backed by
a reasonable argument. Until somebody shows me a feasible and
*reliable* algo to tell ISO-8859-1 and ISO-8859-15 apart, I don't
want the dumb machine to start guessing. I, and I hope all the other
users, aim for a correct build and if the machine cannot derive the
required parameters, it is a user's duty to specify the proper
values. Besides, this is nothing that really hurts much: add the
line to your POM and be fine for the rest of your life.
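To illustrate why ISO-8859-1 and ISO-8859-15 cannot be told apart by looking at the bytes, here is a small sketch (the class name Latin1VersusLatin9 and the helper decode are made up). Both charsets assign a character to every byte value, so neither decoder ever fails; they merely disagree about a handful of positions such as 0xA4. It assumes the JRE ships the ISO-8859-15 charset, which is common but not guaranteed by the Java specification.

```java
import java.nio.charset.Charset;

public class Latin1VersusLatin9 {

    // Hypothetical helper: decode bytes with a charset given by name.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // One byte that the two charsets interpret differently.
        byte[] bytes = { (byte) 0xA4 };

        String asLatin1 = decode(bytes, "ISO-8859-1");   // U+00A4, currency sign
        String asLatin9 = decode(bytes, "ISO-8859-15");  // U+20AC, euro sign

        System.out.printf("ISO-8859-1:  U+%04X%n", (int) asLatin1.charAt(0));
        System.out.printf("ISO-8859-15: U+%04X%n", (int) asLatin9.charAt(0));
    }
}
```

Both results are perfectly valid text, so only someone who knows what the file was meant to say can decide which one is right.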
Benjamin
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Thanks,
Jason
----------------------------------------------------------
Jason van Zyl
Founder, Apache Maven
jason at sonatype dot com
----------------------------------------------------------
Simplex sigillum veri. (Simplicity is the seal of truth.)