On 01/11/2015 11:01 AM, Karl Williamson wrote:
On 01/10/2015 11:35 PM, David E. Wheeler wrote:
On Jan 10, 2015, at 5:48 PM, Sean Burke <sbu...@cpan.org> wrote:

Helleu, Pod pals!
Short version about "Re: Assume CP1252"-- I advise: yes, assume
CP1252 where technically you were expecting Latin-1.

Thanks for chiming in, Sean.

I agree completely, go for it!

Yes:
* assume that input is CP1252 in the absence of any encoding being
declared
* assume that input is CP1252 if the declared encoding is Latin-1

As far as I know, that amicable bait-and-switch (i.e., construing
Latin-1 to actually mean the superset CP1252) means in practice that
everybody wins, and nobody loses, and DWIM prevails yet again.
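For concreteness, the two rules above can be sketched as a small decision function. This is a minimal illustration only. It assumes Python codec names stand in for the names a Pod parser would actually see on an =encoding line, and `choose_pod_decoding` is a hypothetical helper, not the code of any existing Pod parser.

```python
import codecs
from typing import Optional


def choose_pod_decoding(declared: Optional[str]) -> str:
    """Pick a codec name for decoding a Pod document (hypothetical helper).

    Sketch of the proposed policy:
      * no declared encoding            -> cp1252
      * declared Latin-1 (any alias)    -> cp1252, treated as its superset
      * any other declared encoding     -> used unchanged
    """
    if declared is None:
        return "cp1252"
    # codecs.lookup normalizes aliases such as "latin1", "latin-1" and
    # "ISO-8859-1" to the canonical name "iso8859-1". It raises
    # LookupError for names Python does not recognize.
    if codecs.lookup(declared).name == "iso8859-1":
        return "cp1252"
    return declared
```

Dropping the Latin-1 branch gives the narrower policy, in which CP1252 is assumed only when no encoding is declared at all and an explicit declaration is always honored.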

Right, I vaguely remember you telling me this before. I forgot about
#2 (and the HTML 5 precedent).

I think I oppose overruling someone's =encoding line.  The reason that
1252 is effectively a superset of latin1 is that it reuses the C1
controls to mean something else, and we don't expect those controls to
actually appear in a pod document.  That expectation is quite likely
correct, except for one control, NEL, U+85, which is the usual line
separator on some platforms, notably os390 (that code point is the
horizontal ellipsis in 1252).
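The C1 point can be checked byte by byte. The sketch below uses Python's standard `latin-1` and `cp1252` codecs as a stand-in for whatever decoder a Pod parser would use. It should show that every difference between the two encodings falls in 0x80-0x9F and that byte 0x85 decodes to NEL in Latin-1 but to HORIZONTAL ELLIPSIS (U+2026) in CP1252. It also shows why "superset" is only approximate: Python's cp1252 codec leaves five bytes in that range (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so strict decoding of them fails.

```python
LATIN1 = "latin-1"
CP1252 = "cp1252"


def differing_bytes():
    """Return {byte: (latin1_char, cp1252_char_or_None)} for bytes that decode differently."""
    diffs = {}
    for b in range(256):
        raw = bytes([b])
        as_latin1 = raw.decode(LATIN1)  # Latin-1 defines all 256 bytes
        try:
            as_cp1252 = raw.decode(CP1252)
        except UnicodeDecodeError:
            as_cp1252 = None  # byte is undefined in Python's cp1252 codec
        if as_latin1 != as_cp1252:
            diffs[b] = (as_latin1, as_cp1252)
    return diffs


if __name__ == "__main__":
    diffs = differing_bytes()
    # True if every difference lies in the C1 control range 0x80-0x9F.
    print(all(0x80 <= b <= 0x9F for b in diffs))
    # Byte 0x85: NEL (U+0085) in Latin-1 vs. U+2026 in CP1252.
    print(hex(ord(diffs[0x85][0])), hex(ord(diffs[0x85][1])))
```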

It strikes me as wrong anyway to say we know better than the coder.
There needs to be a way for a coder to specify the encoding and not have
that specification ignored by us.  We do not have the foresight to know
all the possible circumstances where Latin1 is the correct value and 1252
is not.  We could be wrong, and we should provide an easy workaround for
our wrongness.  The most straightforward one, and the one that will lead
to the least resentment against us when we are wrong, is simply not to
second-guess what the coder has said.

os390 is proof that there is at least one platform that Perl runs on
where 1252 is not a superset of Latin1.  There could be special casing
for that platform.  But if we're wrong there, we could be wrong
elsewhere.  It just seems a bad idea to think we know better than the
coder.


To be clear, I think that assuming 1252 when there is no =encoding line is a good idea. But I'm leery of overriding an actual =encoding line.
