Aristotle Pagaltzis wrote:
> * Tatsuhiko Miyagawa <[EMAIL PROTECTED]> [2008-03-19 07:20]:
>> Some modules like XML::LibXML adds UTF-8 flags regardless of if
>> the characters to handle are composed of latin-1 range (like
>> Encode::decode_utf8 instead of utf8::decode), and that's pretty
>> much realistic and sane approach I think.
>
> Yes. If the flag is to have any use at all, then it has to have
> the semantic of distinguishing character vs byte strings.
>
>> I agree with Bill that the plugin trying to decode already
>> utf-8 flagged string doesn't make any sense, but furthermore, I
>> wonder under which circumstance the plugin tries to decode
>> already-utf8-flagged strings.
>>
>> I'd say that's the root problem.
>
> Yes; and that’s exactly what Jon said.
There are a number of ways that incoming data could already be decoded: the environment, perl switches, or pragmata. Ideally every application would do as Jon proposes and ensure that nothing decodes the string before the plugin sees it. But checking the flag before decoding is at worst harmless and at best prevents data corruption: it would prevent already-decoded strings from being deformed, decode encoded UTF-8 (or whatever) strings, and leave unflagged ASCII strings alone, whether or not a decode had already been attempted.
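
To make the three cases concrete, here is a minimal sketch of such a check. The helper name decode_unless_flagged is made up for illustration and is not taken from the plugin; it also assumes the incoming encoding is UTF-8, and that the flag really does mean "character string" as discussed above.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not the plugin's actual code): decode only
# when the string does not already carry the UTF-8 flag.
sub decode_unless_flagged {
    my ($str) = @_;

    # Already-decoded character string: leave it untouched.
    return $str if utf8::is_utf8($str);

    # Unflagged string: treat it as UTF-8 encoded bytes and decode.
    # Plain ASCII passes through with the same characters.
    return Encode::decode('UTF-8', $str);
}

# Case 1: an already-decoded (flagged) string is returned as-is.
my $flagged = "caf\x{e9}";
utf8::upgrade($flagged);
my $from_flagged = decode_unless_flagged($flagged);

# Case 2: UTF-8 encoded bytes are decoded into characters.
my $from_bytes = decode_unless_flagged("caf\xc3\xa9");

# Case 3: unflagged ASCII keeps the same characters.
my $from_ascii = decode_unless_flagged("cafe");
```

With this guard, the first two calls should yield the same four-character string, and calling the helper a second time on its own result changes nothing.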

Perhaps the best approach would be to warn and not decode when flagged data is seen. That way the data should never be deformed, and the author can see that something else is decoding too early and fix it.
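
Something along these lines is what I have in mind. It is only a sketch: decode_or_warn and its $name argument are hypothetical, the warning text is invented, and UTF-8 is assumed as the input encoding.

```perl
use strict;
use warnings;
use Carp ();
use Encode ();

# Hypothetical helper: warn and skip decoding when the value is
# already flagged, otherwise decode it as UTF-8 bytes.
sub decode_or_warn {
    my ($str, $name) = @_;

    if (utf8::is_utf8($str)) {
        # Something upstream has already decoded this value; report it
        # from the caller's perspective and return the data unchanged.
        Carp::carp("'$name' is already UTF-8 flagged, skipping decode");
        return $str;
    }

    return Encode::decode('UTF-8', $str);
}

# Collect warnings so the behaviour can be inspected.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $already_decoded = decode_or_warn("\x{263A}", 'param_a');      # warns
my $freshly_decoded = decode_or_warn("\xe2\x98\xba", 'param_b');  # silent
```

Using carp rather than warn points the message at the calling code, which is usually where the early decode needs to be tracked down.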

Matt


_______________________________________________
List: [email protected]
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/[email protected]/
Dev site: http://dev.catalyst.perl.org/
