Re: perl unicode support

Daniel B. Wed, 04 Apr 2007 18:45:31 -0700

Rich Felker wrote:
> 
> On Sat, Mar 31, 2007 at 07:44:39PM -0400, Daniel B. wrote:
> > Rich Felker wrote:
> > > Again, software which does not handle corner cases correctly is crap.
> >
> > Why are you confusing "special-case" with "corner case"?
> >
> > I never said that software shouldn't handle corner cases such as illegal
> > UTF-8 sequences.
> >
> > I meant that an editor that handles illegal UTF-8 sequences other than
> > by simply rejecting the edit request is a bit of a special case compared
> > to general-purpose software, say a XML processor, for which some
> > specification requires (or recommends?) that the processor ignore or
> > reject any illegal sequences.  The software isn't failing to handle the
> > corner case; it is handling it--by explicitly rejecting it.
> 
> It is a corner case!


We seem to be having a communication problem, but I don't quite see
what the cause is.

I agree that it is a corner case.  However, what you wrote seems to
indicate that you think I don't or wouldn't.

(I was arguing that handling the corner case by doing something other
than simply rejecting the illegal UTF-8 sequences was a bit of a 
special case, just like, say, handling ill-formed XML is not something
a general XML processor (parser) has to do (it rejects it) but _is_ 
something a typical XML editor would want to do.

And to be clear, I'm not arguing that an editor should _not_ be a 
special case (that is, not arguing that it shouldn't be careful to avoid
changing the file unintentionally).  I was only pointing out that it _is_ 
a special case, because whatever UTF-8 issues we were talking about 
many messages ago seem to apply differently to special-case tools (e.g., 
a general text editor) vs. general tools (e.g., HTTP POST receiver code).

Maybe at first I thought you were talking about a UTF-8-_only_ editor.)
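
To make the "general tool" side of that distinction concrete, here is a
minimal sketch in Python (not Perl, and not taken from any real HTTP
receiver) of handling the corner case by rejecting it.  The function
name accept_strict is hypothetical; the only library behavior relied on
is that bytes.decode("utf-8") raises UnicodeDecodeError on malformed
input.

```python
from typing import Optional


def accept_strict(raw: bytes) -> Optional[str]:
    """Hypothetical general-purpose receiver: accept only well-formed UTF-8.

    Returns the decoded text, or None to signal that the input was rejected.
    """
    try:
        # The default error handler is "strict", so any malformed
        # sequence raises UnicodeDecodeError.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    print(accept_strict(b"caf\xc3\xa9"))    # well-formed input is accepted
    print(accept_strict(b"caf\xc3\xa9\xff"))  # malformed input is rejected
```

The corner case is still handled here; it is just handled by refusing
the input, which an editor usually cannot afford to do.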



> It's simply not acceptable for opening a file and resaving it to not
> yield exactly the same, byte-for-byte identical file, because it can
> lead either to horrible data corruption or inability to edit when your
> file has somehow gotten malformed data into it. 

(Yes, I agree.)
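
As an illustration of that byte-for-byte requirement, here is a minimal
sketch in Python (not Perl) of one way an editor could meet it.  It
assumes Python's "surrogateescape" error handler, which maps each
undecodable byte to a lone surrogate in U+DC80..U+DCFF and maps it back
on encoding.  The names load and save are hypothetical, and this is
only one possible approach, not what any particular editor does.

```python
def load(raw: bytes) -> str:
    """Decode file contents without rejecting or losing malformed bytes."""
    # Each byte that is not part of a valid UTF-8 sequence becomes a lone
    # surrogate code point (0xDC00 + byte value) instead of raising.
    return raw.decode("utf-8", errors="surrogateescape")


def save(text: str) -> bytes:
    """Encode the buffer, turning escaped surrogates back into raw bytes."""
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    # Valid text, two stray bytes, and a truncated sequence at the end.
    raw = b"caf\xc3\xa9 \xff\xfe tail \xc3"
    assert save(load(raw)) == raw  # open and resave: identical bytes
```

The malformed bytes stay in the buffer as ordinary (if unusual) code
points, so edits elsewhere in the file leave them untouched.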


 
...
> > You said you're talking about a text editor, that reads bytes, displays
> > legal UTF-8 sequences as the characters they represent in UTF-8, doesn't
> > reject other UTF-8-illegal bytes, and does something with those bytes.
> >
> > What does it do with such a byte?  It seems you were talking about
> > mapping it to some character to display it.  Are you talking about
> > something else, such as displaying the hex value of the byte?
> 
> Yes. 

Roger.
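
For concreteness, a minimal sketch in Python (not Perl) of that display
behavior: legal UTF-8 sequences are shown as their characters and every
other byte is shown as its hex value.  The function name render and the
<XX> notation are assumptions made for illustration; the sketch relies
on Python's "surrogateescape" error handler to mark the illegal bytes.

```python
def render(raw: bytes) -> str:
    """Return a display string: characters for valid UTF-8, <XX> for other bytes."""
    # Undecodable bytes come back as lone surrogates U+DC80..U+DCFF.
    text = raw.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            # Recover the original byte value and show it in hex.
            out.append("<%02X>" % (cp - 0xDC00))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(render(b"ok \xff\xc3\xa9"))
```

This only affects what is displayed; the underlying bytes are not
changed by rendering them.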


Daniel

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
