Rich Felker wrote:
>
> On Sat, Mar 31, 2007 at 07:44:39PM -0400, Daniel B. wrote:
> > Rich Felker wrote:
> > > Again, software which does not handle corner cases correctly is crap.
> >
> > Why are you confusing "special-case" with "corner case"?
> >
> > I never said that software shouldn't handle corner cases such as illegal
> > UTF-8 sequences.
> >
> > I meant that an editor that handles illegal UTF-8 sequences other than
> > by simply rejecting the edit request is a bit of a special case compared
> > to general-purpose software, say an XML processor, for which some
> > specification requires (or recommends?) that the processor ignore or
> > reject any illegal sequences. The software isn't failing to handle the
> > corner case; it is handling it--by explicitly rejecting it.
>
> It is a corner case!
We seem to be having a communication problem, but I don't quite see the cause. I agree that it is a corner case. However, what you wrote seems to indicate that you think I don't, or wouldn't.

I was arguing that handling the corner case by doing something other than simply rejecting the illegal UTF-8 sequences was a bit of a special case, just as handling ill-formed XML is not something a general XML processor (parser) has to do (it rejects it) but _is_ something a typical XML editor would want to do.

And to be clear, I'm not arguing that an editor should _not_ be a special case (that is, I'm not arguing that it shouldn't be careful to avoid changing the file unintentionally). I was only pointing out that it _is_ a special case, because whatever UTF-8 issues we were talking about many messages ago seem to apply differently to special-case tools (e.g., a general text editor) than to general tools (e.g., HTTP POST receiver code). Maybe at first I thought you were talking about a UTF-8-_only_ editor.

> It's simply not acceptable for opening a file and resaving it to not
> yield exactly the same, byte-for-byte identical file, because it can
> lead either to horrible data corruption or inability to edit when your
> file has somehow gotten malformed data into it.

(Yes, I agree.)

...

> > You said you're talking about a text editor, that reads bytes, displays
> > legal UTF-8 sequences as the characters they represent in UTF-8, doesn't
> > reject other UTF-8-illegal bytes, and does something with those bytes.
> >
> > What does it do with such a byte? It seems you were talking about
> > mapping it to some character to display it. Are you talking about
> > something else, such as displaying the hex value of the byte?
>
> Yes.

Roger.

Daniel

-- 
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/