Re: Of possible interest: fast UTF8 validation

Patrick Schluter via Digitalmars-d Thu, 17 May 2018 22:35:59 -0700

On Thursday, 17 May 2018 at 23:16:03 UTC, H. S. Teoh wrote:

On Thu, May 17, 2018 at 07:13:23PM +0000, Patrick Schluter via Digitalmars-d wrote: [...]
[...]
Yes. Imagine if we standardized on a header-based string encoding, and we wanted to implement a substring function over a string that contains multiple segments of different languages. Instead of a cheap slicing over the string, you'd need to scan the string or otherwise keep track of which segment the start/end of the substring lies in, allocate memory to insert headers so that the segments are properly interpreted, etc. It would be an implementational nightmare, and an unavoidable performance hit (you'd have to copy data every time you take a substring), and the @nogc guys would be up in arms.
[...]

That's essentially what RTF with code pages was. I'm happy that we got rid of it and that it was replaced by XML. Even if Microsoft's document XML is a bloated, ridiculous mess, it's still an order of magnitude less problematic than RTF (at the text encoding level, I mean).
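To make the slicing argument from the quoted paragraph concrete, here is a minimal Python sketch. It assumes byte offsets throughout, and the `Segment` type and `segmented_substring` function are hypothetical stand-ins for a header-based encoding, not part of any real proposal or library. With UTF-8 a substring is just a view into the same buffer (like a slice of a D `string`; a `memoryview` plays that role here), while the segmented version has to rebuild headers and copy bytes for every substring.

```python
from dataclasses import dataclass


def utf8_substring(buf: memoryview, start: int, end: int) -> memoryview:
    """Return buf[start:end] as a view (no copy). Offsets are byte offsets
    and must fall on UTF-8 code point boundaries."""
    for i in (start, end):
        # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
        if i < len(buf) and (buf[i] & 0xC0) == 0x80:
            raise ValueError(f"offset {i} is not a code point boundary")
    return buf[start:end]  # shares memory with the original buffer


@dataclass
class Segment:
    """Hypothetical header-based segment: each run of text carries its own encoding."""
    encoding: str  # e.g. "cp1252", "cp1251"
    data: bytes


def segmented_substring(segments: list, start: int, end: int) -> list:
    """Hypothetical substring over header-based segments. Offsets are byte
    offsets into the concatenated payloads. Every overlapping segment needs
    a new header (Segment) and a copied byte range."""
    out = []
    pos = 0
    for seg in segments:
        seg_start, seg_end = pos, pos + len(seg.data)
        lo, hi = max(start, seg_start), min(end, seg_end)
        if lo < hi:
            # bytes slicing copies; the encoding header is duplicated.
            out.append(Segment(seg.encoding, seg.data[lo - seg_start:hi - seg_start]))
        pos = seg_end
    return out
```

For example, `utf8_substring(memoryview("naïve Привет".encode("utf-8")), 0, 6)` yields a view over the bytes of "naïve", whereas the segmented version has to walk the segment list to find where the start and end fall before it can allocate anything.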
