On Sunday, April 08, 2012 23:36:23 Nathan M. Swan wrote:
> For most of the string processing I do, I read/write text in UTF-8 and
> convert it to UTF-32 for processing (with std.utf), so I don't have to
> worry about encoding. Is this a good or bad paradigm? Is there a better
> way to do this? What method do all of you use?
>
> Just curious, NMS
It depends on what you're doing. Depending on the functions you use and your memory requirements, either UTF-8 or UTF-32 may be faster. UTF-32 has the advantage of being a random-access range, which makes it work with a number of functions that UTF-8 doesn't. But UTF-32 also takes considerably more memory (especially if most of your characters are ASCII), which can be a problem.

I think that the most common thing is to just operate on UTF-8 unless another encoding is needed (e.g. UTF-32 is required because random access is needed), and in plenty of cases, you end up operating on generic ranges anyway if you use range-based functions on strings and don't call std.array.array on them.

You're going to have to profile your code to see whether operating primarily on UTF-8 or UTF-32 is more efficient for your string processing.

- Jonathan M Davis
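
For what it's worth, here is a minimal sketch of the trade-off described above (an illustrative example using Phobos's std.conv.to, std.algorithm.filter, std.uni.isWhite, and std.array.array; the strings and variable names are hypothetical):

import std.stdio;
import std.conv : to;
import std.algorithm : filter;
import std.array : array;
import std.uni : isWhite;

void main()
{
    // UTF-8 string: compact, but not a random-access range of code points.
    string utf8 = "héllo wörld";

    // UTF-32 string: random access per code point, at 4 bytes per character.
    dstring utf32 = utf8.to!dstring;

    // Range-based functions work on both; the UTF-8 version decodes lazily
    // as it iterates, yielding a generic range of dchar.
    auto noSpaces8  = utf8.filter!(c => !c.isWhite);
    auto noSpaces32 = utf32.filter!(c => !c.isWhite);

    // std.array.array turns the lazy ranges back into eagerly allocated arrays.
    writeln(noSpaces8.array);  // prints "héllowörld"
    writeln(noSpaces32.array); // same result, starting from UTF-32
}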
