On Sun, 22 May 2011 09:40:19 -0400, Vladimir Panteleev <[email protected]> wrote:

On Sun, 22 May 2011 11:56:33 +0300, KennyTM~ <[email protected]> wrote:

Nice tool! I tried to use it to reduce bug 6044, but encountered 2 problems:

1. DustMite will load _all_ files, including the _binary_ ones, which
    are seldom valid UTF-8, and that causes a UtfException to be
    thrown from 'save.dump' because 'e.header' contains those invalid
    characters. (BTW, Andrei, is it really necessary to include the
    whole invalid string in the exception?!)

The real question here is why appender would validate UTF when appending a string to a string. This turns what could be a simple GC allocation and copy into a linear scan of the data, so for large strings it might be slower than appending to an array. The following comment is in Phobos, but I don't understand it:

// note, we disable this branch for appending one type of char to
// another because we can't trust the length portion.

Essentially, this comment is about how you have to decode and then re-encode any time the character type changes, i.e. the fact that 1 dchar != 1 wchar != 1 char. So a 5-dchar string might require 20 chars to represent.

As for performance, using appender is never slower than ~=, as it uses essentially the same code. Furthermore, you actually cannot make appender use linear allocation, even when you are doing a transcoding operation, as it always grows by max(needed, newCapacity()), which gives it a roughly exponential growth rate. Also, if you're concerned about appender performance, I'd recommend using the patch from Issue 5813.
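To make the "5 dchars might need 20 chars" point concrete, here is a quick sketch (in Python, since the encoding arithmetic is the same as D's char/wchar/dchar split; the specific code point is just an arbitrary 4-byte example, not taken from the original post):

```python
# Five code points (dchars, in D terms), each U+1F600, which takes
# 4 UTF-8 code units (chars) and 2 UTF-16 code units (wchars):
s = "\U0001F600" * 5

print(len(s))                       # 5 code points
print(len(s.encode("utf-8")))       # 20 UTF-8 code units: 5 dchars -> 20 chars
print(len(s.encode("utf-16-le")))   # 20 bytes = 10 UTF-16 code units (wchars)
```

So the length of the source slice tells you nothing reliable about the length of the destination slice once the character type changes, which is why that branch has to decode and re-encode instead of trusting the length.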
