On Wed 28 Mar 2012 08:38, Mark H Weaver <m...@netris.org> writes:

> I see that you've added a bunch of latin1-specific procedures to the
> ethreads branch, which I guess is part of your optimization work.

It's for semi-textual protocols, like memcached's protocol.  Although
they are specified textually, it's clear that they should work
byte-wise.  Latin-1 is the encoding whose codepoints correspond to
bytes; that's why I was using it.

UTF-8 is tricky because given the current state of eports, you can't
(putback-utf-char eport (get-utf8-char eport)).  This is because the
first bytes of the char could have been from the end of one buffer, then
we completely fill the next buffer, and read the rest of the bytes; but
there is no space for putback.

Arguably that is a failure of eports, that they need to layer a textual
buffer over the binary buffer.  Dunno.

> Keep in mind that if/when we switch to UTF-8, the UTF-8 functions will
> become the fast paths and the latin1 ones will become slower.  As part
> of my UTF-8 work, one of the tedious jobs I will have to do is to find
> and fix all of the places where latin1-specific procedures are used as
> an optimization.

Don't worry about this.  I'm not touching C at all.  There is nothing
you would have to do.

I'm very much looking forward to the day that our strings are UTF-8
internally, that will make many things much faster :)

> If you insist on these optimizations, would it be possible to optimize
> for ASCII instead?  The nice thing about ASCII-specific procedures is
> that they will be fast in both our current string representation and in
> the planned UTF-8 representation as well.

The problem with this is that eports would have to handle decoding
errors.  Latin-1 has the nice property that no byte is an invalid
character.

Andy
--
http://wingolog.org/
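
For illustration only, the encoding properties discussed above can be checked with a small sketch. It is written in Python rather than Guile, it does not use eports or any of the procedures named in this thread, and the helper `decodes` is a hypothetical name introduced here. It shows three things: every byte decodes under Latin-1 to the codepoint with the same value, ASCII rejects bytes above 0x7F, and a multi-byte UTF-8 character cannot be decoded from only its first byte, which is the situation that arises when a character straddles a buffer boundary.

```python
# Hypothetical illustration in Python; not eports or Guile code.

# Every byte value 0..255 decodes under Latin-1, and each character's
# code point equals the original byte value.
all_bytes = bytes(range(256))
latin1_text = all_bytes.decode("latin-1")
assert [ord(ch) for ch in latin1_text] == list(all_bytes)
assert latin1_text.encode("latin-1") == all_bytes  # lossless round trip


def decodes(data: bytes, encoding: str) -> bool:
    """Return True if `data` is valid in `encoding`, False otherwise."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# ASCII rejects any byte with the high bit set, so an ASCII-only reader
# needs an error path that a Latin-1 reader does not.
high_byte = b"\xff"
ascii_ok = decodes(high_byte, "ascii")
latin1_ok = decodes(high_byte, "latin-1")

# A UTF-8 character may span several bytes. If a buffer boundary falls
# inside it, the leading part alone is not decodable.
encoded = "\u00e9".encode("utf-8")  # U+00E9 encodes as two bytes
first_half_ok = decodes(encoded[:1], "utf-8")
whole_ok = decodes(encoded, "utf-8")
```

The sketch only demonstrates the encodings themselves; how eports would buffer or put back a partially read character is not modeled here.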