On Fri, Apr 27, 2012 at 12:20:13PM +0400, Dmitry Olshansky wrote: > On 27.04.2012 1:23, H. S. Teoh wrote: [...] > >What we *really* should be doing, esp. for commonly-used functions > >like computing various lengths, is to automatically process said > >tables and encode the computation in finite-state machines that can > >then be optimized at the FSM level (there are known algos for > >generating optimal FSMs), > > FSA are based on tables so it's all runs in the circle. Only the > layout changes. Yet the speed gains of non-decoding are huge.
Yes, but hand-coded tables tend to go out of date, be prone to bugs, or are missing optimizations done by an FSA generator (e.g. a lexer generator). Collapsed FSA states, for example, can greatly reduce table size and speed things up. > codegen'd, and then optimized again at the assembly level by the > >compiler. These FSMs will operate at the native narrow string char > >type level, so that there will be no need for explicit decoding. > > > >The generation algo can then be run just once per unicode release, > >and everything will Just Work. > > > This year Unicode in D will receive a nice upgrade. > http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/dolsh/20002# > > Anyway keep me posted if you have these FSA ever come to soil your > sleep ;) [...] One area where autogenerated Unicode algos will be very useful is in normalization. Unicode normalization is non-trivial, to say the least; it involves looking up various character properties and performing mappings between them in a specified order. If we can encode this process as FSA, then we can let an automated FSA optimizer produce code that maps directly between the (non-decoded!) source string and the target (non-decoded!) normalized string. Similar things can be done for string concatenation (which requires arbitrarily-distant scanning in either direction from the joining point, though in normal use cases the distance should be very short). T -- Error: Keyboard not attached. Press F1 to continue. -- Yoon Ha Lee, CONLANG
