Wow another issue caught by random testing!

On Fri, May 14, 2010 at 1:42 AM, Robert Muir <[email protected]> wrote:
> the problem is a logic bug (e.g. i have no clue how to really fix
> except to switch over to a UTF-8 sort order).
>
> in converting automaton to utf-8/32, and trying to emulate the utf-16
> term dictionary order, the byte transition ranges (although sorted in
> utf-16 order) are themselves in utf-8/32 order: e.g. a byte range of
> 0xe0-0xef is problematic during enumeration since the 0xee-0xef
> component should be "sorted last" in utf-16 order.

Ugh. I suppose we could forcefully split such edges? (We'd have to
fix reduce to not consolidate them).

Or just cutover to UTF8 order for trunk.

> i know a workaround until we switch over, but its gonna cause wasted
> seeks at the least (its just wrong).

This is the FIXME you committed right? Ie always seek...

Mike
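
For readers following the thread, here is a minimal sketch of the ordering mismatch discussed above, plus one way the "split such edges" idea might look. It is plain Python, not Lucene code: `utf8_key`, `utf16_key`, and `split_lead_byte_range` are hypothetical names made up for illustration, and the split is only meant for ranges over the first (lead) byte of a UTF-8 sequence.

```python
def utf8_key(s: str) -> bytes:
    # Byte-wise order of UTF-8 matches Unicode code point order.
    return s.encode("utf-8")


def utf16_key(s: str) -> bytes:
    # Big-endian, so byte-wise comparison matches UTF-16 code unit order.
    return s.encode("utf-16-be")


bmp_high = "\ue000"           # UTF-8: EE 80 80     UTF-16: E000
supplementary = "\U00010000"  # UTF-8: F0 90 80 80  UTF-16: D800 DC00

terms = [supplementary, bmp_high]
by_utf8 = sorted(terms, key=utf8_key)    # U+E000 sorts first
by_utf16 = sorted(terms, key=utf16_key)  # U+10000 sorts first (D800 < E000)


def split_lead_byte_range(lo: int, hi: int) -> list[tuple[int, int]]:
    """Hypothetical helper: split a lead-byte range [lo, hi] so that no
    piece mixes bytes below 0xEE, the 0xEE-0xEF block (U+E000..U+FFFF),
    and the 0xF0+ block (supplementary characters, surrogates in UTF-16).
    """
    pieces = []
    for start, end in ((0x00, 0xED), (0xEE, 0xEF), (0xF0, 0xFF)):
        a, b = max(lo, start), min(hi, end)
        if a <= b:
            pieces.append((a, b))
    return pieces


if __name__ == "__main__":
    print([hex(ord(t)) for t in by_utf8])
    print([hex(ord(t)) for t in by_utf16])
    print([(hex(a), hex(b)) for a, b in split_lead_byte_range(0xE0, 0xEF)])
```

With the pieces kept separate, an enumerator emulating UTF-16 order could in principle visit the 0xEE-0xEF piece after the 0xF0+ piece. Whether that is practical depends on the reduce step not merging the pieces back together, as noted above.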
