On Thu, May 19, 2011 at 6:16 AM, Dawid Weiss <dawid.we...@cs.put.poznan.pl> wrote: >> We should add this (lookup by value, when value is guaranteed to >> monotonically increase as the key increases) to our core FST APIs? >> It's generically useful in many places ;) I'll open an issue. > > The data structure itself should sort of "build itself" if you create > an FST with increasing integers because the "shared suffix" should be > pushed towards the root anyway, so the only thing would be to correct > values on all outgoing arcs (they need to contain the count of leaves > on the subtree).... but then, this may be tricky if arc values are > vcoded... I'd have to think how to do this.
I think, if we add ord as an output to the FST, then it builds everything we need? Ie no further data structures should be needed? Maybe I'm confused :) >> While FST w/ lookup-by-monotonic-value would work here, I would be >> worried about the per hit of that representation vs what > > There are actually two things: > > a) performance; you need to descend in the automaton and some > bookkeeping to maintain the count of nodes; this adds overhead, > > b) size; the procedure for storing/ calculating perfect hashes I > described requires leaf counts on each arc and these are usually large > integers. Even vcoded they bloat the resulting data structure. Maybe we should iterate on the issue to get down to the specifics? I had thought there wouldn't be any backtracking, if the FST had stored the ord as an output... Mike http://blog.mikemccandless.com --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org