We should add this (lookup by value, when value is guaranteed to monotonically increase as the key increases) to our core FST APIs? It's generically useful in many places ;) I'll open an issue.
EG this would also enable an FST terms index that supports lookup-by-ord, something VariableGapTermsIndex (this is the one that uses FST for the index) does not support today. David, one thing to remember is trunk has already seen drastic reductions on the RAM required to store DocTerms/Index vs 3.x (something maybe we should backport to 3.x...). The bytes for the terms are now stored as shared byte[] blocks, and the ords/offsets are stored as packed ints, so we no longer have per-String memory& pointer overhead. I describe the gains here: http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html -- though, those gains include RAM reduction from the terms index as well. While FST w/ lookup-by-monotonic-value would work here, I would be worried about the per hit of that representation vs what DocTerms/Index offers today... we should test to see. Of course, for certain apps that perf hit is justified, so probably we should make this an option when populating field cache (ie, in-memory storage option of using an FST vs using packed ints/byte[]). Mike http://blog.mikemccandless.com On Thu, May 19, 2011 at 4:43 AM, Dawid Weiss <dawid.we...@cs.put.poznan.pl> wrote: >> Though, it's possible to do if you associate an additional number with >> each node. (I invented some way, shared it with Mike and forgot.. good >> grief :/) > > It doesn't need to be invented -- it's a known technique. On each arc > you store the number of strings under that arc; while traversing you > accumulate -- this gives you a unique number for each string (perfect > hash) and a way to locate a string given its number. > >> And it can't do the reverse lookup, by design, that's a lossy >> compression for all good perfect hashing algos. >> So, it's irrelevant here, huh? > > You can do both the way I described above. Jan Daciuk has details on > many more variants of doing that: > > Jan Daciuk, Rafael C. Carrasco, Perfect Hashing with Pseudo-minimal > Bottom-up Deterministic Tree Automata, Intelligent Information Systems > XVI, Proceedings of the International IIS'08 Conference held in > Zakopane, Poland, June 16-18, 2008, Mieczysław A. Kłopotek, Adam > Przepiórkowski, Sławomir T. Wierzchoń, Krzysztof Trojanowski (eds.), > Academic Publishing House Exit, Warszawa 2008. > > Dawid > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org