Thanks for sharing, John; this looks interesting! Would be nice to build a codec using that reference impl (hmm, except it is C++)
FST in this context stands for "Fast Succinct Trie" not "Finite State Transducer" and it seems to be a data structure similar to (but claimed better than) bloom filters, in that it can efficiently evaluate set membership with a tunable chance for being wrong (false positive), but it'd be nice to see some numbers on a real Lucene index / use case. Mike McCandless http://blog.mikemccandless.com On Sat, Jun 23, 2018 at 6:04 PM, John Wang <[email protected]> wrote: > The SigMod paper describes a more compact FST implementation looks really > interesting: > > http://www.cs.cmu.edu/~huanche1/publications/surf_paper.pdf > > (reference implementation: https://github.com/efficient/SuRF) > > Was wondering if Lucene's FST implementation used by term dictionary can > take advantage of this. > > Thanks > > -John >
