Thanks for sharing, John; this looks interesting!  Would be nice to build a
codec using that reference impl (hmm, except it is C++)

FST in this context stands for "Fast Succinct Trie" not "Finite State
Transducer" and it seems to be a data structure similar to (but claimed
better than) bloom filters, in that it can efficiently evaluate set
membership with a tunable chance for being wrong (false positive), but it'd
be nice to see some numbers on a real Lucene index / use case.

Mike McCandless

http://blog.mikemccandless.com

On Sat, Jun 23, 2018 at 6:04 PM, John Wang <[email protected]> wrote:

> The SigMod paper describes a more compact FST implementation looks really
> interesting:
>
>  http://www.cs.cmu.edu/~huanche1/publications/surf_paper.pdf
>
>  (reference implementation: https://github.com/efficient/SuRF)
>
> Was wondering if Lucene's FST implementation used by term dictionary can
> take advantage of this.
>
> Thanks
>
> -John
>

Reply via email to