Thanks for the replies, everyone! I've spent some time looking at the tests and source code, and I've learned a lot about Lucene's Automata and FST. Way more productive than scanning javadocs. Thanks for the hint.
> Are you looking for a historical book on Lucene development or are you
> looking to solve a particular problem?

Dawid, the thing is that I am not even sure Automata are the perfect fit for my project, and I thought some literature on them would help me decide whether to use them or not. Furthermore, I'll probably have to modify Lucene's Automata implementation for my project, and I thought that reading about the design choices would help me better understand how to improve it.

Anyway, I am done with basic usage, and whenever I find time to write about what I've learned I'll make sure to share the link here.

Below are some features I'll need to add to Lucene 6's base implementation of Automata and FST. I'd like to hear your thoughts on them, especially if you think any of them would be a waste of time.

1. I need to be able to add data to an FST (add new strings and, in the case of an FST, update the mapped value). I thought about a multi-layer strategy where old data has been compressed into an FST, whereas new data is added to a delta partition (probably a BST or a simple list). A background process merges the delta into the closed FST: it materializes all strings encoded in the FST, merges this list with the strings in the delta, and then constructs a new FST. The merge can probably be done during enumeration, since the enumeration happens in lexicographical order.

2. I need to have multiple FSTs loaded and be able to search them on demand. I thought about modifying the implementation to access data in a memory-mapped file instead of raw in-memory byte[] or int[] arrays.

Juarez

On Wed, Apr 19, 2017 at 3:39 AM Dawid Weiss <dawid.we...@gmail.com> wrote:

> > One small correction: we moved away from objects to more compact int[] a
> > while ago for our automata implementation.
>
> Right, forgot about that.
> There are still some trappy object-heavy
> utilities like this one:
>
> https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/util/automaton/Automaton.java#L127-L129
>
> and the API is using objects (Transition) for an 'inout' mutable
> holder type which may be confusing at first (but is unavoidable in
> Java).
>
> Dawid
>

--
Juarez
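A minimal sketch of the merge step from item 1, assuming both the closed FST and the delta can be enumerated as key-sorted (key, value) streams. It uses plain Java collections so it runs without Lucene: `DeltaMerge`, `mergeInto` and the `BiConsumer` sink are hypothetical names, with the first iterator standing in for enumerating the old FST and the sink standing in for the builder of the new one. Because entries are emitted in sorted order with no intermediate list, it illustrates merging during enumeration rather than materializing all strings first.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

/** Hypothetical sketch: stream-merge an old sorted partition with a newer sorted delta. */
public class DeltaMerge {

    // Returns the next element, or null once the iterator is exhausted.
    private static <T> T nextOrNull(Iterator<T> it) {
        return it.hasNext() ? it.next() : null;
    }

    /**
     * Emits the union of two key-sorted inputs into sink, in sorted order, without
     * building an intermediate list. "closed" stands in for enumerating the existing FST,
     * "delta" holds newer entries; on a duplicate key the delta value replaces the old one.
     * Assumption: both inputs use the ordering the FST builder expects (String.compareTo
     * here; a byte-based FST would need unsigned UTF-8 byte order instead).
     */
    static void mergeInto(Iterator<Map.Entry<String, Long>> closed,
                          Iterator<Map.Entry<String, Long>> delta,
                          BiConsumer<String, Long> sink) {
        Map.Entry<String, Long> a = nextOrNull(closed);
        Map.Entry<String, Long> b = nextOrNull(delta);
        while (a != null || b != null) {
            // An exhausted side always loses the comparison.
            int cmp = (a == null) ? 1 : (b == null) ? -1 : a.getKey().compareTo(b.getKey());
            if (cmp < 0) {
                sink.accept(a.getKey(), a.getValue());
                a = nextOrNull(closed);
            } else if (cmp > 0) {
                sink.accept(b.getKey(), b.getValue());
                b = nextOrNull(delta);
            } else {
                // Same key in both partitions: the newer delta value wins.
                sink.accept(b.getKey(), b.getValue());
                a = nextOrNull(closed);
                b = nextOrNull(delta);
            }
        }
    }

    public static void main(String[] args) {
        TreeMap<String, Long> closed = new TreeMap<>();
        closed.put("cat", 1L);
        closed.put("dog", 2L);
        closed.put("fox", 3L);
        TreeMap<String, Long> delta = new TreeMap<>();
        delta.put("cow", 10L);
        delta.put("dog", 20L);
        delta.put("zebra", 30L);

        StringBuilder sb = new StringBuilder();
        mergeInto(closed.entrySet().iterator(), delta.entrySet().iterator(),
                  (k, v) -> sb.append(k).append('=').append(v).append(' '));
        System.out.println(sb.toString().trim()); // prints: cat=1 cow=10 dog=20 fox=3 zebra=30
    }
}
```

In Lucene 6 the closed side would presumably come from `BytesRefFSTEnum` and the sink would call `Builder.add`, which requires inputs in sorted order; that wiring is an assumption, not tested code. Note that `String.compareTo` orders by UTF-16 code units, which differs from the unsigned byte order of UTF-8 `BytesRef`s for supplementary characters, so a real implementation should compare `BytesRef`s.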
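For item 2, the core idea of reading packed arrays from a memory-mapped file instead of a heap `int[]` can be sketched with the JDK's `FileChannel.map`. `MappedIntArray`, `open`, `get` and `write` are hypothetical names, and the file layout (big-endian 4-byte ints, no header) is an assumption for illustration; Lucene's own FST and automaton classes are not used here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical read-only int array backed by a memory-mapped file instead of the heap. */
public class MappedIntArray {
    private final MappedByteBuffer buf;

    private MappedIntArray(MappedByteBuffer buf) {
        this.buf = buf;
    }

    /** Maps the whole file read-only; the mapping stays valid after the channel is closed. */
    static MappedIntArray open(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return new MappedIntArray(ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()));
        }
    }

    int length() {
        return buf.capacity() / Integer.BYTES;
    }

    /** Absolute read; int arithmetic limits this sketch to files under 2 GB. */
    int get(int index) {
        return buf.getInt(index * Integer.BYTES);
    }

    /** Writes values as big-endian ints (ByteBuffer's default order, matching get()). */
    static void write(Path file, int[] values) throws IOException {
        ByteBuffer bb = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            bb.putInt(v);
        }
        bb.flip();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (bb.hasRemaining()) {
                ch.write(bb);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("automaton", ".bin");
        write(tmp, new int[] {0, 3, 7, 42});
        MappedIntArray arr = open(tmp);
        System.out.println(arr.length() + " " + arr.get(3)); // prints: 4 42
    }
}
```

Each loaded structure then costs address space rather than heap, and the OS page cache decides what stays resident, which is what makes keeping many of them available on demand plausible. One caveat: a single `MappedByteBuffer` is limited to 2 GB, which is why Lucene's `MMapDirectory` maps large files in chunks.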
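Dawid's point about `Transition` being an 'inout' mutable holder can be illustrated with a small stand-alone sketch. It packs transitions into an `int[]` (as the automaton implementation now does) and lets the caller pass one reusable holder that the automaton overwrites on each call, instead of allocating an object per transition. The shape loosely mirrors Lucene's `Automaton.initTransition`/`getNextTransition`, but `PackedTransitions`, `Holder`, `init`, `next` and `step` are hypothetical names and this is not Lucene's code.

```java
/**
 * Hypothetical int[]-backed automaton exposing transitions through a reusable,
 * mutable holder (the "inout" pattern). Illustrative only; not Lucene's Automaton.
 */
public class PackedTransitions {

    /** Mutable holder filled in by the automaton; the caller allocates it once. */
    static final class Holder {
        int source, dest, min, max;
        int upto; // cursor into the packed array, advanced by next()
    }

    private final int[] stateStart;  // per state: offset of its first transition
    private final int[] stateCount;  // per state: number of transitions
    private final int[] transitions; // packed (dest, min, max) triples

    PackedTransitions(int[] stateStart, int[] stateCount, int[] transitions) {
        this.stateStart = stateStart;
        this.stateCount = stateCount;
        this.transitions = transitions;
    }

    /** Positions the holder before the first transition of state; returns the count. */
    int init(int state, Holder t) {
        t.source = state;
        t.upto = stateStart[state];
        return stateCount[state];
    }

    /** Overwrites the holder with the next transition; allocates nothing. */
    void next(Holder t) {
        t.dest = transitions[t.upto];
        t.min = transitions[t.upto + 1];
        t.max = transitions[t.upto + 2];
        t.upto += 3;
    }

    /** Returns the target of the transition from state accepting label, or -1. */
    int step(int state, int label, Holder scratch) {
        int count = init(state, scratch);
        for (int i = 0; i < count; i++) {
            next(scratch);
            if (label >= scratch.min && label <= scratch.max) {
                return scratch.dest;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // State 0 --[a-c]--> 1 and 0 --[x]--> 2; states 1 and 2 have no transitions.
        PackedTransitions a = new PackedTransitions(
                new int[] {0, 6, 6},
                new int[] {2, 0, 0},
                new int[] {1, 'a', 'c', 2, 'x', 'x'});
        Holder t = new Holder(); // allocated once, reused for every lookup
        System.out.println(a.step(0, 'b', t) + " " + a.step(0, 'x', t) + " " + a.step(0, 'z', t)); // prints: 1 2 -1
    }
}
```

The holder is what makes the API feel confusing at first: the same object's fields change after every call, so a caller that wants to keep a transition must copy its fields out.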