> Preface: I don't know how the automaton is implemented deep inside Lucene,
Well, you can take a look, it's open source. :)

There are two different finite state automata inside Lucene: one is
pretty much a "read-only" transducer from unique input sequences (of
bytes) into an output. This is the FST<?> class. The other is the
Automaton class, which has been ported from the Brics library [1].

I can't really relate to your comment about fast querying for
sub-automata; it sounds interesting, though. Dig into the code and
suggest a patch (or even demonstrate what you came up with!).

Dawid

[1] http://www.brics.dk/automaton/

> but (considering that the automaton is built on the fly when the index
> is already present) I imagine that the automaton scans the
> lexicon/tokens present in the Lucene index to find the document
> references (solution 1).
> In my opinion there are two different generic solutions for using
> automata:
> 1) Create an automaton for parsing the tokens present in the Lucene
> table, as described above.
> 2) Create a pattern-matching automaton (on binary input, or better, on
> an abstract stream, which would be more generic) and put its states
> directly in an index. In this case you can very quickly retrieve the
> documents matching a specific automaton built when you created the
> index (or a sub-automaton representing a subset of the same states).
> The second solution could maybe be used for mapping a complex
> structure inside a single Lucene document field, so that you can then
> find the nested information embedded in it. This way I do not need to
> use multiple Lucene documents (which could create performance and
> scalability problems).
> In many cases this solution could be faster than actual joins, for
> example, and be useful in bioinformatics or in all those cases where
> the data is not a basic ADT.
>
> Cristian
>
> 2017-09-30 12:24 GMT+02:00 Dawid Weiss <dawid.we...@gmail.com>:
>
>> > Hi, is it possible to create an Automaton in Lucene by parsing not a
>> > string but a byte array?
>>
>> Can you state what problem you are trying to solve?
>> This seems to be a question stripped of a more general context -- why
>> do you need those byte-based automata?
>>
>> Dawid
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
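
On the byte-array question quoted above: the following is only a minimal
sketch of what a "byte-based automaton" means in general, written in plain
Java. It deliberately does not use Lucene's Automaton or FST classes, and
the class name ByteDfa and its methods are hypothetical, invented for this
illustration. It builds a trie-shaped deterministic automaton whose
transitions are labelled with unsigned byte values (0-255) rather than
characters, and then runs it over a byte array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; this is NOT Lucene's Automaton or FST API. */
final class ByteDfa {
    // transitions.get(state)[label] is the next state for the unsigned
    // byte value 'label', or -1 if there is no such transition.
    private final List<int[]> transitions = new ArrayList<>();
    private final List<Boolean> accepting = new ArrayList<>();

    ByteDfa() {
        newState(); // state 0 is the start state
    }

    private int newState() {
        int[] row = new int[256];
        Arrays.fill(row, -1);
        transitions.add(row);
        accepting.add(false);
        return transitions.size() - 1;
    }

    /** Adds one byte sequence to the set of accepted inputs. */
    void add(byte[] input) {
        int state = 0;
        for (byte b : input) {
            int label = b & 0xFF; // treat the byte as unsigned
            int next = transitions.get(state)[label];
            if (next < 0) {
                next = newState();
                transitions.get(state)[label] = next;
            }
            state = next;
        }
        accepting.set(state, true);
    }

    /** Returns true if the whole input is one of the added sequences. */
    boolean accepts(byte[] input) {
        int state = 0;
        for (byte b : input) {
            state = transitions.get(state)[b & 0xFF];
            if (state < 0) {
                return false; // no transition for this byte
            }
        }
        return accepting.get(state);
    }
}
```

For example, after add(new byte[] {(byte) 0xCA, (byte) 0xFE}), the call
accepts(...) returns true for exactly that two-byte input and false for
its one-byte prefix. Nothing here depends on the bytes being valid text,
which is the point of labelling transitions with bytes instead of chars.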