It looks like a Levenshtein Automaton was introduced in the new version of Lucene; earlier versions used a brute-force approach.
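For illustration, here is a minimal sketch (assuming the Lucene 4.x API; the field name and term are made up) of how a fuzzy query is expressed, together with the Levenshtein-automaton class that, as far as I understand, the query term is compiled into instead of being checked against every term by brute force:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.LevenshteinAutomata;

public class FuzzySketch {
    public static void main(String[] args) {
        // A fuzzy query on a hypothetical "body" field, allowing up to 2 edits.
        FuzzyQuery query = new FuzzyQuery(new Term("body", "lucene"), 2);
        System.out.println(query);

        // Roughly the idea behind it: build an automaton that accepts every
        // string within edit distance 2 of "lucene", then intersect that
        // automaton with the terms dictionary rather than scanning all terms.
        LevenshteinAutomata builder = new LevenshteinAutomata("lucene", false);
        Automaton accepting = builder.toAutomaton(2);
    }
}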
1) How are prefix queries handled?

2) In general, there is a sorted term list mapped to the doc-ids in which the
corresponding terms occur (the inverted index). Does Lucene have a dedicated
data structure for storing this term list for efficient search? Storing it as
some form of balanced binary search tree or trie would need serialising and
de-serialising it every time it is accessed, which is a very expensive task,
as it amounts to a complete scan of all the data. (A rough sketch of the
FST-style terms index idea is appended after the quoted thread below.)

On Tue, Nov 27, 2012 at 2:50 PM, Federico Méndez <federic...@gmail.com> wrote:

> As an introduction you can read this wonderful article:
> http://java.dzone.com/news/lucenes-fuzzyquery-100-times
>
>
> On Tue, Nov 27, 2012 at 10:08 AM, sri krishna <krishnai...@gmail.com> wrote:
>
>> How does Lucene handle wildcard and fuzzy queries internally?
>>
>> It looks like data is stored as term -> postings list. What data
>> structures are used to produce results efficiently?
>>
>> If it is using a compressed trie, how does it handle segment merging
>> efficiently? If it is using just a linear scan to find the words in a
>> query, how are prefix-based terms found? Can anyone give a more detailed
>> explanation of how such advanced queries are handled in Lucene, from an
>> efficiency point of view?
>>
>> Thanks
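For what it's worth, my understanding is that Lucene 4.x keeps the sorted terms dictionary on disk in blocks and holds only a compact FST-based index to those blocks in memory, so a prefix lookup can seek straight to the right block instead of rescanning everything. Below is a rough, hypothetical sketch of that FST idea using the util.fst classes (the field name, term list and ordinals are made up, and the exact FST API differs slightly between 4.x releases):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRef;
import org.apache.lucene.util.fst.Builder;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.PositiveIntOutputs;
import org.apache.lucene.util.fst.Util;

public class TermsIndexSketch {
    public static void main(String[] args) throws Exception {
        // A prefix query is a multi-term query: it seeks to the first term
        // with the given prefix in the sorted terms dictionary and
        // enumerates matching terms from there (hypothetical field/prefix).
        PrefixQuery prefixQuery = new PrefixQuery(new Term("title", "luc"));
        System.out.println(prefixQuery);

        // The FST idea: map each sorted term to an ordinal/offset. The
        // structure shares prefixes and suffixes, is compact, and stays in
        // memory, so there is no per-access serialisation round trip.
        PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
        Builder<Long> builder = new Builder<Long>(FST.INPUT_TYPE.BYTE1, outputs);
        String[] sortedTerms = {"apache", "lucene", "search"}; // must be pre-sorted
        IntsRef scratch = new IntsRef();
        for (int ord = 0; ord < sortedTerms.length; ord++) {
            builder.add(Util.toIntsRef(new BytesRef(sortedTerms[ord]), scratch), (long) ord);
        }
        FST<Long> fst = builder.finish();
        System.out.println("ordinal of 'lucene': " + Util.get(fst, new BytesRef("lucene")));
    }
}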