It looks like a Levenshtein automaton was introduced in the new version of
Lucene; earlier versions used a brute-force approach.
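
Just so the questions below are concrete, here is a minimal sketch of the
fuzzy query I am talking about (Lucene 4.x style API; the field name "body",
the term and the index path are made up for the example):

    import java.io.File;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.FuzzyQuery;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class FuzzyExample {
      public static void main(String[] args) throws Exception {
        // Open an existing index; the path is just a placeholder.
        IndexSearcher searcher = new IndexSearcher(
            DirectoryReader.open(FSDirectory.open(new File("/path/to/index"))));
        // FuzzyQuery with maxEdits = 2: the query that, per the article
        // below, now uses a Levenshtein automaton instead of scanning
        // every term in the dictionary.
        FuzzyQuery q = new FuzzyQuery(new Term("body", "lucene"), 2);
        TopDocs hits = searcher.search(q, 10);
        System.out.println("hits: " + hits.totalHits);
      }
    }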

1) How are prefix queries handled? (A small sketch of what I mean is below,
after question 2.)

2) In general, the inverted index keeps a sorted term list that maps each
term to the doc ids in which it occurs. Does Lucene have a dedicated data
structure for storing this term list so it can be searched efficiently?
Storing it in some form of balanced binary search tree or trie would need
serialising and de-serialising every time it is accessed, which is a very
expensive task, since it needs a complete scan of all the data. (A toy
sketch of the naive structure I have in mind is also below.)
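
Regarding question 1, here is the sketch mentioned above. The field "title"
and the terms are invented; my question is how the term matching behind
these queries is done internally:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.PrefixQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.WildcardQuery;

    public class PrefixExample {
      public static void main(String[] args) {
        // Matches every term in "title" that starts with "luc".
        Query prefix = new PrefixQuery(new Term("title", "luc"));
        // The related wildcard case, e.g. "luc*ne".
        Query wildcard = new WildcardQuery(new Term("title", "luc*ne"));
        System.out.println(prefix + " / " + wildcard);
      }
    }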
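
Regarding question 2, this toy sketch (plain Java, not Lucene code; terms
and doc ids are invented) is the naive sorted-term-list -> postings picture
I have in mind, the kind of structure that would be expensive to serialise
and de-serialise on every access:

    import java.util.Arrays;
    import java.util.TreeMap;

    public class ToyInvertedIndex {
      public static void main(String[] args) {
        // Toy inverted index: sorted term -> postings (doc ids).
        TreeMap<String, int[]> invertedIndex = new TreeMap<>();
        invertedIndex.put("apache", new int[] {1, 4});
        invertedIndex.put("lucene", new int[] {1, 2, 7});
        invertedIndex.put("search", new int[] {2, 3});

        // Because the terms are kept sorted, a prefix scan is a ceiling
        // lookup followed by iteration until the prefix stops matching.
        for (String t : invertedIndex.tailMap("luc").keySet()) {
          if (!t.startsWith("luc")) break;
          System.out.println(t + " -> " + Arrays.toString(invertedIndex.get(t)));
        }
      }
    }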


On Tue, Nov 27, 2012 at 2:50 PM, Federico Méndez <federic...@gmail.com> wrote:

> As an introduction you can read this wonderful article:
> http://java.dzone.com/news/lucenes-fuzzyquery-100-times
>
>
> On Tue, Nov 27, 2012 at 10:08 AM, sri krishna <krishnai...@gmail.com> wrote:
>
>>
>> How does Lucene handle wildcard and fuzzy queries internally?
>>
>> It looks like the data is stored as term -> postings list. What data
>> structures are used to generate results efficiently?
>>
>> If it is using a compressed trie, how does it handle segment merging
>> efficiently? If it is just doing a linear scan to find the query words,
>> how are prefix-based terms found? Can anyone give a more detailed
>> explanation of how such advanced queries are handled in Lucene, from an
>> efficiency point of view?
>>
>>
>> Thanks
>
>
>
