[ https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851456#action_12851456 ]
Robert Muir commented on LUCENE-2111:
-------------------------------------

{quote}
There are certain specific wildcard corner cases where we are slower, but these are likely rarely used in practice (many ?'s followed by a suffix).
{quote}

I think it would be good to fix this in the future, but I certainly think it's a rare case.

The problem is similar to where an SQL engine decides to just table-scan instead of using a btree index... In this case we are trying to be too smart and just seek to the correct term based on the query instead of scanning, but this causes too many seeks. At the same time, you have to be careful or you make the wrong decision and give O(n) performance instead of O(log n).

In my opinion it would be better to think in the future about how we can improve Lucene in the following ways:
* The term dictionary should be more "DFA-friendly", e.g. the whole concept of TermsEnum is wrong: linear enumeration of terms is inefficient for any big index. We should get away from it.
* Instead, it would be nice to think of the index like an FST, and instead of enumerating things and filtering them, we provide a DFA and enumerate the transduced results.
* We need to eliminate the UTF-8/UTF-16 impedance mismatch, which causes so much complication and unnecessary hairy code today.

All this being said, I think flex is a great move forward for multitermqueries; at least we have a seeking-friendly API! One step at a time.
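The seek-versus-scan trade-off described in the comment can be sketched with a toy model. This is not Lucene's code: a java.util.TreeSet stands in for the term dictionary, ceiling() stands in for a seek, and every name here (SeekVsScan, nextCandidate, seekEnum, lastSeeks, buildDict) is hypothetical. It also assumes terms use only the letters a-z and patterns contain only literals and '?'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class SeekVsScan {
    /** Number of seeks performed by the most recent seekEnum call. */
    static int lastSeeks;

    /** True if t matches the fixed-length pattern p ('?' matches any one char). */
    static boolean matches(String p, String t) {
        if (p.length() != t.length()) return false;
        for (int i = 0; i < p.length(); i++)
            if (p.charAt(i) != '?' && p.charAt(i) != t.charAt(i)) return false;
        return true;
    }

    /** Smallest string matching p that is >= t, or null if none (alphabet a-z assumed). */
    static String nextCandidate(String p, String t) {
        int m = p.length(), k = 0;
        // k = length of the longest prefix of t that is still compatible with p
        while (k < m && k < t.length() && (p.charAt(k) == '?' || p.charAt(k) == t.charAt(k))) k++;
        if (k == m && t.length() == m) return t;           // t itself matches
        StringBuilder sb = new StringBuilder(t.substring(0, k));
        if (k < t.length() && !(k < m && p.charAt(k) > t.charAt(k))) {
            // The literal at k is too small (or t is too long): bump the
            // rightmost earlier wildcard position that can still grow.
            int j = k - 1;
            while (j >= 0 && !(p.charAt(j) == '?' && t.charAt(j) < 'z')) j--;
            if (j < 0) return null;
            sb.setLength(j);
            sb.append((char) (t.charAt(j) + 1));
        }
        // Fill the rest with the smallest allowed character at each position.
        for (int i = sb.length(); i < m; i++) sb.append(p.charAt(i) == '?' ? 'a' : p.charAt(i));
        return sb.toString();
    }

    /** Linear scan: test every term, never seek. */
    static List<String> scan(TreeSet<String> dict, String p) {
        List<String> out = new ArrayList<>();
        for (String t : dict) if (matches(p, t)) out.add(t);
        return out;
    }

    /** Seek-based enumeration: on each mismatch, seek to the next possible match. */
    static List<String> seekEnum(TreeSet<String> dict, String p) {
        List<String> out = new ArrayList<>();
        lastSeeks = 0;
        String term = dict.isEmpty() ? null : dict.first();
        while (term != null) {
            if (matches(p, term)) { out.add(term); term = dict.higher(term); continue; }
            String target = nextCandidate(p, term);
            if (target == null) break;                     // nothing further can match
            lastSeeks++;
            term = dict.ceiling(target);                   // the "seek"
        }
        return out;
    }

    /** Toy dictionary: all 4-letter strings over the letters a, b, c, d. */
    static TreeSet<String> buildDict() {
        TreeSet<String> dict = new TreeSet<>();
        char[] abc = "abcd".toCharArray();
        for (char a : abc) for (char b : abc) for (char c : abc) for (char d : abc)
            dict.add("" + a + b + c + d);
        return dict;
    }

    public static void main(String[] args) {
        TreeSet<String> dict = buildDict();
        for (String p : new String[] {"ab??", "??cd"}) {
            List<String> hits = seekEnum(dict, p);
            System.out.println(p + ": " + hits.size() + " matches, " + lastSeeks
                + " seeks, " + dict.size() + " terms in dictionary");
        }
    }
}
```

In this toy model a prefix pattern such as "ab??" needs a single seek, while a pattern with leading ?'s and a suffix, such as "??cd", needs roughly one seek per match, because the leading wildcards leave almost nothing to skip. That is the sense in which seeking can lose to a plain scan; a real term dictionary and automaton are far more involved.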
> Wrapup flexible indexing
> ------------------------
>
>                 Key: LUCENE-2111
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2111
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Flex Branch
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.1
>
>         Attachments: benchUtil.py, flex_backwards_merge_912395.patch,
> flex_merge_916543.patch, flexBench.py, LUCENE-2111-EmptyTermsEnum.patch,
> LUCENE-2111-EmptyTermsEnum.patch, LUCENE-2111.patch, LUCENE-2111.patch,
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch,
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch,
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch,
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch,
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111_bytesRef.patch,
> LUCENE-2111_experimental.patch, LUCENE-2111_fuzzy.patch,
> LUCENE-2111_mtqNull.patch, LUCENE-2111_mtqTest.patch,
> LUCENE-2111_toString.patch
>
>
> Spinoff from LUCENE-1458.
> The flex branch is in fairly good shape -- all tests pass, initial search
> performance testing looks good, it survived several visits from the Unicode
> policeman ;)
> But it still has a number of nocommits, could use some more scrutiny
> especially on the "emulate old API on flex index" and vice versa code paths,
> and still needs some more performance testing. I'll do these under this
> issue, and we should open separate issues for other self-contained fixes.
> The end is in sight!