[jira] Commented: (LUCENE-2111) Wrapup flexible indexing

Robert Muir (JIRA) Tue, 30 Mar 2010 09:33:58 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851456#action_12851456
 ]


Robert Muir commented on LUCENE-2111:
-------------------------------------

{quote}
There are certain specific wildcard corner cases where we are
slower, but these are likely rarely used in practice (many ?'s
followed by a suffix).
{quote}

I think it would be good to fix this in the future, but I certainly think it's a
rare case.
The problem is similar to an SQL engine deciding to just table-scan instead
of using a btree index... In this case we are trying to be too smart: we seek
to the correct term based on the query instead of scanning, but this causes too
many seeks.

At the same time, you have to be careful or you make the wrong decision
and give O(n) performance instead of O(log n).
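
To make the seek-versus-scan tradeoff concrete, here is a minimal sketch. It is
not Lucene code: the TreeSet is only a stand-in for a sorted term dictionary,
and the class and method names (SeekVsScan, scanPrefix, seekPrefix) are
hypothetical. It counts how many terms each strategy examines for a simple
prefix query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class SeekVsScan {
    // Hypothetical stand-in for a sorted term dictionary; not a Lucene API.
    static final TreeSet<String> TERMS = new TreeSet<>(Arrays.asList(
        "apple", "banana", "band", "bandit", "bank", "cat", "dog", "zebra"));

    /** Linear scan: examines every term and filters. visited[0] counts terms examined. */
    static List<String> scanPrefix(String prefix, int[] visited) {
        List<String> hits = new ArrayList<>();
        for (String t : TERMS) {
            visited[0]++;
            if (t.startsWith(prefix)) {
                hits.add(t);
            }
        }
        return hits;
    }

    /** Seek: jump to the first term >= prefix, then read until the prefix stops matching. */
    static List<String> seekPrefix(String prefix, int[] visited) {
        List<String> hits = new ArrayList<>();
        for (String t : TERMS.tailSet(prefix, true)) {
            visited[0]++;
            if (!t.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            hits.add(t);
        }
        return hits;
    }
}
```

Seeking pays off here because all matches for a prefix are contiguous in sorted
order, so a single seek is enough. For a pattern with many leading ?'s followed
by a suffix, the matches are scattered across the dictionary, so a seek-driven
enumerator may end up issuing a seek for nearly every match, and a plain scan
can be cheaper.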

In my opinion it would be better to think about how we can improve Lucene
in the future in the following ways:
* The term dictionary should be more "DFA-friendly", e.g. the whole concept of
TermsEnum is wrong: linear enumeration of terms is inefficient for any big
index, and we should get away from it.
* Instead it would be nice to think of the index like an FST: instead of
enumerating things and filtering them, we provide a DFA and enumerate the
transduced results.
* We need to eliminate the UTF-8/UTF-16 impedance mismatch, which causes so much
complication and unnecessary hairy code today.
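
As a rough illustration of the "provide a DFA and enumerate the results" idea,
the sketch below walks a trie and a tiny automaton in lockstep and prunes any
branch where the automaton dies. Everything here is an assumption for
illustration: the trie stands in for an FST-like term dictionary, the DFA only
handles patterns made of literals and '?', and none of the names
(TrieDfaIntersect, build, step, intersect) come from Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TrieDfaIntersect {
    /** Trie node; TreeMap keeps children sorted, so results come out in term order. */
    static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isTerm;
    }

    /** Builds a trie over the given terms (hypothetical stand-in for the term dictionary). */
    static Node build(String... terms) {
        Node root = new Node();
        for (String t : terms) {
            Node n = root;
            for (char c : t.toCharArray()) {
                n = n.children.computeIfAbsent(c, k -> new Node());
            }
            n.isTerm = true;
        }
        return root;
    }

    /** DFA for a pattern of literals and '?': the state is the pattern position, -1 is dead. */
    static int step(String pattern, int state, char c) {
        if (state < 0 || state >= pattern.length()) {
            return -1;
        }
        char p = pattern.charAt(state);
        return (p == '?' || p == c) ? state + 1 : -1;
    }

    /** Returns all terms accepted by the pattern, without enumerating pruned subtrees. */
    static List<String> intersect(Node root, String pattern) {
        List<String> out = new ArrayList<>();
        walk(root, pattern, 0, new StringBuilder(), out);
        return out;
    }

    private static void walk(Node n, String pattern, int state,
                             StringBuilder prefix, List<String> out) {
        if (n.isTerm && state == pattern.length()) {
            out.add(prefix.toString());
        }
        for (Map.Entry<Character, Node> e : n.children.entrySet()) {
            int next = step(pattern, state, e.getKey());
            if (next < 0) {
                continue; // automaton is dead on this label: skip the whole subtree
            }
            prefix.append(e.getKey().charValue());
            walk(e.getValue(), pattern, next, prefix, out);
            prefix.setLength(prefix.length() - 1);
        }
    }
}
```

For example, intersect(build("running", "ringing", "sing"), "????ing") only
descends into branches the pattern can still accept. Note that a pattern with
many leading ?'s gives the automaton nothing to prune on until the suffix is
reached, which is the same corner case quoted at the top of this comment.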

All this being said, I think flex is a great move forward for multi-term
queries: at least we have a seeking-friendly API! One step at a time.



> Wrapup flexible indexing
> ------------------------
>
>                 Key: LUCENE-2111
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2111
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Flex Branch
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.1
>
>         Attachments: benchUtil.py, flex_backwards_merge_912395.patch, 
> flex_merge_916543.patch, flexBench.py, LUCENE-2111-EmptyTermsEnum.patch, 
> LUCENE-2111-EmptyTermsEnum.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111_bytesRef.patch, 
> LUCENE-2111_experimental.patch, LUCENE-2111_fuzzy.patch, 
> LUCENE-2111_mtqNull.patch, LUCENE-2111_mtqTest.patch, 
> LUCENE-2111_toString.patch
>
>
> Spinoff from LUCENE-1458.
> The flex branch is in fairly good shape -- all tests pass, initial search 
> performance testing looks good, it survived several visits from the Unicode 
> policeman ;)
> But it still has a number of nocommits, could use some more scrutiny 
> especially on the "emulate old API on flex index" and vice versa code paths, 
> and still needs some more performance testing.  I'll do these under this 
> issue, and we should open separate issues for other self contained fixes.
> The end is in sight!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
