[ https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851511#action_12851511 ]

Michael McCandless commented on LUCENE-2111:
--------------------------------------------

{quote}
The term dictionary should be more "DFA-friendly"; e.g., the whole concept of
TermsEnum is wrong: linear enumeration of terms is inefficient for any big
index, and we should get away from it.
Instead, it would be nice to think of the index like an FST: instead of
enumerating things and filtering them, we provide a DFA and enumerate the
transduced results.
We need to eliminate the UTF-8/UTF-16 impedance mismatch, which causes so much
complication and unnecessary hairy code today.
{quote}
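One concrete source of the UTF-8/UTF-16 mismatch: Java's String.compareTo orders strings by UTF-16 code units, while a byte-based term dictionary orders terms by unsigned UTF-8 bytes. The two orders agree for ASCII but disagree once supplementary characters (surrogate pairs in UTF-16) meet BMP characters above the surrogate range. The minimal sketch below shows this; the class and helper names are illustrative only and are not Lucene APIs.

```java
import java.nio.charset.StandardCharsets;

public class Utf16VsUtf8Order {
    // Compare byte arrays as unsigned bytes: the order a byte-based (UTF-8) term dictionary sorts in.
    static int compareUtf8(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Sign of the comparison under UTF-16 code unit order (what String.compareTo uses).
    static int utf16Sign(String s, String t) {
        return Integer.signum(s.compareTo(t));
    }

    // Sign of the comparison under UTF-8 unsigned byte order.
    static int utf8Sign(String s, String t) {
        return Integer.signum(compareUtf8(
                s.getBytes(StandardCharsets.UTF_8), t.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        String bmp = "\uFFFD";          // U+FFFD: a BMP character above the surrogate range
        String supp = "\uD83D\uDE00";   // U+1F600: a supplementary character (surrogate pair in UTF-16)
        System.out.println("UTF-16 order: " + utf16Sign(bmp, supp)); // prints 1
        System.out.println("UTF-8 order:  " + utf8Sign(bmp, supp));  // prints -1
        // Plain ASCII terms sort the same way under both orders.
        System.out.println("ASCII agrees: " + (utf16Sign("cat", "dog") == utf8Sign("cat", "dog")));
    }
}
```

Because the orders disagree, any code path that presents a UTF-8-ordered index through a UTF-16-ordered API (or the reverse) has to special-case surrogate pairs, which is plausibly the kind of "hairy code" the quote refers to.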

+1 -- we already see these limitations in making AutomatonQuery consume the
straight (linear) enum.  If we flipped the problem around (you pass a DFA to the
codec and it does the intersection and enumerates the result), and we used
byte-based DFAs, I think we'd get a good speedup.
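A minimal sketch of what flipping the problem around could look like, assuming the term dictionary is a byte trie (a simple stand-in for an FST) and the query has already been compiled to a byte-based DFA. ByteDfa, Node, intersect, search, and the pattern c[a-z]t are all hypothetical illustrations; this is not Lucene's codec API or AutomatonQuery's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DfaIntersectSketch {
    // Hypothetical byte-based DFA: trans[state][b] is the next state, or -1 for "dead".
    static final class ByteDfa {
        final int[][] trans;
        final boolean[] accept;
        ByteDfa(int numStates) {
            trans = new int[numStates][256];
            for (int[] row : trans) Arrays.fill(row, -1);
            accept = new boolean[numStates];
        }
        void addRange(int from, int lo, int hi, int to) {
            for (int b = lo; b <= hi; b++) trans[from][b] = to;
        }
        int step(int state, byte b) { return trans[state][b & 0xFF]; }
    }

    // Hypothetical stand-in for an FST-like term dictionary: a byte trie,
    // children sorted by unsigned byte value so results come out in term order.
    static final class Node {
        final TreeMap<Integer, Node> children = new TreeMap<>();
        boolean isTerm;
    }

    static Node buildTrie(List<String> terms) {
        Node root = new Node();
        for (String t : terms) {
            Node n = root;
            for (byte b : t.getBytes(StandardCharsets.UTF_8)) {
                n = n.children.computeIfAbsent(b & 0xFF, k -> new Node());
            }
            n.isTerm = true;
        }
        return root;
    }

    // Walk the DFA and the trie together; a subtree is skipped as soon as the DFA has no transition.
    static void intersect(Node node, int state, ByteDfa dfa, byte[] path, int depth,
                          List<String> out, int[] visited) {
        visited[0]++;
        if (node.isTerm && dfa.accept[state]) {
            out.add(new String(path, 0, depth, StandardCharsets.UTF_8));
        }
        for (Map.Entry<Integer, Node> e : node.children.entrySet()) {
            byte b = e.getKey().byteValue();
            int next = dfa.step(state, b);
            if (next < 0) continue; // dead state: prune the whole subtree without enumerating it
            path[depth] = b;
            intersect(e.getValue(), next, dfa, path, depth + 1, out, visited);
        }
    }

    // DFA for the hypothetical pattern c[a-z]t: 0 -c-> 1 -[a-z]-> 2 -t-> 3 (accepting).
    static ByteDfa catLikeDfa() {
        ByteDfa dfa = new ByteDfa(4);
        dfa.addRange(0, 'c', 'c', 1);
        dfa.addRange(1, 'a', 'z', 2);
        dfa.addRange(2, 't', 't', 3);
        dfa.accept[3] = true;
        return dfa;
    }

    static List<String> search(List<String> terms, ByteDfa dfa, int[] visited) {
        int maxLen = 0;
        for (String t : terms) maxLen = Math.max(maxLen, t.getBytes(StandardCharsets.UTF_8).length);
        List<String> out = new ArrayList<>();
        intersect(buildTrie(terms), 0, dfa, new byte[maxLen], 0, out, visited);
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("apple", "bat", "cat", "cattle", "cot", "cut", "dog", "zebra");
        int[] visited = {0};
        System.out.println(search(terms, catLikeDfa(), visited)); // prints [cat, cot, cut]
        System.out.println("trie nodes visited: " + visited[0]);  // prints trie nodes visited: 8
    }
}
```

Only 8 trie nodes are visited here: the subtrees under apple, bat, dog, and zebra, and the tail of cattle, are rejected by the DFA on their first byte and never enumerated, whereas enumerating every term and filtering would step through all of them.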

> Wrapup flexible indexing
> ------------------------
>
>                 Key: LUCENE-2111
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2111
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Flex Branch
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.1
>
>         Attachments: benchUtil.py, flex_backwards_merge_912395.patch, 
> flex_merge_916543.patch, flexBench.py, LUCENE-2111-EmptyTermsEnum.patch, 
> LUCENE-2111-EmptyTermsEnum.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111.patch, 
> LUCENE-2111.patch, LUCENE-2111.patch, LUCENE-2111_bytesRef.patch, 
> LUCENE-2111_experimental.patch, LUCENE-2111_fuzzy.patch, 
> LUCENE-2111_mtqNull.patch, LUCENE-2111_mtqTest.patch, 
> LUCENE-2111_toString.patch
>
>
> Spinoff from LUCENE-1458.
> The flex branch is in fairly good shape -- all tests pass, initial search 
> performance testing looks good, it survived several visits from the Unicode 
> policeman ;)
> But it still has a number of nocommits, could use some more scrutiny, 
> especially on the "emulate old API on flex index" and vice versa code paths, 
> and still needs some more performance testing.  I'll do these under this 
> issue, and we should open separate issues for other self-contained fixes.
> The end is in sight!

-- 
This message is automatically generated by JIRA.
