[ https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973003#action_12973003 ]

Simon Willnauer commented on LUCENE-2694:
-----------------------------------------

{quote}
I think we should remove TermsEnum.docFreq and .ord? Ie replace
with .termState().docFreq() and .ord()?
{quote}

I disagree on that - at least docFreq() is an essential part of the API, and we 
should not force TermState creation just to get the df. Moreover, TermState is an 
expert API; you should not need to pull in an expert API to get something as 
essential as the df.
I would leave those as they are, or only pull ord into TermState.
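To make the trade-off concrete, here is a minimal sketch of the two access paths being debated - note that termState().docFreq() is the *proposed* API from the quote above, not an existing method, and the seek call is simplified:

{code:java}
// Illustrative sketch only: termState().docFreq() is the proposed API
// from the quote above, not an existing method; seek() is simplified.
int getDocFreq(TermsEnum termsEnum, BytesRef term) throws IOException {
  if (termsEnum.seek(term) == TermsEnum.SeekStatus.FOUND) {
    // Direct access: cheap, no extra object created.
    return termsEnum.docFreq();
    // Proposed alternative: forces creation of an expert-API object
    // just to read the same number:
    //   return termsEnum.termState().docFreq();
  }
  return 0;
}
{code}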

{quote}
Maybe rename TermStateBase -> PrefixCodedTermState? Ie this is
really the TermState impl used by any codec using
PrefixCodedTerms? EG the fact that it stores the filePointer into
a _X.tis file is particular to it..
{quote}
Yeah that sounds reasonable.

{quote}
Maybe rename MockTermState -> BasicTermState? At first I was
thinking the codec should return null if it cannot seek by
TermState... (I generally don't like mock returns that hide/lose
information...) but then it's convenient to always have something
to hold the docFreq for the term to avoid lots of special cased
code... so I think it's OK?
{quote}

I think we can get rid of it entirely: we can use TermStateBase for it and let 
PrefixCodedTermState just add the file pointer. That way we get rid of it 
nicely. I would like to keep the rest of the API as it is, since it makes the 
usage easier, especially in the rewrite methods.
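Roughly what I have in mind (names follow the discussion above - this is just a sketch, not the actual patch code):

{code:java}
// Sketch of the proposed layering -- names are illustrative.
public class TermStateBase {
  public int docFreq;        // the essential bit every consumer needs
}

// Only codecs based on PrefixCodedTerms know about the terms dict file,
// so the file pointer lives in the subclass:
public class PrefixCodedTermState extends TermStateBase {
  public long filePointer;   // offset into the _X.tis terms dict file
}
{code}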

bq. We lost the "clone using new" in StandardTermState...
I don't quite follow that - IMO this is quite minor, but I will look into it 
again...

{quote}
Maybe revert changes to AppendingCodec? (Ie let it pass its terms
dict cache size again)
{quote}

Unintentional - will fix.


{quote}
I wonder if we can somehow make PerReaderTermState use an array
(keyed by sub reader index) instead... seems like a new HashMap
per Term in an MTQ could be heavy. It's tricky because we don't
store enough information (ie to quickly map parent reader + sub
reader -> sub index). But I don't think this should hold up
committing... since our defaults don't typically allow for that
many terms in-flight it should be fine...
{quote}

I actually had this implemented in a very similar way: I used a custom linked 
list and relied on the fact that the incoming readers are applied in the same 
order, skipping ahead until the next reader with that term appeared. I changed 
it back to a Map impl to make it simpler, since I didn't see any speedups - 
well, that turned out to be caused by a very nifty coding error :D

I think I still have that patch somewhere in the history... let's see...
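For the record, a rough sketch of the array-keyed variant from the quote - assuming the sub-reader ord is known to the caller, which is exactly the missing mapping Mike mentions (all names illustrative):

{code:java}
// Illustrative sketch of an array keyed by sub-reader index instead of a
// HashMap per term. The hard part (mapping parent reader + sub reader to
// the ord passed in here) is assumed to be solved by the caller.
public class PerReaderTermState {
  private final TermState[] states;  // one slot per sub reader
  private int docFreq;               // aggregated across sub readers

  public PerReaderTermState(int numSubReaders) {
    states = new TermState[numSubReaders];
  }

  public void register(int subReaderOrd, TermState state, int df) {
    states[subReaderOrd] = state;
    docFreq += df;
  }

  public TermState get(int subReaderOrd) {
    return states[subReaderOrd];  // null if the term is absent there
  }

  public int docFreq() {
    return docFreq;
  }
}
{code}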

bq. I think the TQ ctor that takes both docFreq and states can drop the 
docFreq? Ie it can ask the states for it?

Yeah, sure - well, the patch is my current state, since I had to drop everything 
and leave on Friday... I'll clean up and upload a new patch early this week.
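Something along these lines, assuming PerReaderTermState can aggregate the df as sketched above (again, just a sketch, not the patch code):

{code:java}
// Sketch: the ctor derives the docFreq from the states instead of
// taking it as a separate, redundant argument.
public TermQuery(Term term, PerReaderTermState states) {
  this.term = term;
  this.perReaderTermState = states;
  this.docFreq = states.docFreq();  // no separate docFreq parameter needed
}
{code}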

@Uwe: I will incorporate your fix - thanks




> MTQ rewrite + weight/scorer init should be single pass
> ------------------------------------------------------
>
>                 Key: LUCENE-2694
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2694
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2694-FTE.patch, LUCENE-2694.patch, 
> LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch
>
>
> Spinoff of LUCENE-2690 (see the hacked patch on that issue)...
> Once we fix MTQ rewrite to be per-segment, we should take it further and make 
> weight/scorer init also run in the same single pass as rewrite.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

