On Sun, Mar 9, 2014 at 9:55 AM, Da Huang <dhuang...@gmail.com> wrote:
> Hi, Mike.
>
> You're right. After reading the comments on LUCENE-1518, I see
> that my idea has many bugs. Sorry about that.

It's fine, it's a VERY hard fix :)  This is why it hasn't been done yet!

> So I checked some of the other suggestions you gave me, to see whether
> relevant comments can be found in Jira.
>
> I think I have an idea for "LUCENE-4396: BooleanScorer should sometimes be
> used for MUST clauses".
> Can we adjust the query to make the problem easier? Taking the query "+a b c +e
> +f" as an example, maybe we can
> turn it into "(+a +e +f) b c", which has only one MUST clause. Then it would
> be easier to judge which scorer to use?

You mean create nesting when there wasn't before, by grouping all MUST
clauses together?  We could explore that ...
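
To make the regrouping concrete, here is a minimal sketch of it. The
Occur, Clause and MustGrouping types are hypothetical stand-ins written
for this example, not Lucene's BooleanQuery/BooleanClause API. The
sketch renders the group as "+(+a +e +f)" on the assumption that the
group itself has to be a MUST clause for the rewritten query to match
the same documents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; not Lucene classes.
enum Occur { MUST, SHOULD }

final class Clause {
    final Occur occur;
    final String term;         // non-null for a leaf clause
    final List<Clause> nested; // non-null for a nested boolean group

    Clause(Occur occur, String term) {
        this.occur = occur;
        this.term = term;
        this.nested = null;
    }

    Clause(Occur occur, List<Clause> nested) {
        this.occur = occur;
        this.term = null;
        this.nested = nested;
    }

    @Override
    public String toString() {
        String prefix = occur == Occur.MUST ? "+" : "";
        if (term != null) {
            return prefix + term;
        }
        return prefix + "(" + MustGrouping.render(nested) + ")";
    }
}

final class MustGrouping {
    /** Moves all MUST clauses into one nested MUST group; other clauses stay flat. */
    static List<Clause> groupMust(List<Clause> flat) {
        List<Clause> must = new ArrayList<>();
        List<Clause> rest = new ArrayList<>();
        for (Clause c : flat) {
            (c.occur == Occur.MUST ? must : rest).add(c);
        }
        if (must.size() < 2) {
            return new ArrayList<>(flat); // zero or one MUST clause: nothing to group
        }
        List<Clause> out = new ArrayList<>();
        out.add(new Clause(Occur.MUST, must)); // the single remaining top-level MUST
        out.addAll(rest);
        return out;
    }

    /** Renders clauses in the "+a b" notation used in this thread. */
    static String render(List<Clause> clauses) {
        StringBuilder sb = new StringBuilder();
        for (Clause c : clauses) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Feeding it the flat clauses for "+a b c +e +f" gives a top level with one
required group followed by the optional clauses b and c.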

Or we could pass all the clauses (still flat) to BooleanScorer.  I
think this would only be faster when the MUST clauses are high cost
relative to all other clauses.  E.g. a super-rare MUST'd clause would
probably be faster with BooleanScorer2.

I think this could make a good GSoC project.

> Also, it seems that the suggestion "we should pass a needsScorers boolean
> up-front to Weight.scorer"
> is not in Jira. It sounds like it could be done by adjusting some class
> methods' arguments and return values
> to pass "needsScorers"? I'm not sure.

I think it's this Jira: https://issues.apache.org/jira/browse/LUCENE-3331

(I just searched for "needs scores" on
http://jirasearch.mikemccandless.com and it was one of the
suggestions).

All that should be needed here is to add a "boolean needsScores" (or
something) to the Weight.scorer method, and fix the numerous places
where this method is invoked to pass the right value.  E.g.
ConstantScoreQuery would pass false, and this would mean e.g. if it
wraps a TermQuery, we could avoid decoding freq blocks from the
postings.
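
As a rough illustration of how the flag could propagate, here is a
small sketch. ScorerSketch, WeightSketch and the two weight classes are
hypothetical, heavily simplified stand-ins; they are not the real
Lucene Weight/Scorer signatures, and the "freq decoding" is simulated
with a counter.

```java
// Hypothetical, simplified stand-ins; NOT the real Lucene Weight/Scorer API.
interface ScorerSketch {
    int size();          // number of matching documents
    float score(int i);  // score of the i-th matching document
}

interface WeightSketch {
    ScorerSketch scorer(boolean needsScores);
}

final class TermWeightSketch implements WeightSketch {
    private final int[] freqs; // term frequency per matching document
    int freqDecodes = 0;       // counts simulated freq-block decodes

    TermWeightSketch(int[] freqs) {
        this.freqs = freqs;
    }

    @Override
    public ScorerSketch scorer(boolean needsScores) {
        final int count = freqs.length;
        if (!needsScores) {
            // The caller will ignore scores, so the freq data is never touched.
            return new ScorerSketch() {
                @Override public int size() { return count; }
                @Override public float score(int i) { return 0f; }
            };
        }
        freqDecodes++; // simulate decoding freq blocks from the postings
        final int[] decoded = freqs.clone();
        return new ScorerSketch() {
            @Override public int size() { return count; }
            @Override public float score(int i) { return decoded[i]; } // toy score
        };
    }
}

final class ConstantScoreWeightSketch implements WeightSketch {
    private final WeightSketch inner;
    private final float constant;

    ConstantScoreWeightSketch(WeightSketch inner, float constant) {
        this.inner = inner;
        this.constant = constant;
    }

    @Override
    public ScorerSketch scorer(boolean needsScores) {
        // Whatever the caller asked for, the wrapped scores are never used.
        final ScorerSketch matches = inner.scorer(false);
        return new ScorerSketch() {
            @Override public int size() { return matches.size(); }
            @Override public float score(int i) { return constant; }
        };
    }
}
```

In this toy version, wrapping a TermWeightSketch in a
ConstantScoreWeightSketch leaves freqDecodes at zero, which is the
saving described above.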

> Finally, I recently found something strange in the heap code. A heap
> has been implemented many times over
> in trunk, and a PriorityQueue is also implemented in the
> package org.apache.lucene.util.
> I remember Java already provides a PriorityQueue. Why not use that?

Good question!  There is a fair amount of duplicated code, and we
should fix that over time.  Lucene has had its own PQ class forever,
and we do strange things like pre-filling the queue with a sentinel
value to avoid "if (queueIsNotFullYet)" checks in collect(int doc),
and we can replace the top value and re-heap ... but maybe these do
not in fact matter in practice and if so we should stop duplicating
code :)
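
For reference, the sentinel pre-fill and replace-top tricks mentioned
above look roughly like this. SentinelTopN is a hypothetical class
written for this example; it is not org.apache.lucene.util.PriorityQueue,
and it tracks bare scores only.

```java
import java.util.Arrays;

/** Fixed-size min-heap of scores, pre-filled with a sentinel. Illustrative sketch only. */
final class SentinelTopN {
    private final float[] heap; // heap[0] is always the smallest retained score

    SentinelTopN(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        heap = new float[n];
        // An array of equal values is already a valid heap, so no heapify is needed.
        Arrays.fill(heap, Float.NEGATIVE_INFINITY);
    }

    /** Hot path: a single comparison, with no "is the queue full yet?" branch. */
    void collect(float score) {
        if (score <= heap[0]) {
            return; // not competitive
        }
        heap[0] = score; // replace the top in place ...
        siftDown();      // ... and re-heap
    }

    private void siftDown() {
        final int n = heap.length;
        int i = 0;
        while (true) {
            int left = 2 * i + 1;
            int right = left + 1;
            int smallest = i;
            if (left < n && heap[left] < heap[smallest]) {
                smallest = left;
            }
            if (right < n && heap[right] < heap[smallest]) {
                smallest = right;
            }
            if (smallest == i) {
                return;
            }
            float tmp = heap[i];
            heap[i] = heap[smallest];
            heap[smallest] = tmp;
            i = smallest;
        }
    }

    /** Retained scores, best first; unfilled slots still hold the sentinel. */
    float[] sortedDescending() {
        float[] copy = heap.clone();
        Arrays.sort(copy);
        for (int a = 0, b = copy.length - 1; a < b; a++, b--) {
            float tmp = copy[a];
            copy[a] = copy[b];
            copy[b] = tmp;
        }
        return copy;
    }
}
```

java.util.PriorityQueue has no replace-top operation, so the equivalent
there is a size check followed by poll() and offer(). Whether that
difference is measurable in practice is exactly the open question.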

Mike McCandless

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
