On Wed, Nov 9, 2011 at 1:52 PM, Grant Ingersoll <[email protected]> wrote:
> Notice the code in Lucene.  It is solving the exact same problem.  Return the 
> top X things by score.  Our ScoreDoc is made of a score and an id.

I'm sure it does -- but I thought the question was why this code was
the way it is, and why you couldn't just use the queue as-is? I am
sure the Lucene equivalent must do something similar. You can't solve
this with a max-heap, if that's what you mean.


> Yeah, I know.  But why all the shuffling around?  We have all we need on the 
> PQ already.

Yes, but not in order! What are you suggesting would work, that is simpler?


> Multiplied by millions or requests, it adds up.  I think Recommenders, 
> especially, are to the point where it's time to start tightening the screws 
> on performance.   I've been reviewing this stuff quite a bit lately and am 
> struck by how similar this stuff is to Lucene's core scoring, albeit from a 
> few years ago (pre Lucene 2.4, by my guess).  For instance, our "IDRescorer", 
> is modeled via a Collector class and bit set filters and it's integrated 
> directly into the scoring, whereas Mahout sorts after scoring.

This is nowhere near the bottleneck, I assure you, but invite you to
profile it to have a look. This piece of code has been profiled and
scrubbed many times over.

I don't understand the comment about sorting after scoring -- you
can't, in this context.

Reply via email to