Piotr Kosiorowski wrote:
Hi, some time ago I started to think about implementing a special kind of Lucene Query optimized for Nutch (if I remember correctly, I would have to write my own Scorer and probably a few other classes). I assumed that with a specialized query I would be able to avoid accessing some of the Lucene index structures multiple times, since the same term appears many times in the query generated by Nutch for multi-token queries. I am not a Lucene expert, but maybe it is worth checking whether it might give some performance boost. Does anyone have any ideas why it might help or not?
That's a very good comment. Looking at the profile traces I can see that a lot of time is spent just juggling the sub-query scorers inside the BooleanScorer, and handling the complex query structure; if this part could be optimized by the use of a special scorer, it could be a big win.
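To make the redundancy concrete, here is a minimal sketch in plain Java (no Lucene classes) that counts how often each field:term pair occurs in an expanded query. The class name `TermDedupSketch`, the method `countTermOccurrences`, and the expansion used in `main` (each token searched in url, anchor and content, plus one phrase per field) are assumptions made for illustration, not the exact query Nutch generates. A specialized scorer could, in principle, open the postings for each unique term once and share them between clauses; how scores would then be combined is left open here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Lucene or Nutch code.
class TermDedupSketch {

    // Counts how many times each "field:text" term occurs among the
    // leaves of an expanded query. Any count above 1 marks a term whose
    // postings would be read more than once by independent sub-scorers.
    static Map<String, Integer> countTermOccurrences(List<String> terms) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String term : terms) {
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed expansion of the two-token query "foo bar":
        // one term clause per field and token, plus one phrase per field.
        List<String> expanded = Arrays.asList(
            "url:foo", "anchor:foo", "content:foo",   // term clauses for "foo"
            "url:bar", "anchor:bar", "content:bar",   // term clauses for "bar"
            "url:foo", "url:bar",                     // phrase url:"foo bar"
            "anchor:foo", "anchor:bar",               // phrase anchor:"foo bar"
            "content:foo", "content:bar");            // phrase content:"foo bar"

        Map<String, Integer> counts = countTermOccurrences(expanded);
        System.out.println(expanded.size() + " term occurrences, "
            + counts.size() + " unique");
    }
}
```

Under the assumed expansion, every term is touched twice, so half of the postings lookups are candidates for sharing. Whether that translates into a measurable speedup would still need to be checked against the profile traces.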
--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
