[
https://issues.apache.org/jira/browse/LUCENE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Simon Willnauer updated LUCENE-2494:
------------------------------------
Attachment: LUCENE-2494.patch
I refactored PMS again and incorporated your ideas, Edward. I figured that this
patch makes a lot of code obsolete, so I removed all the foreach stuff and the
Function interface. I wrapped the CompletionService in a util class so that it
can be iterated over in a for loop, which makes the code way cleaner and more
readable and shrinks the class a fair bit.
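
To illustrate the idea of iterating over a CompletionService in a for loop,
here is a minimal sketch using only JDK classes. It is not the code from the
attached patch: the names CompletionIterable, CompletionIterableDemo and
sleepThenReturn are hypothetical, and the sleeping tasks merely stand in for
searches over indexes of different sizes.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical helper (not the class from the patch): wraps a
 * CompletionService so results can be consumed in a for-each loop,
 * in the order the tasks complete. Single-pass, and meant to be
 * submitted to and iterated from one thread only.
 */
final class CompletionIterable<T> implements Iterable<T> {
  private final CompletionService<T> service;
  private int pending; // tasks submitted but not yet consumed

  CompletionIterable(Executor executor) {
    this.service = new ExecutorCompletionService<T>(executor);
  }

  void submit(Callable<T> task) {
    service.submit(task);
    pending++;
  }

  @Override
  public Iterator<T> iterator() {
    return new Iterator<T>() {
      @Override
      public boolean hasNext() {
        return pending > 0;
      }

      @Override
      public T next() {
        if (pending == 0) {
          throw new NoSuchElementException();
        }
        try {
          // take() blocks until whichever task finishes next is available.
          return service.take().get();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new RuntimeException(e);
        } catch (ExecutionException e) {
          throw new RuntimeException(e.getCause());
        } finally {
          pending--;
        }
      }

      @Override
      public void remove() {
        throw new UnsupportedOperationException();
      }
    };
  }
}

final class CompletionIterableDemo {
  /** Stand-in for a search task; the delay simulates index size. */
  static Callable<String> sleepThenReturn(final String value, final long millis) {
    return new Callable<String>() {
      @Override
      public String call() throws Exception {
        Thread.sleep(millis);
        return value;
      }
    };
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      CompletionIterable<String> results = new CompletionIterable<String>(pool);
      results.submit(sleepThenReturn("large index", 300));
      results.submit(sleepThenReturn("small index", 10));
      for (String result : results) {
        System.out.println(result); // results arrive as tasks complete
      }
    } finally {
      pool.shutdown();
    }
  }
}
```

With a wrapper like this the caller no longer needs to keep an array of
Futures or a separate callback interface; the loop body is the place where
each result gets merged.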
> Modify ParallelMultiSearcher to use a CompletionService instead of slowly
> polling for results
> ---------------------------------------------------------------------------------------------
>
> Key: LUCENE-2494
> URL: https://issues.apache.org/jira/browse/LUCENE-2494
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Environment: Irrelevant
> Reporter: Edward Drapkin
> Assignee: Simon Willnauer
> Fix For: 3.1
>
> Attachments: LUCENE-2494.patch, LUCENE-2494.patch
>
> Original Estimate: 0h
> Remaining Estimate: 0h
>
> Right now, the parallel multi searcher creates an array/list of Future<V>
> representing each of the searchables that's being concurrently searched (and
> its corresponding search task).
> As it stands, once the tasks are all submitted to the executor, the array is
> iterated over, FIFO, and Future.get() is called iteratively. This obviously
> works, but isn't ideal. It's entirely possible (and a situation I've run into)
> that one of the first searchables represents a large index that takes a long
> time to search, so the results of the other searchables can't be processed
> until the large index is done searching. In my case, we have two indexes
> with several million records that get searched in front of some other
> indexes, the smallest of which has only a few tens of thousands of entries,
> and I didn't think it was ideal for the results of the other indexes to wait.
> I've modified ParallelMultiSearcher to use CompletionServices instead, so
> that results are processed in the order they are completed, rather than the
> order that they are submitted. All the tests still pass, and to the best of
> my knowledge this won't break anything. This has several advantages:
> 1) Speed - the thread owning the executor doesn't have to wait for the first
> submitted task to finish in order to process the results of the other tasks,
> which may have finished first
> 2) Removed several warnings (even if they are annotated away) due to the
> ugliness of typecasting generic arrays.
> 3) Decreased the complexity of the code in some cases, usually by removing
> the necessity of allocating and filling arrays.
> With a primed "cache" of searchables, I was getting 700-1200 ms per search,
> and using the same phrases, with this patch, I am now getting 400-500ms per
> search :)
> Patch is attached.
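
For readers who want to see the difference described in the issue, the
following is a small sketch, assuming two fake "searches" that just sleep
for different amounts of time. It contrasts calling Future.get() in
submission order with taking results from an ExecutorCompletionService.
The class and method names (ResultOrderSketch, fakeSearch, submissionOrder,
completionOrder) are hypothetical and do not come from ParallelMultiSearcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ResultOrderSketch {

  /** Stand-in for searching one index; delayMillis simulates index size. */
  static Callable<String> fakeSearch(final String name, final long delayMillis) {
    return new Callable<String>() {
      @Override
      public String call() throws Exception {
        Thread.sleep(delayMillis);
        return name;
      }
    };
  }

  /** Polling approach: results are processed in the order tasks were submitted. */
  static List<String> submissionOrder(ExecutorService pool, List<Callable<String>> tasks)
      throws Exception {
    List<Future<String>> futures = new ArrayList<Future<String>>();
    for (Callable<String> task : tasks) {
      futures.add(pool.submit(task));
    }
    List<String> processed = new ArrayList<String>();
    for (Future<String> future : futures) {
      // Blocks on the earliest submitted task even if later ones are already done.
      processed.add(future.get());
    }
    return processed;
  }

  /** CompletionService approach: results are processed as tasks finish. */
  static List<String> completionOrder(ExecutorService pool, List<Callable<String>> tasks)
      throws Exception {
    CompletionService<String> service = new ExecutorCompletionService<String>(pool);
    for (Callable<String> task : tasks) {
      service.submit(task);
    }
    List<String> processed = new ArrayList<String>();
    for (int i = 0; i < tasks.size(); i++) {
      // take() returns the Future of whichever task completed next.
      processed.add(service.take().get());
    }
    return processed;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      List<Callable<String>> tasks = new ArrayList<Callable<String>>();
      tasks.add(fakeSearch("large", 300)); // submitted first, finishes last
      tasks.add(fakeSearch("small", 10));
      System.out.println("submission order: " + submissionOrder(pool, tasks));
      System.out.println("completion order: " + completionOrder(pool, tasks));
    } finally {
      pool.shutdown();
    }
  }
}
```

In this sketch the total wall-clock time is the same for both methods; the
gain is that the calling thread can start merging the small index's hits
while the large index is still being searched.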