Re: Improving TimeLimitedCollector

Mark Harwood Sat, 27 Jun 2009 04:24:56 -0700

Thanks for the feedback, Shai.

So I guess you're suggesting breaking this out into a general utility class, e.g. something like:


class TimeLimitedThreadActivity
{
        // called by client
        public static void startTimeLimitedActivity(long maxTimePermitted);

        public static void endTimeLimitedActivity();

        // called by resources (readers/writers) that need to be shared fairly by threads
        public static void checkActivityNotElapsed(); // throws some form of runtime exception
}

A downside of breaking it out into static methods like this is that a thread cannot run >1 time-limited activity simultaneously, but I guess that might be a reasonable restriction.
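
A minimal sketch of how such a static utility might look, assuming a per-thread deadline held in a ThreadLocal and a clock read on every check. This is not the prototype's mechanism (which uses a shared volatile flag and a background thread). The exception name ActivityTimedOutException and the millisecond unit for maxTimePermitted are assumptions made for illustration. The one-activity-per-thread restriction shows up as the IllegalStateException.

```java
final class TimeLimitedThreadActivity {

    /** Hypothetical exception type; the thread only says "some form of runtime exception". */
    static final class ActivityTimedOutException extends RuntimeException {
        ActivityTimedOutException(String message) {
            super(message);
        }
    }

    // Per-thread deadline on the System.nanoTime() scale; null means no activity is running.
    private static final ThreadLocal<Long> DEADLINE = new ThreadLocal<>();

    private TimeLimitedThreadActivity() {
    }

    // Called by the client. Assumes maxTimePermitted is in milliseconds.
    static void startTimeLimitedActivity(long maxTimePermitted) {
        if (DEADLINE.get() != null) {
            // Static methods plus a single ThreadLocal slot allow only one activity per thread.
            throw new IllegalStateException("a time-limited activity is already running on this thread");
        }
        DEADLINE.set(System.nanoTime() + maxTimePermitted * 1_000_000L);
    }

    // Called by the client, normally from a finally block.
    static void endTimeLimitedActivity() {
        DEADLINE.remove();
    }

    // Called by shared resources (readers/writers) before doing work.
    static void checkActivityNotElapsed() {
        Long deadline = DEADLINE.get();
        // Subtraction keeps the comparison correct even if nanoTime() wraps around.
        if (deadline != null && System.nanoTime() - deadline >= 0) {
            throw new ActivityTimedOutException("time-limited activity exceeded its budget");
        }
    }
}
```

A client would call startTimeLimitedActivity(...) before its work and endTimeLimitedActivity() in a finally block, so the per-thread deadline is always cleared.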

>> Aside, how about using a PQ for the threads' times, or a TreeMap. That will save looping over the collection to find the next candidate. Just an implementation detail though.

Yep, that was one of the rough edges - I just wanted to get raw timings first for all the "is timed out?" checks we're injecting into reader calls.
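
For what it's worth, the PQ idea could be sketched roughly as below, using java.util.PriorityQueue ordered by deadline so the next candidate is always at the head. DeadlineQueue and Entry are hypothetical names, not taken from the prototype.

```java
import java.util.PriorityQueue;

final class DeadlineQueue {

    /** One planned activity: the owning thread and its absolute deadline in milliseconds. */
    static final class Entry implements Comparable<Entry> {
        final Thread thread;
        final long deadlineMillis;

        Entry(Thread thread, long deadlineMillis) {
            this.thread = thread;
            this.deadlineMillis = deadlineMillis;
        }

        @Override
        public int compareTo(Entry other) {
            return Long.compare(deadlineMillis, other.deadlineMillis);
        }
    }

    // Min-heap on deadline: peek() returns the earliest deadline without scanning.
    private final PriorityQueue<Entry> queue = new PriorityQueue<>();

    synchronized Entry add(Thread thread, long deadlineMillis) {
        Entry entry = new Entry(thread, deadlineMillis);
        queue.add(entry);
        return entry;
    }

    // Called when an activity ends before its deadline.
    synchronized void remove(Entry entry) {
        queue.remove(entry);
    }

    // Next candidate for a timeout, or null if nothing is planned.
    synchronized Entry next() {
        return queue.peek();
    }
}
```

One trade-off: PriorityQueue.remove(Object) is a linear scan, so if activities usually finish before their deadline, a TreeMap keyed on deadline may suit better, at the cost of handling two threads with the same deadline.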


Cheers
Mark


On 27 Jun 2009, at 11:37, Shai Erera wrote:

I like the overall approach. However it's very local to an IndexReader. I.e., if someone wanted to limit other operations (say indexing), or does not use an IndexReader (for a Scorer impl maybe), one cannot reuse it.
What if we factor out the timeout logic to a Timeout class (I think it can be a static class, with the way you implemented it) and use it in TimeLimitingIndexReader? That class can offer a method check() which will do the internal logic (the 'if' check and throw exception). It will be similar to the current ensureOpen() followed by an operation.
It might be considered more expensive since it won't check a boolean, but instead call a check() method, but it will be more reusable. Also, ensureOpen today is also a method call, so I don't think Timeout.check() is that bad. We can even later create a TimeLimitingIndexWriter and document Timeout class for other usage by external code.
Aside, how about using a PQ for the threads' times, or a TreeMap. That will save looping over the collection to find the next candidate. Just an implementation detail though.
Shai
On Sat, Jun 27, 2009 at 3:31 AM, Mark Harwood <[email protected]> wrote:
Going back to my post re TimeLimitedIndexReaders - here's an incomplete but functional prototype:
http://www.inperspective.com/lucene/TimeLimitedIndexReader.java
http://www.inperspective.com/lucene/TestTimeLimitedIndexReader.java
The principle is that all reader accesses check a volatile variable indicating something may have timed out (no need to check thread locals etc.) If and only if a time out has been noted thread locals are checked to see which thread should throw a timeout exception.
All time-limited use of reader must be wrapped in try...finally calls to indicate the start and stop of a timed set of activities. A background thread maintains the next anticipated timeout deadline and simply waits until this is reached or the list of planned activities changes with new deadlines.
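
The scheme described in the two paragraphs above might be sketched roughly as follows. This is not the code behind the linked prototype. ActivityTimeoutMonitor and its method names are hypothetical, and a map keyed on Thread stands in for thread locals so the background thread can see which thread is overdue.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ActivityTimeoutMonitor {
    // Fast-path flag read on every resource access; true only while some thread is overdue.
    private static volatile boolean anythingTimedOut = false;

    private static final Object LOCK = new Object();
    // Thread -> absolute deadline in ms, and the set of overdue threads; both guarded by LOCK.
    private static final Map<Thread, Long> deadlines = new HashMap<>();
    private static final Set<Thread> timedOut = new HashSet<>();

    static {
        Thread monitor = new Thread(ActivityTimeoutMonitor::run, "timeout-monitor");
        monitor.setDaemon(true);
        monitor.start();
    }

    // Start of a timed set of activities for the calling thread.
    static void start(long maxMillis) {
        synchronized (LOCK) {
            Thread self = Thread.currentThread();
            timedOut.remove(self);
            deadlines.put(self, System.currentTimeMillis() + maxMillis);
            LOCK.notifyAll(); // planned activities changed; let the monitor recompute
        }
    }

    // End of the timed activities; intended to be called from a finally block.
    static void end() {
        synchronized (LOCK) {
            Thread self = Thread.currentThread();
            deadlines.remove(self);
            timedOut.remove(self);
            anythingTimedOut = !timedOut.isEmpty();
            LOCK.notifyAll();
        }
    }

    // Called from inside resource accesses.
    static void check() {
        if (!anythingTimedOut) {
            return; // common case: a single volatile read
        }
        synchronized (LOCK) {
            if (timedOut.contains(Thread.currentThread())) {
                throw new RuntimeException("time-limited activity timed out");
            }
        }
    }

    // Background loop: flag overdue threads, then sleep until the next deadline or a change.
    private static void run() {
        synchronized (LOCK) {
            while (true) {
                long now = System.currentTimeMillis();
                long next = Long.MAX_VALUE;
                for (Map.Entry<Thread, Long> e : deadlines.entrySet()) {
                    if (e.getValue() <= now) {
                        timedOut.add(e.getKey());
                    } else {
                        next = Math.min(next, e.getValue());
                    }
                }
                anythingTimedOut = !timedOut.isEmpty();
                try {
                    if (next == Long.MAX_VALUE) {
                        LOCK.wait();           // nothing pending: wait for start()/end()
                    } else {
                        LOCK.wait(next - now); // always > 0 here
                    }
                } catch (InterruptedException ie) {
                    return;
                }
            }
        }
    }
}
```

A caller would wrap its reader work as start(limit); try { ... } finally { end(); }, with check() invoked inside each reader access. The sketch still loops over the map to find the next deadline.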
Performance seems reasonable on my Wikipedia index:

//some tests for heavy use of termenum/term docs
Read term docs for 200000 terms in 4755 ms using no timeout limit (warm up)
Read term docs for 200000 terms in 4320 ms using no timeout limit (warm up)
Read term docs for 200000 terms in 4320 ms using no timeout limit
Read term docs for 200000 terms in 4388 ms using reader with time-limited access
//Example query with heavy use of termEnum/termDocs
+text:f* +text:a* +text:b* no time limit matched 1090041 docs in 2000 ms
+text:f* +text:a* +text:b* time limited collector matched 1090041 docs in 1963 ms
+text:f* +text:a* +text:b* time limited reader matched 1090041 docs in 2121 ms
//Example fuzzy match burning CPU reading TermEnum
text:accomodation~0.5 no time limit matched 192084 docs in 6428 ms
text:accomodation~0.5 time limited collector matched 192084 docs in 5923 ms
text:accomodation~0.5 time limited reader matched 192084 docs in 5945 ms
The reader approach to limiting time is slower but has these advantages:
1) Multiple reader activities can be time-limited rather than just single searches
2) No code changes required to scorers/queries/filters etc
3) Tasks that spend plenty of time burning CPU before collection happens can be killed earlier
I'm sure there's some thread safety issues to work through in my code and not all reader classes are wrapped (e.g. TermPositions) but the basics are there and seem to be functioning.
Thoughts?
