[
https://issues.apache.org/jira/browse/JENA-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994626#comment-12994626
]
Simon Helsen commented on JENA-29:
----------------------------------
Andy, I think I already explained some of your questions in the description.
Let me address them here again:
1) timeouts: This is the number one use-case and we (in our own framework)
achieve this by having one thread monitor all incoming query executions. We use
it for other things like advanced scheduling. Personally, I think it is better
to leave this to the clients instead of trying to jam it into
QueryIteratorBase. The key is that a client is able to invoke cancel() from a
secondary thread themselves and get partial results without exceptions.
2) As for the API, we abuse 206 because we feel 200 is wrong since it deceives
the client in believing the results may be complete, yet, they are not - or
rather - once a cancel() has been invoked, it cannot be know if the results
will be complete. Note that my patch is specifically designed so that the query
completes normally, except that it will only show potentially partial results
3) this leads me to the mechanism about requestCancel versus cancel. This is
critical actually! The requestCancel() can be invoked at any moment by a
parallel thread. This is not the case for cancel (!), nor abort. If you
randomly invoke cancel or abort, you *will* get exceptions of all sorts and
kinds, NPEs, concurrent mod exceptions, things like that. The reason is simple:
if you immediately cancel() (and thus make hasNext() return false), you break
the invariant that hasNext()==true always allows next() to work fine:
t1 t2
hasNext==true ....
... cancel()
next() => breaks ....
t1's next() should not succeed anymore when cancel() has been invoked. So, to
alleviate this, cancel() really only requests a cancellation instead of
enforcing it so that all iterators can allign before cancelling the execution.
I have tests which show that this breaks without requestCancel()
4) QueryIterSort and QueryIterGroup: again, quite critical to make partial
results work. They way the patch is constructed, it is possible for a sorting
query to cancel somewhere during execution, yet, still return the *partial*
results sorted. I had to refactor the execution to make sure this keeps
working. I have tests which proof this. Before my changes, a sorting query
would only ever return 1 result when cancelled, now, it returns as many results
as the iterator was able to collect, but before returning the partial results,
it sorts them. Similar for groups
5) abort(): I would request to keep abort. Abort, unlike cancel(), is not
threadsafe, i.e. it will cause exceptions. However, it is more aggressive and
therefore the right thing when you just want to quickly abort an execution
without caring about partial results. We have important use-cases for that. In
those scenarios, we catch and swallow any exceptions around the abort() call as
well as the execute call (in the query thread) itself.
6) buffering on a web server. It is correct that we had to buffer results in
order to be able to change the status code, but it seems that in practice, this
is behaving well for us
Andy, the patch as I provided it makes sure the iterator semantics is in tact
and truely partial (but correct) results can be computed. As you wrote above as
well, it is best effort, meaning that cancel() is not immediate, but tries to
gracefully bring the computation to an end so that a client can still process
the found results. That is why the cancel() as I provided it in the patch, does
not throw any exception. It gracefully finishes computation and this is
critical for us!
I hope you understand what I tried to do and why the patch is doing exactly
what we need. If you decide to deviate, please explain carefully as we may be
forced to patch any adoptions in those cases
thanks,
Simon
> cancellation during query execution
> -----------------------------------
>
> Key: JENA-29
> URL: https://issues.apache.org/jira/browse/JENA-29
> Project: Jena
> Issue Type: Improvement
> Components: ARQ, TDB
> Reporter: Simon Helsen
> Assignee: Andy Seaborne
> Attachments: JENA-29_ARQ_r8489.patch, JENA-29_TDB_r8489.patch,
> JENA-29_tests_ARQ_r8489.patch, jena.patch, jenaAddition.patch
>
>
> The requested improvement and proposed patch is made by Simon Helsen on
> behalf of IBM
> ARQ query execution currently does not have a satisfactory way to cancel a
> running query in a safe way. Moreover, cancel (unlike a hard abort) is
> especially useful if it is able to provide partial result sets (i.e. all the
> results it managed to compute up to when the cancellation was requested).
> Although the exact cancellation behavior depends on the capabilities of the
> underlying triple store, the proposed patch merely relies on the iterators
> used by ARQ.
> Here is a more detailed explanation of the proposed changes:
> 1) the cancel() method in the QueryIterator initiates a cancellation request
> (first boolean flag). In analogy with closeIterator(), it propagates through
> all chained iterators, so the entire calculation is aware that a cancellation
> is requested
> 2) to ensure a thread-safe semantics, the cancelRequest becomes a real cancel
> once nextBinding() has been called. It sets the second boolean which is used
> in hasNext(). This 2-phase approach is critical since the cancel() method can
> be called at any time during a query execution by the external thread. And
> because the behavior of hasNext() is such that it has to return the *same*
> value until next() is called, this is the only way to guarantee semantic
> safety when cancel() is invoked (let me re-phrase this: it is the only way I
> was able to make it actually work)
> 3) cancel() does not close anything since it allows execution to finish
> normally and the client is responsible to call close() just like with a
> regular execution. Note that the client has to call cancel() explicitly
> (typically in another thread) and has to assume that the returning result set
> may be incomplete if this method is called (it is undetermined whether the
> result is _actually_ incomplete)
> 4) in order to deal with order-by and groups, I had to make two more changes.
> First, I had to make QueryIterSort and QueryIterGroup a slightly bit more
> lazy. Currently, the full result set is calculated during plan calculation.
> With my proposed adjustments, this full result set is called on the first
> call to any of its Iterator methods (e.g. hasNext). This change does not
> AFAIK affect the semantics. Second, because the desired behavior of
> cancelling a sort or group query is to make sure everything is sorted/grouped
> even if the total result set is not completed, I added an exception which
> reverses the cancellation request of the encompassing iterator (as an example
> see cancel() in QueryIterSort). This makes sure that the entire subset of
> found and sorted elements is returned, not just the first element. However,
> it also implies in the case of sort that when a query is cancelled, it will
> first sort the partially complete result set before returning to the client.
> the attached patch is based on ARQ 2.8.5 (and a few classes in TDB 0.8.7 ->
> possibly the other triple store implementations need adjustement as well)
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira