[ 
https://issues.apache.org/jira/browse/SOLR-6122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892240#comment-17892240
 ] 

Yuntong Qu commented on SOLR-6122:
----------------------------------

I made a POC PR that uses deleteStatus to remove not-started tasks: the delete 
status endpoint forcefully deletes the tracking entry of a not-started task. 
After the cancel, the task will not be present in the failure/completed map. 
TBH, I don't particularly love this solution, as it limits our ability to add 
cancellation of in-progress tasks later. But I also want to get some opinions 
on this. 
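
For anyone who wants to try the POC, here is a minimal sketch of building the 
existing DELETESTATUS call for a single async request id. The base URL, the 
request id, and the class and method names are placeholders of mine, and 
whether the call also drops a not-started task depends on the POC PR being 
applied; released Solr only clears the stored status.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr or SolrJ.
class DeleteStatusUrl {
    // Builds the Collections API DELETESTATUS URL for one async request id.
    static URI deleteStatusUri(String solrBaseUrl, String requestId) {
        String encoded = URLEncoder.encode(requestId, StandardCharsets.UTF_8);
        return URI.create(solrBaseUrl
                + "/admin/collections?action=DELETESTATUS&requestid=" + encoded);
    }

    public static void main(String[] args) {
        // Assumed local node and made-up request id.
        System.out.println(deleteStatusUri("http://localhost:8983/solr", "my-async-id"));
    }
}
```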

--
One of the main problems I have is dealing with OverseerTaskProcessor keeping 
the in-memory data structures below:
- runningZKTasks (set of tasks that have been picked up for processing but not 
cleaned up from the ZK work queue)

- blockedTasks (contains tasks which were read from the work queue but could 
not be executed because they are blocked or the execution queue is full)

Because of these two data structures, the Overseer does not have a real-time 
view of what's happening on the ZK queue (they are an optimization to reduce 
ZK reads). 

I am working on another approach: add the cancel task to collection-queue-work, 
with a new OverseerMessageHandler dedicated to cancel tasks (instead of using 
OverseerCollectionMessageHandler), and let that cancel message handler modify 
both the ZK queue and the Overseer's in-memory tracking.
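
To make the bookkeeping problem concrete, here is a rough sketch of what a 
cancel of a not-started task has to touch. Everything in it is a simplified 
stand-in I made up for illustration (the class, the maps, the method names); 
it only mirrors the two in-memory structures by name and is not the actual 
Overseer code. The point is that removing the item from the ZK queue alone is 
not enough, because a copy may already sit in blockedTasks.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the Overseer's bookkeeping; not Solr's classes.
class CancelSketch {
    // Simulates the ZK work queue: queue node id -> async request id.
    final Map<String, String> zkWorkQueue = new LinkedHashMap<>();
    // Simplified mirrors of the two in-memory structures, keyed by node id.
    final Set<String> runningZkTasks = new HashSet<>();
    final Map<String, String> blockedTasks = new LinkedHashMap<>();

    void submit(String nodeId, String asyncId) {
        zkWorkQueue.put(nodeId, asyncId);
    }

    // The processor has read the item from ZK but could not run it yet.
    void markBlocked(String nodeId) {
        blockedTasks.put(nodeId, zkWorkQueue.get(nodeId));
    }

    void markRunning(String nodeId) {
        blockedTasks.remove(nodeId);
        runningZkTasks.add(nodeId);
    }

    // Cancels a not-started task; returns false if unknown or already running.
    boolean cancelNotStarted(String asyncId) {
        String nodeId = null;
        for (Map.Entry<String, String> e : zkWorkQueue.entrySet()) {
            if (e.getValue().equals(asyncId)) {
                nodeId = e.getKey();
                break;
            }
        }
        if (nodeId == null || runningZkTasks.contains(nodeId)) {
            return false;
        }
        zkWorkQueue.remove(nodeId);   // drop it from the (simulated) ZK queue
        blockedTasks.remove(nodeId);  // and from the cached copy, or it would still run
        return true;
    }
}
```

In-progress tasks are deliberately rejected in this sketch; cancelling those 
would need cooperation from the running thread.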

--
Re [~gerlowskija] on order of cancel:
- If we send the cancel task to _*collection-queue-work*_, there is still a 
chance that the cancel won't be picked up, since OverseerTaskProcessor limits 
the number of tasks picked up from the queue: if we exceed MAX_BLOCKED_TASKS, 
no new tasks will be picked up, and if the number of running tasks reaches or 
exceeds MAX_PARALLEL_TASKS, no new cancel task can be started. 

- After a cancel task is picked up by OverseerTaskProcessor, from my reading 
of the code, each queue item spins up another Runner thread to handle it, so 
the processing of a queued item will be quite fast. 

- To completely eliminate the concern of a cancel task not being handled ASAP 
when submitted, in my mind, the best approach is to have another queue to take 
in cancel task requests. 

The trade-off here is complexity, but submitting to _*collection-queue-work*_ 
should mostly work. Adding a new queue could be a later improvement if needed.
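
A toy sketch of the separate-queue idea, to show why it sidesteps the limits: 
cancel requests are drained first on every pass and do not count against the 
parallelism cap. The class, the queues, and the value of MAX_PARALLEL_TASKS 
here are all made up for illustration and are not the real 
OverseerTaskProcessor logic.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical dispatcher; only illustrates the ordering of the two queues.
class TwoQueueSketch {
    static final int MAX_PARALLEL_TASKS = 2; // illustrative value only

    final Deque<String> workQueue = new ArrayDeque<>();   // regular tasks
    final Deque<String> cancelQueue = new ArrayDeque<>(); // ids to cancel
    int running = 0;

    // One polling pass; returns what was done, in order.
    List<String> poll() {
        List<String> actions = new ArrayList<>();
        // Cancel requests go first and are not subject to the task limit.
        while (!cancelQueue.isEmpty()) {
            String id = cancelQueue.poll();
            if (workQueue.remove(id)) {
                actions.add("cancelled:" + id);
            } else {
                actions.add("not-queued:" + id);
            }
        }
        // Regular tasks are started only while below the limit.
        while (running < MAX_PARALLEL_TASKS && !workQueue.isEmpty()) {
            actions.add("run:" + workQueue.poll());
            running++;
        }
        return actions;
    }
}
```

With a single shared queue, the cancel request would instead wait behind the 
same limit as the task it is trying to cancel.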



 

> API to cancel an already submitted/running Collections API call
> ---------------------------------------------------------------
>
>                 Key: SOLR-6122
>                 URL: https://issues.apache.org/jira/browse/SOLR-6122
>             Project: Solr
>          Issue Type: Wish
>          Components: SolrCloud
>            Reporter: Anshum Gupta
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Right now we can trigger a long running task with no way to cancel it 
> cleanly. 
> We should have an API that interrupts the already running/submitted 
> collections API call.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
