alamb commented on issue #12641:
URL: https://github.com/apache/datafusion/issues/12641#issuecomment-2379097683

   I agree that inconsistent behavior is not good. My understanding is that 
when a stream is cancelled / aborted it should stop execution immediately: 
https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html#cancellation--aborting-execution
If you know of instances where that is not the case, please let us know and 
I can file tickets to fix them.
   
   However, that documentation does not describe the policy of "fast shutdown 
on error" -- I will try and clarify it with a new PR.
   
   I think many systems do want a quick shutdown if an error has occurred and 
do *NOT* want to continue polling other streams. The reason is that if the 
overall query will error anyway, any additional polling is wasted work that 
would be thrown away.
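   As a rough illustration of the "fast shutdown on error" policy, here is a 
minimal sketch that uses only the Rust standard library. Plain iterators of 
`Result` stand in for partition streams, and `Batch`, `BatchResult` and 
`collect_fail_fast` are hypothetical names for this example, not DataFusion 
APIs.

```rust
// Hypothetical stand-ins for a RecordBatch and a stream item (not DataFusion types).
type Batch = Vec<i32>;
type BatchResult = Result<Batch, String>;

/// Polls the inputs round-robin and returns as soon as any input yields an
/// error. Also returns the number of polls made, to show that no further
/// work is done after the first error.
fn collect_fail_fast<I>(mut inputs: Vec<I>) -> (Result<Vec<Batch>, String>, usize)
where
    I: Iterator<Item = BatchResult>,
{
    let mut out = Vec::new();
    let mut polls = 0;
    let mut done = vec![false; inputs.len()];

    while done.iter().any(|d| !*d) {
        for (i, input) in inputs.iter_mut().enumerate() {
            if done[i] {
                continue;
            }
            polls += 1;
            match input.next() {
                Some(Ok(batch)) => out.push(batch),
                // First error: stop immediately and discard the partial output.
                Some(Err(e)) => return (Err(e), polls),
                None => done[i] = true,
            }
        }
    }
    (Ok(out), polls)
}
```

   With one healthy input and one input whose first item is an error, the 
function returns the error after two polls and never touches the remaining 
batches of the healthy input.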
   
   > For example, we may want to get the partial query result when data 
corruption happens in the TableScan. In this case, the error generated by the 
TableScan will be passed through all of the streams. As long as this TableScan 
could recover it self to produce the next RecordBatch, we could get the error 
and the partial query result
   
   The use case of ignoring certain bad files and continuing to read other 
files makes sense to me for certain systems (though other systems would likely 
wish to abort immediately).
   
   To achieve this use case, instead of changing the default "shutdown as 
quickly as possible on error" behavior, here are some other ideas:
   1. Change the table scan operator itself (if you are using a custom one) so 
that it does not emit an error to the rest of the plan
   2. Add some new ExecutionPlan operator that will "ignore" errors (as in not 
pass whatever error it gets to the output) 
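
   Idea 2 could look roughly like the following sketch. It is not a real 
`ExecutionPlan` implementation: a plain iterator of `Result` stands in for the 
input stream, and `IgnoreErrors` is a hypothetical name. It also assumes the 
input can keep producing items after it has yielded an error.

```rust
/// Hypothetical adapter that swallows errors from its input instead of
/// passing them to the output, and counts how many were dropped.
struct IgnoreErrors<I> {
    input: I,
    skipped: usize,
}

impl<I> IgnoreErrors<I> {
    fn new(input: I) -> Self {
        Self { input, skipped: 0 }
    }
}

impl<I, T, E> Iterator for IgnoreErrors<I>
where
    I: Iterator<Item = Result<T, E>>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        loop {
            // `?` ends the iteration when the input is exhausted.
            match self.input.next()? {
                Ok(item) => return Some(item),
                // Drop the error and keep pulling from the input.
                Err(_) => self.skipped += 1,
            }
        }
    }
}
```

   Keeping a count of skipped errors (or logging them) lets the caller tell a 
complete result from a partial one, which silently dropping errors would hide.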


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

