[ https://issues.apache.org/jira/browse/ARROW-17173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607714#comment-17607714 ]

David Li commented on ARROW-17173:
----------------------------------

I don't think you can avoid plumbing the {{StopToken}} around somehow: either 
explicitly, as we do now, or implicitly by maintaining some sort of thread- or 
task-local state (which we have to be careful to propagate, save, and restore). 
I like this series of posts, which discusses the same issue in Python: 
https://vorpus.org/blog/timeouts-and-cancellation-for-humans/ but there's no 
easy answer there either (Trio does a lot of work to maintain the task-local 
state that makes cancellation work).
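To make the trade-off concrete, here is a minimal sketch of the implicit approach. It assumes toy {{StopSource}}/{{StopToken}} classes (hypothetical stand-ins loosely modelled on Arrow's, not Arrow's actual implementation), a hypothetical {{thread_local}} "current token", and a hypothetical deferred task queue. A RAII guard saves and restores the previous token; the queue shows why the token must be captured explicitly when work is handed off, because by the time a task runs the installing scope may already have exited.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins loosely modelled on arrow::StopSource / arrow::StopToken
// (hypothetical; not Arrow's actual implementation).
class StopToken {
 public:
  StopToken() = default;  // a default token can never be stopped
  explicit StopToken(std::shared_ptr<std::atomic<bool>> flag)
      : flag_(std::move(flag)) {}
  bool IsStopRequested() const { return flag_ && flag_->load(); }

 private:
  std::shared_ptr<std::atomic<bool>> flag_;
};

class StopSource {
 public:
  StopSource() : flag_(std::make_shared<std::atomic<bool>>(false)) {}
  void RequestStop() { flag_->store(true); }
  StopToken token() const { return StopToken(flag_); }

 private:
  std::shared_ptr<std::atomic<bool>> flag_;
};

// Implicit plumbing: an ambient, thread-local "current" token (hypothetical).
thread_local StopToken current_token;

// Installs a token for the current scope and restores the previous one on exit.
class ScopedStopToken {
 public:
  explicit ScopedStopToken(StopToken token) : saved_(current_token) {
    current_token = std::move(token);
  }
  ~ScopedStopToken() { current_token = saved_; }
  ScopedStopToken(const ScopedStopToken&) = delete;
  ScopedStopToken& operator=(const ScopedStopToken&) = delete;

 private:
  StopToken saved_;
};

// An operation that consults the ambient token instead of taking a parameter.
// Returns true if it ran to completion, false if it was cancelled.
bool ReadSomething() { return !current_token.IsStopRequested(); }

// Toy deferred task queue: tasks run later, outside the submitting scope.
std::vector<std::function<bool()>> pending_tasks;

void SubmitRead(bool propagate_token) {
  if (propagate_token) {
    // Capture the ambient token now and re-install it when the task runs.
    StopToken captured = current_token;
    pending_tasks.push_back([captured] {
      ScopedStopToken guard(captured);
      return ReadSomething();
    });
  } else {
    // Sees whatever token happens to be ambient when the task finally runs.
    pending_tasks.push_back([] { return ReadSomething(); });
  }
}
```

Forgetting the capture step (the {{propagate_token == false}} branch) silently loses cancellation: the task runs with the default, never-stopped token. That is the propagate/save/restore bookkeeping the implicit approach has to get right everywhere work crosses a task boundary.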

I also agree that, at least for the immediate issue, filesystems shouldn't get 
their {{StopToken}} from the {{IOContext}}; rather, individual operations should 
take their own {{StopToken}}.
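A minimal sketch of that difference, assuming toy {{StopSource}}/{{StopToken}} stand-ins (hypothetical, not Arrow's classes) and two hypothetical filesystem classes: {{ContextBoundFileSystem}} binds one token when it is constructed, roughly as a filesystem picks up the token of the {{IOContext}} it was created with, while {{PerOperationFileSystem}} takes a token on each call. Once the bound token has been stopped, every later read on that filesystem is cancelled as well.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy stand-ins (hypothetical; not Arrow's actual implementation).
class StopToken {
 public:
  StopToken() = default;  // a default token can never be stopped
  explicit StopToken(std::shared_ptr<std::atomic<bool>> flag)
      : flag_(std::move(flag)) {}
  bool IsStopRequested() const { return flag_ && flag_->load(); }

 private:
  std::shared_ptr<std::atomic<bool>> flag_;
};

class StopSource {
 public:
  StopSource() : flag_(std::make_shared<std::atomic<bool>>(false)) {}
  void RequestStop() { flag_->store(true); }
  StopToken token() const { return StopToken(flag_); }

 private:
  std::shared_ptr<std::atomic<bool>> flag_;
};

// Simulates reading several chunks, polling the token between chunks.
// Returns true if the read completed, false if it was cancelled.
bool ReadChunks(const StopToken& stop) {
  constexpr int kNumChunks = 4;
  for (int chunk = 0; chunk < kNumChunks; ++chunk) {
    if (stop.IsStopRequested()) return false;
    // ... a real implementation would read one chunk here ...
  }
  return true;
}

// Hypothetical: the token is fixed for the lifetime of the filesystem.
class ContextBoundFileSystem {
 public:
  explicit ContextBoundFileSystem(StopToken stop) : stop_(std::move(stop)) {}
  bool ReadFile(const std::string& /*path*/) const { return ReadChunks(stop_); }

 private:
  StopToken stop_;
};

// Hypothetical: each operation carries its own token.
class PerOperationFileSystem {
 public:
  bool ReadFile(const std::string& /*path*/, const StopToken& stop) const {
    return ReadChunks(stop);
  }
};
```

With the per-operation variant, one filesystem instance can serve any number of cancellable operations, each with its own {{StopSource}}, so the filesystem's lifetime no longer has to match any particular source.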

> [C++] Clarify lifecycle of a StopSource/StopToken
> -------------------------------------------------
>
>                 Key: ARROW-17173
>                 URL: https://issues.apache.org/jira/browse/ARROW-17173
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Dewey Dunnington
>            Priority: Major
>
> In ARROW-11841 we ran into an issue where modelling cancellation as a single 
> cancellable operation (i.e., {{SetSignalStopSource()}}/{{ResetSignalStopSource()}}) 
> was a poor fit: the {{StopToken}} must be assigned to an {{IOContext}} when a 
> filesystem is created; however, the filesystem may be reused for more than one 
> cancellable operation (e.g., reading a CSV). Following the instructions in the 
> current API (in util/cancel.h) results in a situation where the lifecycle of 
> the filesystem must match the lifecycle of the {{StopSource}}, which can be 
> difficult to program around.
> A related problem arises when the Python and R Arrow libraries, which link to 
> the same .so, are loaded into the same process. After ARROW-11841, R will be 
> able to register signal handlers to interrupt Arrow operations, and users who 
> load pyarrow via reticulate must be careful to disable this or they will get an 
> error along the lines of "StopSource already set up".
> From a purely R-centric point of view, we could provide our own {{StopToken}} 
> implementation if we were allowed to, since R already implements the proper 
> signal handler and the arrow R package implements the proper event loop to 
> make this thread-safe. Currently the {{StopToken}} is passed by value, so a 
> subclass is not an option. For R, at least, this would eliminate any need to 
> consider the lifecycle of another object.
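A minimal sketch of why passing by value rules out subclassing, plus one hypothetical way around it. {{ValueToken}}, {{InterruptAwareToken}}, {{PluggableToken}}, {{StopTokenImpl}} and {{FlagBackedImpl}} are illustrative names, not Arrow classes. The first pair shows object slicing: a subclass copied into a by-value parameter loses its override. The second keeps a cheap, copyable token but forwards to a shared, overridable implementation, so a binding with its own signal handling could, in principle, supply one.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <utility>

// A by-value token type (hypothetical).
class ValueToken {
 public:
  ValueToken() = default;
  ValueToken(const ValueToken&) = default;
  ValueToken& operator=(const ValueToken&) = default;
  virtual ~ValueToken() = default;
  virtual bool IsStopRequested() const { return false; }
};

// A subclass that would report an interrupt (hypothetical).
class InterruptAwareToken : public ValueToken {
 public:
  bool IsStopRequested() const override { return true; }
};

// Takes the token by value: only the ValueToken base part is copied (sliced).
bool PollByValue(ValueToken token) { return token.IsStopRequested(); }

// Hypothetical alternative: an overridable implementation behind a handle.
class StopTokenImpl {
 public:
  virtual ~StopTokenImpl() = default;
  virtual bool IsStopRequested() const = 0;
};

class PluggableToken {
 public:
  PluggableToken() = default;  // no implementation: never stopped
  explicit PluggableToken(std::shared_ptr<const StopTokenImpl> impl)
      : impl_(std::move(impl)) {}
  bool IsStopRequested() const { return impl_ && impl_->IsStopRequested(); }

 private:
  std::shared_ptr<const StopTokenImpl> impl_;
};

// Example implementation that checks a flag set elsewhere, e.g. by a
// binding's own event loop (hypothetical).
class FlagBackedImpl : public StopTokenImpl {
 public:
  explicit FlagBackedImpl(const std::atomic<bool>* flag) : flag_(flag) {}
  bool IsStopRequested() const override { return flag_->load(); }

 private:
  const std::atomic<bool>* flag_;
};

// Copying a PluggableToken copies the handle, so the override survives.
bool PollPluggable(PluggableToken token) { return token.IsStopRequested(); }
```

The handle-based design keeps by-value passing cheap (one shared pointer copy) while letting behaviour vary at run time; whether that fits Arrow's performance and ABI constraints is a separate question.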



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
