[ https://issues.apache.org/jira/browse/ARROW-11841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567349#comment-17567349 ]
Dewey Dunnington commented on ARROW-11841:
------------------------------------------
It's a little hard to test because it involves seeing how fast you can press
Control-C, but I'm pretty sure that sending an interrupt signal to a CSV
read or to an exec plan doesn't do anything:
{code:R}
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
tf <- tempfile()
readr::write_csv(vctrs::vec_rep(mtcars, 5e5), tf)
# try to slow down CSV reading
set_cpu_count(1)
set_io_thread_count(2)
# compare timing of cancelled vs not cancelled (hard to tell the difference)
system.time(read_csv_arrow(tf))
#> user system elapsed
#> 2.852 0.637 5.365
system.time(open_dataset(tf, format = "csv") |> dplyr::collect())
#> user system elapsed
#> 2.920 0.219 3.049
# compare responsiveness of cancelling the read using other APIs
# (usually quite a difference)
system.time(readr::read_csv(tf))
#> Rows: 16000000 Columns: 11
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (11): mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> user system elapsed
#> 19.424 1.267 3.496
system.time(read.csv(tf))
#> user system elapsed
#> 20.858 0.718 21.864
{code}
To implement this in the R bindings, it seems like we would need some sort of
"run this bit of code in XX seconds" mechanism (if there's an easier way, that
would be great!). It doesn't matter which thread that code runs on, because
{{SafeCallIntoR}} handles that. I *think* I know how to do it: start a thread,
have it sleep for some number of seconds, then call {{SafeCallIntoR}}. Could the
setup/cleanup live in {{RunWithCapturedR}}?
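A minimal C++ sketch of the "start a thread, sleep, then call into R" idea follows. {{PeriodicCallback}} is a hypothetical helper, not an existing Arrow class, and a plain {{std::function}} stands in for the {{SafeCallIntoR}} call so the sketch compiles without R or Arrow. It repeats the call every interval rather than firing once, on the assumption that interrupts need to be checked more than once during a long read; its {{Stop()}} method and destructor illustrate the kind of setup/cleanup that might be scoped to {{RunWithCapturedR}}.

{code:cpp}
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper (not an Arrow API): invokes `callback` on a background
// thread every `interval` until Stop() is called or the object is destroyed.
class PeriodicCallback {
 public:
  PeriodicCallback(std::chrono::milliseconds interval,
                   std::function<void()> callback)
      : interval_(interval),
        callback_(std::move(callback)),
        thread_([this] { Run(); }) {
    // A zero interval would make the timer thread spin without sleeping.
    assert(interval_.count() > 0);
  }

  ~PeriodicCallback() { Stop(); }

  // Wakes the timer thread immediately and waits for it to exit.
  // Safe to call more than once.
  void Stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopped_ = true;
    }
    cv_.notify_all();
    if (thread_.joinable()) thread_.join();
  }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (!stopped_) {
      // Sleep for one interval; returns true early if Stop() was requested.
      if (cv_.wait_for(lock, interval_, [this] { return stopped_; })) break;
      lock.unlock();
      // Assumption: in the R bindings this is where something like
      // SafeCallIntoR(...) would be invoked to check for a user interrupt.
      callback_();
      lock.lock();
    }
  }

  std::chrono::milliseconds interval_;
  std::function<void()> callback_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool stopped_ = false;
  std::thread thread_;  // declared last so every other member exists first
};
{code}

Waiting on a condition variable instead of calling a bare {{sleep_for}} means {{Stop()}} wakes the timer thread right away, so cleanup never has to wait out a full interval.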
> [R][C++] Allow cancelling long-running commands
> -----------------------------------------------
>
> Key: ARROW-11841
> URL: https://issues.apache.org/jira/browse/ARROW-11841
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, R
> Reporter: Antoine Pitrou
> Priority: Major
> Fix For: 10.0.0
>
>
> When calling a long-running task (for example reading a CSV file) from the R
> prompt, users may want to interrupt with Ctrl-C.
> Allowing this will require integrating R's user interruption facility with
> the cancellation API that's going to be exposed in C++ (see ARROW-8732).
> Below is some information I've gathered on the topic:
> There is some hairy discussion of how to interrupt C++ code from R at
> https://stackoverflow.com/questions/40563522/r-how-to-write-interruptible-c-function-and-recover-partial-results
> and https://stat.ethz.ch/pipermail/r-devel/2011-April/060714.html .
> It seems it may involve polling cpp11::check_user_interrupt() and catching
> any cpp11::unwind_exception that may signal an interruption. A complication
> is that apparently R APIs should only be called from the main thread. There's
> also a small library which claims to make writing all this easier:
> https://github.com/tnagler/RcppThread/blob/master/inst/include/RcppThread/RMonitor.hpp
> But since user interruptions will only be noticed by the R main thread, the
> solution may be to launch heavy computations (e.g. CSV reading) in a separate
> thread and have the main R thread periodically poll for interrupts while
> waiting for the separate thread. This is what this dedicated thread class
> does in its join method:
> https://github.com/tnagler/RcppThread/blob/master/inst/include/RcppThread/Thread.hpp#L79
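The design described above (heavy work on a separate thread, with the main thread polling for interrupts while it waits) can be sketched in plain C++. This is an illustration under stated assumptions, not the RcppThread or Arrow implementation: {{RunWithInterruptPolling}} and {{SlowCountingTask}} are hypothetical names, the {{interrupt_pending}} callback stands in for whatever would wrap {{cpp11::check_user_interrupt()}} on the R main thread, and a {{std::atomic<bool>}} flag stands in for the C++ cancellation API from ARROW-8732.

{code:cpp}
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <stdexcept>
#include <thread>

// A cancellable task receives a stop flag and is expected to check it
// cooperatively (e.g. between CSV blocks).
using CancellableTask = std::function<int(const std::atomic<bool>&)>;

// Runs `task` on a worker thread. The calling thread (the R main thread in the
// bindings) wakes every `poll_interval` to ask whether the user interrupted.
int RunWithInterruptPolling(CancellableTask task,
                            std::function<bool()> interrupt_pending,
                            std::chrono::milliseconds poll_interval) {
  assert(poll_interval.count() > 0);
  std::atomic<bool> stop_requested{false};
  std::future<int> result = std::async(
      std::launch::async, [&] { return task(stop_requested); });

  while (result.wait_for(poll_interval) != std::future_status::ready) {
    if (interrupt_pending()) {
      stop_requested = true;  // ask the worker to stop early
      result.wait();          // never leave the worker running unattended
      throw std::runtime_error("interrupted by user");
    }
  }
  return result.get();  // rethrows if the task itself threw
}

// Hypothetical example task: simulates ~10 s of work in 1 ms steps, checking
// the stop flag between steps. Returns the number of steps completed.
int SlowCountingTask(const std::atomic<bool>& stop_requested) {
  int steps = 0;
  while (steps < 10000 && !stop_requested.load()) {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    ++steps;
  }
  return steps;
}
{code}

If the interrupt check signals by throwing instead of returning a flag (the description above mentions catching {{cpp11::unwind_exception}}), a real version would need to catch that exception, set the stop flag, and wait for the worker before rethrowing, so the worker never outlives the R call.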
--
This message was sent by Atlassian Jira
(v8.20.10#820010)