paleolimbot commented on PR #13773:
URL: https://github.com/apache/arrow/pull/13773#issuecomment-1203372530
> If this workaround fixes things for a little while then so be it.
I agree that this is a temporary workaround. I don't think anything all
that insidious is going on that affects normal usage, but I also don't want us
to get an angry CRAN note about valgrind that attracts attention to anything
else we're doing.
> I'd really like to know exactly what is going on.
Slight progress there: I tried waiting for the thread pools to finish before
unloading the package (#13779), and that seems to remove the errors as well,
although it's a bit of a hack in its own way (the function that gets the IO
thread pool is not exported in the public headers). That seems consistent with
the thread pool shutdown leaking *something*?
> Could you share some kind of minimal(ish?) reproducer?
From the arrow/r directory, `echo 'devtools::test()' | R --no-save -d
"valgrind --tool=memcheck --leak-check=full"`. Not very minimal, but an
improvement over the 5 hours the crossbow job takes. You can run fewer test
files using `devtools::test(filter = "some_regex")`, but I was never able to get
an error using a few obvious filters ("dplyr", for example). I wonder if the CI
just runs threads really slowly, which would explain why leaks show up more
frequently there.
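
In case it saves some typing, here is a rough sketch of the reproducer above wrapped in a small shell helper with an optional test filter. The names `r_expr`, `valgrind_test`, and `DRY_RUN` are made up for illustration and are not part of arrow or devtools; the valgrind flags are exactly the ones quoted above. It assumes the filter contains no double quotes.

```shell
#!/bin/sh
# Hypothetical helper (not part of arrow): source this from the arrow/r directory.

# Build the R expression: devtools::test() or devtools::test(filter = "<regex>").
r_expr() {
  if [ -n "$1" ]; then
    printf 'devtools::test(filter = "%s")' "$1"
  else
    printf 'devtools::test()'
  fi
}

# Run the tests under valgrind. With DRY_RUN=1 (an assumed convention),
# print the command instead of running it.
valgrind_test() {
  r_code=$(r_expr "$1")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "echo '$r_code' | R --no-save -d \"valgrind --tool=memcheck --leak-check=full\""
  else
    echo "$r_code" | R --no-save -d "valgrind --tool=memcheck --leak-check=full"
  fi
}
```

After sourcing it, `valgrind_test` runs the full suite and `valgrind_test dplyr` runs only the matching test files; `DRY_RUN=1` is there to check the command before committing to a long run.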
> I'm not entirely sure how this would be related to UDFs.
I don't think the error itself is related to UDFs, but it was exposed by a
change that we needed to make UDFs work (we need to evaluate the entire exec
plan within one `RunWithCapturedR()`, which meant calling `reader->ToTable()`
instead of sending the `RecordBatchReader` to R and converting it to a table there).
Before this PR, all exec plan results were routed through a C++-level
`reader->ToTable()` unless an R-level RecordBatchReader was explicitly
requested; after this PR, all exec plans go through an R-level
RecordBatchReader unless somebody actually registers a user-defined function.