[GitHub] [arrow] paleolimbot commented on pull request #13773: ARROW-17252: [R] Intermittent valgrind failure

GitBox Tue, 02 Aug 2022 18:13:06 -0700


paleolimbot commented on PR #13773:
URL: https://github.com/apache/arrow/pull/13773#issuecomment-1203372530


   > If this workaround fixes things for a little while then so be it.
   
   I agree that this is a temporary workaround... I don't think anything all 
that insidious is going on that affects normal usage, but I also don't want us 
to get an angry CRAN note about valgrind that draws attention to anything 
else we're doing.
   
   > I'd really like to know exactly what is going on.
   
   Slight progress there: I tried waiting for the thread pools to finish before 
unloading the package (#13779), and that seems to remove the errors as well, 
although it's a bit of a hack in its own way (the function that returns the IO 
thread pool is not exported in the public headers). That seems consistent with 
the shutdown of the thread pools leaking *something*?
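To make the "wait for the thread pools to finish before unloading" idea concrete, here is a generic C++ sketch of a pool that drains its queue and joins every worker on shutdown. It is not Arrow's thread pool API: `MiniPool` and `RunAndDrain` are hypothetical names, and the only assumption is that a worker thread still alive when the package's shared library is unloaded is the kind of thing that could surface in valgrind output.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical minimal pool for illustration only (not arrow::internal::ThreadPool).
class MiniPool {
 public:
  explicit MiniPool(int n_threads) {
    for (int i = 0; i < n_threads; ++i) workers_.emplace_back([this] { Loop(); });
  }
  // Joining in the destructor guarantees no worker outlives the pool.
  ~MiniPool() { Shutdown(); }

  void Submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

  // Stop accepting work, let workers drain the queue, then join them all.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (stopping_) return;  // already shut down
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) {
      if (t.joinable()) t.join();
    }
  }

 private:
  void Loop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and nothing left to run
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  std::vector<std::thread> workers_;
  bool stopping_ = false;
};

// Submits n trivial tasks, then blocks in Shutdown() until all of them have run,
// analogous to waiting on the pools before the package is unloaded.
int RunAndDrain(int n) {
  assert(n >= 0);
  std::atomic<int> done{0};
  MiniPool pool(4);  // declared after `done`, so it is destroyed first
  for (int i = 0; i < n; ++i) pool.Submit([&done] { done.fetch_add(1); });
  pool.Shutdown();
  return done.load();
}
```

The design choice here is that shutdown is an explicit, blocking step: every queued task completes and every thread is joined before control returns, so nothing is left running when the caller tears down.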
   
   > Could you share some kind of minimal(ish?) reproducer?
   
   From the arrow/r directory: `echo 'devtools::test()' | R --no-save -d 
"valgrind --tool=memcheck --leak-check=full"`. Not very minimal, but an 
improvement over the 5 hours the crossbow job takes. You can run fewer test 
files using `devtools::test(filter = "some_regex")`, but I was never able to get 
an error using a few obvious filters ("dplyr", for example). I wonder if the CI 
just runs threads really, really slowly, which would explain why leaks show up 
more frequently there.
   
   > I'm not entirely sure how this would be related to UDFs.
   
   I don't think the error itself is related to UDFs, but it was exposed by a 
change that we needed to make UDFs work: we need to evaluate the entire exec 
plan within one `RunWithCapturedR()`, which meant calling `reader->ToTable()` 
instead of sending the `RecordBatchReader` to R and then converting it to a 
table there. Before this PR, all exec plan results were routed through a 
C++-level `reader->ToTable()` unless an R-level RecordBatchReader was 
explicitly requested; after this PR, all exec plans go through an R-level 
RecordBatchReader unless somebody actually registers a user-defined function.

