Re: [I] [R][C++] Repartitioning on a new variable uses all my RAM and crashes [arrow]

via GitHub Mon, 11 Mar 2024 12:20:46 -0700


thisisnic commented on issue #40224:
URL: https://github.com/apache/arrow/issues/40224#issuecomment-1989251336


   > How were you measuring RAM? Were you looking at the RSS of the process? Or were you looking at the amount of free/available memory?
   
   I was just looking at free/available memory; would the RSS of the process be a better measure?
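For comparison, here is a minimal sketch of the two measurements being discussed, written in Python with only the standard library. It assumes a Unix system (the `resource` module is unavailable on Windows) and, for the system-wide figure, Linux's `/proc/meminfo`. The function names are hypothetical. System-wide free/available memory also moves with other processes and with the OS file cache, which a large dataset write tends to fill, so the process's own RSS is usually the more direct signal of how much memory the process itself holds.

```python
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size (RSS) of *this* process so far, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


def system_available_bytes():
    """System-wide MemAvailable from /proc/meminfo (Linux only), else None."""
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    return int(line.split()[1]) * 1024  # value is given in kB
    except FileNotFoundError:
        pass
    return None


if __name__ == "__main__":
    before = peak_rss_bytes()
    data = b"x" * (64 * 1024 * 1024)  # write ~64 MiB so the pages become resident
    print(f"peak RSS: {before / 2**20:.0f} MiB -> {peak_rss_bytes() / 2**20:.0f} MiB")
    print("MemAvailable (bytes):", system_available_bytes())
```

This only measures the Python process it runs in; for an R session the equivalent is the R process's RSS, for example as reported by `ps` or in `/proc/<pid>/status` on Linux.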
   
   > If it is R-specific, then maybe R is accumulating everything before the call to write_dataset? I seem to remember that being an R fallback at some point when creating plans.
   
   Thanks, I'll take a closer look into the code to see if I can find something.
   
   > In Python, the write_dataset call can take a record batch reader as input. I think you actually end up with two Acero plans. The first is the one you shared, and the second is just source -> write (where the first plan's output is the "source" node in the second plan).
   
   > However, in R, it might be more natural to make a source -> project -> write plan instead of a source -> project -> sink plan in this situation.
   
   Sorry, I'm a bit lost; what are the implications of ending the plan with a write node versus a sink node here?
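To make the accumulation concern raised above concrete, here is a minimal sketch in plain Python. It does not use the Arrow or Acero APIs: `CountingSource`, `sink_then_write`, and `stream_to_write` are hypothetical stand-ins, and "memory" is modelled as the number of batches produced but not yet written. The assumption is that draining a plan into a sink before writing holds every batch at once, while feeding batches straight into a writer (as with a source -> write plan, or a record batch reader consumed by a second plan) holds only the batches currently in flight. Real Acero plans keep several batches in flight, so this only shows the shape of the difference, not actual numbers.

```python
class CountingSource:
    """Hypothetical plan output: yields batches lazily and tracks how many are alive."""

    def __init__(self, n_batches, rows_per_batch=1000):
        self.n_batches = n_batches
        self.rows_per_batch = rows_per_batch
        self.outstanding = 0       # produced but not yet written
        self.peak_outstanding = 0  # stand-in for peak memory use

    def __iter__(self):
        for i in range(self.n_batches):
            self.outstanding += 1
            self.peak_outstanding = max(self.peak_outstanding, self.outstanding)
            yield [i] * self.rows_per_batch  # one "record batch"

    def release(self):
        self.outstanding -= 1


def write_batch(batch, out):
    out.append(len(batch))  # hypothetical stand-in for writing a file fragment


def sink_then_write(src):
    """Plan ends in a sink: drain everything into memory, then write."""
    collected = list(src)
    out = []
    for batch in collected:
        write_batch(batch, out)
        src.release()
    return out


def stream_to_write(src):
    """Plan ends in a write: each batch is written as soon as it is produced."""
    out = []
    for batch in src:
        write_batch(batch, out)
        src.release()
    return out


if __name__ == "__main__":
    a, b = CountingSource(100), CountingSource(100)
    sink_then_write(a)
    stream_to_write(b)
    print(a.peak_outstanding, b.peak_outstanding)  # 100 1
```

Both functions write the same data; only the peak number of batches held at once differs, which is why the shape of the plan can matter for memory even when the output is identical.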


