Re: [PR] [GLUTEN-4241][VL] Add plan node to convert Vanilla spark columnar format data to Velox columnar format data [incubator-gluten]

via GitHub Thu, 07 Mar 2024 15:33:49 -0800


zhztheplayer commented on PR #4818:
URL: https://github.com/apache/incubator-gluten/pull/4818#issuecomment-1984779010


   > @zhztheplayer can you check how the memory is allocated during the 
conversion? Where is the Arrow memory allocated? How many memcpy calls happen 
during the conversion? Is there an onheap => offheap copy?
   
   @boneanxs Feel free to address these questions as well, thanks.
   
   I believe the patch reuses our old `ArrowWritableColumnarVector` code to 
write Spark columnar data to native memory, so there should be a number of 
"onheap => offheap" copies. Ideally, we should count exactly how many copies 
the implementation makes. @boneanxs You can also check this part.
   
   What worries me is that `ArrowWritableColumnarVector` has not been under 
active maintenance for some time, so we should add more tests here, especially 
for complex data types.
   
   It would also be great if you could share your thoughts on the risk of 
memory leaks this approach may bring. Overall the PR's code looks fine to me. 
We have removed most of the unsafe APIs, but some may remain, so let's check 
this part carefully too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


