kevinwilfong opened a new pull request, #10662:
URL: https://github.com/apache/incubator-gluten/pull/10662

   ## What changes are proposed in this pull request?
   
   Today the GlutenPartition objects contain an array of byte arrays which are 
the Protobuf serialized ReadRel.read_type objects from the SplitInfos. The 
GlutenPartitions are Java serialized and sent to the Executors responsible for 
their respective Tasks. Looking through the code it appears we Protobuf 
serialize the SplitInfos so we can easily pass them across the JNI boundary.
   
   We see the serialized SplitInfos can consume a significant amount of memory 
in the Driver, this is because as SplitInfo objects their state can share 
references tot he same objects, but once serialized they share nothing, which 
explodes their size in memory.
   
   If we Java serialize the SplitInfo objects like the rest of the 
GlutenPartition state, and Protobuf serialize them as part of the Task, this 
can significantly save driver memory. The cost is a little additional memory in 
the Task, the size of the SplitInfo objects for a single GlutenPartition which 
should be trivial, and a little additional CPU instead of Protobuf serializing 
in the Driver and Java serializing the array of byte arrays, we Java serialize 
the array of SplitInfos, and on the Task we pay the additional cost of Java 
deserializing an array of SplitInfos and Protobuf serializing them, overall the 
difference is just the additional cost of Java serializing the SplitInfos 
instead of byte arrays.
   
   ## How was this patch tested?
   
   Ran the existing unit tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to