wForget opened a new pull request, #10733:
URL: https://github.com/apache/incubator-gluten/pull/10733

   Backport #10541 to branch-1.5
   
   ## What changes are proposed in this pull request?
   
   This pull request introduces a safer and more robust way to handle 
Spark's BroadcastMode during serialization. The main change is a new 
SafeBroadcastMode abstraction and related utilities, which avoid the 
serialization issues that caused a stack overflow exception during broadcast 
exchanges. The issue we observed was caused by the PR that introduced 
BroadcastMode. HashedRelationBroadcastMode embeds Catalyst expression trees, 
which are not safe to Kryo-serialize when running with 
spark.kryo.referenceTracking=false (the default internally).
   
   With this change, the broadcast payload contains only primitives and 
byte arrays (no Catalyst trees). For bound keys, we serialize only the column 
ordinals (plus a null-aware flag). For computed keys (e.g., upper(col)), we 
serialize the key expressions once as Java bytes and deserialize them only 
where needed to build projections.
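   
   The payload shape described above can be sketched roughly as follows. This 
is an illustration only, not the code in this PR: the class 
SafeBroadcastPayloadSketch, the KeyExpr stand-in for a Catalyst expression, 
and all field and method names are hypothetical. It also assumes that "Java 
bytes" means standard Java serialization, and that the null-aware flag is 
carried for both kinds of keys.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch; none of these names come from Gluten or Spark.
final class SafeBroadcastPayloadSketch {

    /** Hypothetical stand-in for a Catalyst key expression such as upper(col). */
    static final class KeyExpr implements Serializable {
        private static final long serialVersionUID = 1L;
        final String description;

        KeyExpr(String description) {
            this.description = description;
        }
    }

    /** Payload that holds only primitives and byte arrays, no expression trees. */
    static final class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        final int[] keyOrdinals;   // non-null for bound keys
        final byte[] keyExprBytes; // non-null for computed keys
        final boolean nullAware;   // assumption: carried for both kinds of keys

        Payload(int[] keyOrdinals, byte[] keyExprBytes, boolean nullAware) {
            this.keyOrdinals = keyOrdinals;
            this.keyExprBytes = keyExprBytes;
            this.nullAware = nullAware;
        }
    }

    /** Bound keys: keep just the column ordinals and the flag. */
    static Payload forBoundKeys(int[] ordinals, boolean nullAware) {
        return new Payload(ordinals.clone(), null, nullAware);
    }

    /** Computed keys: Java-serialize the expressions once into a byte array. */
    static Payload forComputedKeys(KeyExpr[] exprs, boolean nullAware) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(exprs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Payload(null, buffer.toByteArray(), nullAware);
    }

    /** Decode the expressions only at the point where projections are built. */
    static KeyExpr[] decodeKeyExprs(Payload payload) {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(payload.keyExprBytes))) {
            return (KeyExpr[]) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Payload bound = forBoundKeys(new int[] {0, 2}, true);
        Payload computed =
                forComputedKeys(new KeyExpr[] {new KeyExpr("upper(col)")}, false);
        System.out.println(bound.keyOrdinals.length + " bound key ordinals");
        System.out.println(decodeKeyExprs(computed)[0].description);
    }
}
```

   Because an outer serializer such as Kryo only sees an int array, a byte 
array, and a boolean in this sketch, it never has to walk a nested expression 
tree, which is the property the description above relies on.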
   
   (cherry picked from commit 91c52e15f16593747e918145258ebe1408cb8ea2)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

