Josh Rosen commented on SPARK-3132:

I don't think that this is being actively worked on. I remember building a 
proof-of-concept that used a custom {{Serializer}} for byte arrays and found 
that doing that by itself didn't seem to result in huge performance gains, but 
if we can manage to skip JVM-side compression of already-compressed Python 
arrays then I could see that being a reasonable small win.

> Avoid serialization for Array[Byte] in TorrentBroadcast
> -------------------------------------------------------
>                 Key: SPARK-3132
>                 URL: https://issues.apache.org/jira/browse/SPARK-3132
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>            Reporter: Reynold Xin
> If the input data is a byte array, we should allow TorrentBroadcast to skip 
> serializing and compressing the input.
> To do this, we should add a new parameter (shortCircuitByteArray) to 
> TorrentBroadcast, and then avoid serialization if the input is a byte array 
> and shortCircuitByteArray is true.
> We should then also do compression in task serialization itself instead of 
> doing it in TorrentBroadcast.
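
For readers skimming the thread, the short-circuit idea in the description can be sketched roughly as follows. This is only an illustration in Python, not Spark's TorrentBroadcast code: pickle and zlib stand in for Spark's serializer and compression codec, and the names block_bytes, read_block, and short_circuit_byte_array are hypothetical.

```python
import pickle
import zlib


def block_bytes(value, short_circuit_byte_array=False):
    """Return (payload, was_serialized) for a value about to be broadcast.

    Hypothetical helper: pickle/zlib are stand-ins for the real
    serializer and compression codec.
    """
    if short_circuit_byte_array and isinstance(value, (bytes, bytearray)):
        # Raw bytes pass through untouched: no serialization, no compression.
        return bytes(value), False
    # Everything else takes the usual serialize-then-compress path.
    return zlib.compress(pickle.dumps(value)), True


def read_block(payload, was_serialized):
    """Invert block_bytes using the flag recorded at write time."""
    if not was_serialized:
        return payload
    return pickle.loads(zlib.decompress(payload))


if __name__ == "__main__":
    payload, flag = block_bytes(b"already-compressed bytes", short_circuit_byte_array=True)
    print(flag, read_block(payload, flag))
```

The reader has to know which path the writer took, which is why the sketch carries a flag alongside the payload; how that would be recorded in Spark itself is left open here.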

This message was sent by Atlassian JIRA
