[ 
https://issues.apache.org/jira/browse/SPARK-40912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emil Ejbyfeldt updated SPARK-40912:
-----------------------------------
    Description: The interface of DeserializationStream forces implementations 
to raise an EOFException to indicate that there is no more data. For 
KryoDeserializationStream it is even worse: since the Kryo library does not 
raise EOFException itself, we pay the price of two exceptions for each stream. 
For large shuffles with lots of small streams this is a fairly large overhead 
(a couple of percent of CPU time has been observed). It is also less safe to 
depend on exceptions, as they might be raised for other reasons, such as 
corrupt data, and that currently causes data loss.  (was: The interface of 
DeserializationStream forces implementations to raise an EOFException to 
indicate that there is no more data. For KryoDeserializationStream it is even 
worse: since the Kryo library does not raise EOFException itself, we pay the 
price of two exceptions for each stream. For large shuffles with lots of small 
streams this is a fairly large overhead (a couple of percent of CPU time has 
been observed).)

> Overhead of Exceptions in DeserializationStream 
> ------------------------------------------------
>
>                 Key: SPARK-40912
>                 URL: https://issues.apache.org/jira/browse/SPARK-40912
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Emil Ejbyfeldt
>            Priority: Minor
>
> The interface of DeserializationStream forces implementations to raise an 
> EOFException to indicate that there is no more data. For 
> KryoDeserializationStream it is even worse: since the Kryo library does not 
> raise EOFException itself, we pay the price of two exceptions for each 
> stream. For large shuffles with lots of small streams this is a fairly large 
> overhead (a couple of percent of CPU time has been observed). It is also 
> less safe to depend on exceptions, as they might be raised for other 
> reasons, such as corrupt data, and that currently causes data loss.
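
The two concerns in the description (one exception per stream, and a
truncated stream being mistaken for a clean end) can be illustrated with a
small standalone sketch. It uses only java.io and is not Spark or Kryo code:
the class EofSignalDemo and its methods are hypothetical names chosen for
illustration, and the explicit check assumes the underlying stream reports
its remaining bytes reliably through available(), which holds for
ByteArrayInputStream but not for streams in general.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of Spark or Kryo.
final class EofSignalDemo {

    // Pattern described in the issue: keep reading until EOFException.
    // Every stream pays for constructing and throwing one exception.
    static List<Integer> readUntilException(byte[] data) throws IOException {
        List<Integer> out = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                out.add(in.readInt());
            }
        } catch (EOFException endOfStream) {
            // A clean end and a truncated record look identical here.
        }
        return out;
    }

    // Alternative sketch: test for remaining bytes before each read, so the
    // normal end of the stream needs no exception and truncation is reported.
    // Assumes available() is exact for the wrapped stream.
    static List<Integer> readWithExplicitCheck(byte[] data) throws IOException {
        List<Integer> out = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                if (in.available() < Integer.BYTES) {
                    throw new IOException(
                        "truncated record: " + in.available() + " trailing byte(s)");
                }
                out.add(in.readInt());
            }
        }
        return out;
    }

    // Helper that serializes ints as 4-byte big-endian values.
    static byte[] encode(int... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bytes)) {
            for (int v : values) {
                dos.writeInt(v);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] clean = encode(1, 2, 3);
        // Drop the last two bytes to simulate a corrupt (truncated) stream.
        byte[] truncated = Arrays.copyOf(clean, clean.length - 2);

        System.out.println(readUntilException(clean));
        // The partial third record is dropped without any error being raised.
        System.out.println(readUntilException(truncated));
        System.out.println(readWithExplicitCheck(clean));
    }
}
```

On the truncated input the exception-based reader returns only the first two
values and reports nothing, while the explicit check raises an IOException
instead of silently dropping the partial record. Whether a comparable check is
practical for DeserializationStream depends on what the underlying serializer
exposes, which this sketch does not attempt to model.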



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

