[
https://issues.apache.org/jira/browse/SPARK-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Rosen resolved SPARK-2496.
-------------------------------
Resolution: Incomplete
Resolving as "Incomplete"; if we still want to do this, we should wait
until we have a concrete use case and a list of the things that need to
change.
> Compression streams should write their codec info to the stream
> ---------------------------------------------------------------
>
> Key: SPARK-2496
> URL: https://issues.apache.org/jira/browse/SPARK-2496
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle, Spark Core
> Reporter: Reynold Xin
> Priority: Critical
>
> Spark sometimes stores compressed data outside of Spark itself (e.g. event
> logs, blocks in Tachyon), and that data is read back directly using the
> codec configured by the user. When the configured codec differs between
> runs, Spark would not be able to read the data back.
> I'm not sure what the best strategy here is yet. If we write the codec
> identifier for all streams, then we will be writing a lot of identifiers for
> shuffle blocks. One possibility is to only write it for blocks that will be
> shared across different Spark instances (i.e. managed outside of Spark),
> which includes tachyon blocks and event log blocks.
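The "write an identifier only for externally shared blocks" idea above can be sketched as follows. This is a minimal illustration, not Spark's actual wire format: the one-byte codec IDs, the `CodecHeader` class, and its method names are all hypothetical, and GZIP stands in for whatever codec the user configured.

```java
import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: prefix a stored block with a one-byte codec ID so a
// reader can pick the right decompressor regardless of which codec its own
// configuration names. IDs and names are illustrative, not Spark's format.
public class CodecHeader {
    static final byte NONE = 0;
    static final byte GZIP = 1;

    static byte[] write(byte[] data, byte codec) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        bytes.write(codec); // codec identifier goes first, uncompressed
        OutputStream out = (codec == GZIP) ? new GZIPOutputStream(bytes) : bytes;
        out.write(data);
        out.close(); // flushes the GZIP trailer when applicable
        return bytes.toByteArray();
    }

    static byte[] read(byte[] stored) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(stored);
        byte codec = (byte) in.read(); // read the ID, then wrap accordingly
        InputStream wrapped = (codec == GZIP) ? new GZIPInputStream(in) : in;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = wrapped.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

The one-byte overhead is why applying this to every shuffle block is unattractive; restricting the header to blocks managed outside of Spark (Tachyon, event logs) keeps the common path unchanged.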
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)