[
https://issues.apache.org/jira/browse/SPARK-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Rosen resolved SPARK-2496.
-------------------------------
Resolution: Incomplete
Resolving as "Incomplete"; if we still want to do this, we should wait
until we have a concrete use case and a list of the things that need to
change.
> Compression streams should write their codec info to the stream
> ---------------------------------------------------------------
>
> Key: SPARK-2496
> URL: https://issues.apache.org/jira/browse/SPARK-2496
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle, Spark Core
> Reporter: Reynold Xin
> Priority: Critical
>
> Spark sometimes stores compressed data outside of Spark itself (e.g. event
> logs, blocks in Tachyon), and that data is read back directly using the
> codec configured by the user. When the configured codec differs between
> runs, Spark would not be able to read the data back.
> I'm not sure what the best strategy here is yet. If we write the codec
> identifier for all streams, then we will be writing a lot of identifiers for
> shuffle blocks. One possibility is to only write it for blocks that will be
> shared across different Spark instances (i.e. managed outside of Spark),
> which includes tachyon blocks and event log blocks.
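The "write an identifier only for externally shared blocks" idea above can be sketched as follows. This is a minimal illustration, not Spark's actual wire format: the one-byte codec IDs, the `CodecHeader` class, and its method names are all hypothetical, and GZIP stands in for whatever codec the user configured.

```java
import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: prefix a stored block with a one-byte codec ID so a
// reader can pick the right decompressor regardless of which codec its own
// configuration names. IDs and names are illustrative, not Spark's format.
public class CodecHeader {
    static final byte NONE = 0;
    static final byte GZIP = 1;

    static byte[] write(byte[] data, byte codec) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        bytes.write(codec); // codec identifier goes first, uncompressed
        OutputStream out = (codec == GZIP) ? new GZIPOutputStream(bytes) : bytes;
        out.write(data);
        out.close(); // flushes the GZIP trailer when applicable
        return bytes.toByteArray();
    }

    static byte[] read(byte[] stored) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(stored);
        byte codec = (byte) in.read(); // read the ID, then wrap accordingly
        InputStream wrapped = (codec == GZIP) ? new GZIPInputStream(in) : in;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = wrapped.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

The one-byte overhead is why applying this to every shuffle block is unattractive; restricting the header to blocks managed outside of Spark (Tachyon, event logs) keeps the common path unchanged.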
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)