[ https://issues.apache.org/jira/browse/SPARK-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340904#comment-14340904 ]

Marcelo Vanzin commented on SPARK-6066:
---------------------------------------

I think that if third parties want to read these files, the correct option is to
add a library that lets them do so. For example, you don't try to read a
Snappy file with FileInputStream directly; you use Snappy's library, since it
handles both the header (which says how to decompress the data) and the actual
decompression.
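
To make the analogy concrete with the Python standard library, here is gzip standing in for Snappy (whose Python bindings are a third-party package). gzip also puts a header in front of the compressed data: reading the raw bytes only shows that header, while the codec's own module parses it and decompresses the rest. The payload below is an arbitrary sample line, not taken from a real log.

```python
import gzip

# Arbitrary sample event line (illustrative only).
payload = b'{"Event": "SparkListenerApplicationStart"}\n'

# gzip.compress() emits a header (magic bytes, flags, mtime, OS) before the deflate stream.
blob = gzip.compress(payload)

# Reading the bytes directly (the FileInputStream approach) just shows the header...
print(blob[:2])  # b'\x1f\x8b', the gzip magic number

# ...whereas the codec's own library understands the header and decompresses the data.
assert gzip.decompress(blob) == payload
```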

This case is similar. There's a header with metadata about the file contents,
just as before there was a set of files in a directory that served as that
metadata. The only sub-optimal part is that we don't have a public library to
read it.
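
A rough sketch of what such a reader library could look like, assuming a hypothetical layout loosely modeled on this discussion: plain-text `key=value` header lines, a marker line, then the event data compressed with whatever codec the header names. The marker `=== HEADER END ===`, the key `compressionCodec`, and the function `read_event_log` are all made up for illustration and are not Spark's actual format or API; gzip and bz2 stand in for Spark's codecs, which are not in the Python standard library.

```python
import bz2
import gzip

# Hypothetical marker ending the plain-text header (NOT Spark's real on-disk format).
HEADER_END_MARKER = b"=== HEADER END ==="

# Stand-in codecs from the standard library; Spark's own codecs (e.g. LZF, Snappy)
# would need third-party bindings.
DECOMPRESSORS = {"gzip": gzip.decompress, "bz2": bz2.decompress}


def read_event_log(data: bytes):
    """Split a header-prefixed log into (metadata dict, list of event lines)."""
    # The header comes first, so the first occurrence of the marker line ends it.
    header, sep, body = data.partition(HEADER_END_MARKER + b"\n")
    if not sep:
        raise ValueError("header end marker not found")
    meta = {}
    for line in header.decode("utf-8").splitlines():
        key, _, value = line.partition("=")
        meta[key] = value
    # "compressionCodec" is a hypothetical key name; no key means an uncompressed body.
    codec = meta.get("compressionCodec")
    if codec is not None:
        body = DECOMPRESSORS[codec](body)
    return meta, body.decode("utf-8").splitlines()


# Usage: build a sample log the way a writer following this layout would.
events = b'{"Event": "A"}\n{"Event": "B"}\n'
log = b"version=1.3.0\ncompressionCodec=gzip\n" + HEADER_END_MARKER + b"\n" + gzip.compress(events)
meta, lines = read_event_log(log)
print(meta)   # {'version': '1.3.0', 'compressionCodec': 'gzip'}
print(lines)  # ['{"Event": "A"}', '{"Event": "B"}']
```

The point of wrapping this in one function is that external tools would call it instead of re-implementing the header handling themselves.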

The extension can help a little, but users would still need to understand the
extension-to-codec mapping, plus the contents of the header and what they
mean.
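
A small sketch of the extension-to-codec mapping a reader would have to carry around. The extensions and codecs below are hypothetical stand-ins from the Python standard library, not the ones Spark uses; `decompress_by_extension` is a made-up helper. Note that it only selects a decompressor: any header in front of the compressed data would still have to be parsed separately.

```python
import bz2
import gzip
import lzma
from pathlib import PurePath

# Hypothetical extension-to-codec table; Spark's real codecs are not in the stdlib.
CODEC_BY_EXTENSION = {
    ".gz": gzip.decompress,
    ".bz2": bz2.decompress,
    ".xz": lzma.decompress,
}


def decompress_by_extension(path: str, data: bytes) -> bytes:
    """Pick a decompressor from the file suffix; unknown suffixes are passed through."""
    decompress = CODEC_BY_EXTENSION.get(PurePath(path).suffix)
    return decompress(data) if decompress else data
```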

> Metadata in event log makes it very difficult for external libraries to parse 
> event log
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-6066
>                 URL: https://issues.apache.org/jira/browse/SPARK-6066
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.3.0
>            Reporter: Kay Ousterhout
>            Assignee: Andrew Or
>            Priority: Blocker
>
> The fix for SPARK-2261 added a line at the beginning of the event log that 
> encodes metadata.  This line makes it much more difficult to parse the event 
> logs from external libraries (like 
> https://github.com/kayousterhout/trace-analysis, which is used by folks at 
> Berkeley) because:
> (1) The metadata is not written as JSON, unlike the rest of the file
> (2) More annoyingly, if the file is compressed, the metadata is not 
> compressed.  This has a few side-effects: first, someone can't just use the 
> command line to uncompress the file and then look at the logs, because the 
> file is in this weird half-compressed format; and second, now external tools 
> that parse these logs also need to deal with this weird format.
> We should fix this before the 1.3 release, because otherwise we'll have to 
> add a bunch more backward-compatibility code to handle this weird format!
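
The two problems in the issue description can be reproduced in a few lines, using gzip as a stand-in for whichever codec is configured and a made-up `version=1.3.0` line as the uncompressed, non-JSON metadata (the real header contents may differ). A stock decompressor run over the whole file fails because the file does not start with compressed data, and a line-by-line JSON parser fails on the first line. `try_gunzip` and `first_line_is_json` are hypothetical helpers for illustration.

```python
import gzip
import json

events = b'{"Event": "SparkListenerJobStart"}\n'  # arbitrary sample event line
header = b"version=1.3.0\n"  # hypothetical stand-in for the uncompressed metadata
half_compressed = header + gzip.compress(events)


def try_gunzip(data: bytes):
    """Mimic running a stock decompressor (like `gunzip`) over the whole file."""
    try:
        return gzip.decompress(data)
    except OSError:  # gzip.BadGzipFile subclasses OSError on Python 3.8+
        return None


def first_line_is_json(data: bytes) -> bool:
    """Problem (1): a line-by-line JSON parser chokes on the metadata line."""
    try:
        json.loads(data.split(b"\n", 1)[0])
        return True
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False


print(try_gunzip(half_compressed))                # None: not a gzip stream
print(try_gunzip(half_compressed[len(header):]))  # works once the header is stripped
print(first_line_is_json(half_compressed))        # False
```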



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
