[ https://issues.apache.org/jira/browse/SPARK-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340882#comment-14340882 ]
Patrick Wendell commented on SPARK-6066:
----------------------------------------
What if, as a simple fix, we did the following:
1. Put the compression information in the filename. For instance, can we just
use an extension indicating the compression format? That's a very common
convention.
2. Have the metadata header use the same compression options as the rest of
the file.
3. Have the metadata header be a single-line JSON dictionary instead of a
second, non-JSON format.
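To make the three proposals concrete, here is a minimal Python sketch of the writing side. It is not Spark's implementation (Spark's event logging is written in Scala); the codec table, the `write_event_log` name, and the header fields (`LogStart`, `Spark Version`) are hypothetical. Python's standard library only ships gzip and bz2, so those stand in for Spark's actual compression codecs.

```python
import bz2
import gzip
import json
import os
import tempfile

# Hypothetical codec table: short name -> (filename extension, opener).
# gzip and bz2 stand in for Spark's real codecs because they ship with Python.
CODECS = {
    "gzip": (".gz", gzip.open),
    "bzip2": (".bz2", bz2.open),
}


def write_event_log(base_path, events, spark_version, codec=None):
    """Write a metadata header plus events, one JSON object per line.

    1. The codec is recorded in the filename extension.
    2. The header goes through the same (possibly compressed) stream as the events.
    3. The header is a single-line JSON object, like every other record.
    Returns the path that was written.
    """
    if codec is None:
        path, opener = base_path, open
    else:
        ext, opener = CODECS[codec]
        path = base_path + ext
    # Hypothetical header fields; any single-line JSON object would do.
    header = {"Event": "LogStart", "Spark Version": spark_version}
    with opener(path, "wt", encoding="utf-8") as out:
        for record in [header, *events]:
            out.write(json.dumps(record) + "\n")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_event_log(
            os.path.join(tmp, "app-1"),
            [{"Event": "JobStart", "Job ID": 0}],  # hypothetical event record
            "1.3.0",
            codec="gzip",
        )
        print(os.path.basename(path))  # app-1.gz
```

Because the header goes through the same compressor as the events, a stock tool such as `zcat app-1.gz` prints the header and the events as plain JSON lines, with no half-compressed prefix to strip first.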
I think this would solve the main interop problem for other people trying to
read these logs. Within reason, we should make these logs easy for third
parties to process without much extra effort. It's a big win, because it lets
researchers and others run fairly fine-grained analyses of Spark's behavior.
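A rough Python sketch of what a third-party reader could look like under the proposed layout: pick a decompressor from the file extension, then parse every line, header included, with a plain JSON parser. The extension mapping, the `read_event_log`/`count_events` names, and the `Event` field used for counting are assumptions for illustration; Spark's own codecs would need third-party Python packages.

```python
import bz2
import gzip
import json
import os
from collections import Counter

# Hypothetical extension -> opener mapping; unknown extensions fall back to
# plain text. Spark's real codecs would need third-party packages here.
OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
}


def read_event_log(path):
    """Yield every record (metadata header first) as a dict.

    Assumes the codec is given by the extension, the whole file (header
    included) is compressed with it, and every line is one JSON object.
    """
    _, ext = os.path.splitext(path)
    opener = OPENERS.get(ext, open)
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)


def count_events(path):
    """Count records by their (assumed) "Event" field for quick analysis."""
    return Counter(record.get("Event") for record in read_event_log(path))
```

No special case is needed for the header: it is just the first record the generator yields.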
[~vanzin] any thoughts here?
> Metadata in event log makes it very difficult for external libraries to parse
> event log
> ---------------------------------------------------------------------------------------
>
> Key: SPARK-6066
> URL: https://issues.apache.org/jira/browse/SPARK-6066
> Project: Spark
> Issue Type: Bug
> Affects Versions: 1.3.0
> Reporter: Kay Ousterhout
> Assignee: Andrew Or
> Priority: Blocker
>
> The fix for SPARK-2261 added a line at the beginning of the event log that
> encodes metadata. This line makes it much more difficult to parse the event
> logs from external libraries (like
> https://github.com/kayousterhout/trace-analysis, which is used by folks at
> Berkeley) because:
> (1) The metadata is not written as JSON, unlike the rest of the file
> (2) More annoyingly, if the file is compressed, the metadata is not
> compressed. This has two side effects: first, someone can't just use the
> command line to uncompress the file and then look at the logs, because the
> file is in this weird half-compressed format; and second, external tools
> that parse these logs now also need to deal with this weird format.
> We should fix this before the 1.3 release, because otherwise we'll have to
> add a bunch more backward-compatibility code to handle this weird format!
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)