[ https://issues.apache.org/jira/browse/SPARK-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340882#comment-14340882 ]
Patrick Wendell commented on SPARK-6066:
----------------------------------------

What if as a simple fix we do these things:

1. Put the compression information in the filename. For instance, can we just use an extension indicating the compression format? That's a very common convention.
2. Have the meta-data header use the same compression options as the rest of the file.
3. Have the meta-data header use a single-line JSON dictionary instead of having a second type of format that is not JSON.

I think this would solve the main inter-op problem with other people trying to read these logs. I do think we should try, within reason, to make these easy for third parties to process without a lot of extra effort. It's a big win because it allows researchers, etc to run fairly fine grained analysis of Spark's behavior.

[~vanzin] any thoughts here?

> Metadata in event log makes it very difficult for external libraries to parse
> event log
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-6066
>                 URL: https://issues.apache.org/jira/browse/SPARK-6066
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.3.0
>            Reporter: Kay Ousterhout
>            Assignee: Andrew Or
>            Priority: Blocker
>
> The fix for SPARK-2261 added a line at the beginning of the event log that
> encodes metadata. This line makes it much more difficult to parse the event
> logs from external libraries (like
> https://github.com/kayousterhout/trace-analysis, which is used by folks at
> Berkeley) because:
> (1) The metadata is not written as JSON, unlike the rest of the file
> (2) More annoyingly, if the file is compressed, the metadata is not
> compressed. This has a few side-effects: first, someone can't just use the
> command line to uncompress the file and then look at the logs, because the
> file is in this weird half-compressed format; and second, now external tools
> that parse these logs also need to deal with this weird format.
> We should fix this before the 1.3 release, because otherwise we'll have to
> add a bunch more backward-compatibility code to handle this weird format!
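
For illustration only, here is a rough sketch of what a third-party reader might look like if the three points in the comment were adopted: the codec is picked from the file extension, the whole file (header included) is decompressed the same way, and every line, the first one included, is parsed as JSON. Everything named here is an assumption rather than Spark's actual format: the function read_event_log, the extension table, and the choice of gzip and bz2. Those two codecs are stand-ins chosen because they ship with Python; reading Spark's own codecs would need third-party libraries.

```python
import bz2
import gzip
import json
import os

# Hypothetical extension-to-opener table. gzip and bz2 are stand-ins from the
# Python standard library, not the codecs Spark itself uses.
OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
}


def read_event_log(path):
    """Return (metadata, events) for an event log in the proposed layout.

    Assumes the first line is a single-line JSON dictionary of metadata and
    every later line is one JSON event, all under the same compression.
    """
    ext = os.path.splitext(path)[1]
    # No recognised extension: treat the file as uncompressed text.
    opener = OPENERS.get(ext, open)
    with opener(path, "rt", encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    if not records:
        return {}, []
    return records[0], records[1:]
```

With a layout like this, a reader needs no special case for the header: the same loop handles the metadata line and the events, and the file can also be inspected from the command line with the usual decompression tools.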