[ https://issues.apache.org/jira/browse/SPARK-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341016#comment-14341016 ]
Marcelo Vanzin commented on SPARK-6066:
---------------------------------------

We only use codecs supported by Spark: snappy, lzo, lzf. Can raw python read those?

Changing the header to be a single-line JSON thing is probably a very small change.

> Metadata in event log makes it very difficult for external libraries to parse
> event log
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-6066
>                 URL: https://issues.apache.org/jira/browse/SPARK-6066
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.3.0
>            Reporter: Kay Ousterhout
>            Assignee: Andrew Or
>            Priority: Blocker
>
> The fix for SPARK-2261 added a line at the beginning of the event log that
> encodes metadata. This line makes it much more difficult to parse the event
> logs from external libraries (like
> https://github.com/kayousterhout/trace-analysis, which is used by folks at
> Berkeley) because:
> (1) The metadata is not written as JSON, unlike the rest of the file
> (2) More annoyingly, if the file is compressed, the metadata is not
> compressed. This has a few side-effects: first, someone can't just use the
> command line to uncompress the file and then look at the logs, because the
> file is in this weird half-compressed format; and second, now external tools
> that parse these logs also need to deal with this weird format.
> We should fix this before the 1.3 release, because otherwise we'll have to
> add a bunch more backward-compatibility code to handle this weird format!
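To illustrate point (1) of the description, here is a rough Python sketch of the kind of line-by-line reader an external tool might use on an uncompressed event log. It is only a sketch under assumptions: the function name `read_events`, the sample event names and fields, and the placeholder header text are all hypothetical and are not taken from Spark. It shows that a reader expecting one JSON object per line works unchanged if the header is itself a single JSON line, and fails on line 1 if the header is in some other format.

```python
import json


def read_events(lines):
    """Parse an iterable of text lines, expecting one JSON object per line.

    Hypothetical helper for illustration; not part of Spark.
    """
    events = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            events.append(json.loads(line))
        except ValueError as err:
            # json.JSONDecodeError is a subclass of ValueError
            raise ValueError("line %d is not valid JSON: %s" % (number, err))
    return events


# Hypothetical log in which the header is just another JSON line.
json_header_log = [
    '{"Event": "HypotheticalHeader", "Codec": "snappy"}',
    '{"Event": "HypotheticalTaskEnd", "Task ID": 7}',
]
events = read_events(json_header_log)

# Hypothetical log whose first line is a non-JSON metadata header.
mixed_log = [
    "PLACEHOLDER-NON-JSON-HEADER",
    '{"Event": "HypotheticalTaskEnd", "Task ID": 7}',
]
try:
    read_events(mixed_log)
    header_rejected = False
except ValueError:
    header_rejected = True
```

With the first sample, the reader returns both records without any special casing; with the second, it stops at the first line, which is the extra handling the description says external tools would otherwise need.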