[ https://issues.apache.org/jira/browse/SPARK-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341881#comment-14341881 ]
Patrick Wendell commented on SPARK-6066:
----------------------------------------

[~vanzin] - yes, you are right (an early scratch version of the feature used a Gzip stream, I think). There are Python bindings for all three of those compression codecs. To be fair, I'm not 100% sure the codecs are standardized enough to be compatible across different implementations. Gzip is pretty good in this regard, but I'm not sure about those other three.

> Metadata in event log makes it very difficult for external libraries to parse event log
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-6066
>                 URL: https://issues.apache.org/jira/browse/SPARK-6066
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.3.0
>            Reporter: Kay Ousterhout
>            Assignee: Andrew Or
>            Priority: Blocker
>
> The fix for SPARK-2261 added a line at the beginning of the event log that
> encodes metadata. This line makes it much more difficult to parse the event
> logs from external libraries (like
> https://github.com/kayousterhout/trace-analysis, which is used by folks at
> Berkeley) because:
> (1) The metadata is not written as JSON, unlike the rest of the file
> (2) More annoyingly, if the file is compressed, the metadata is not
> compressed. This has a few side-effects: first, someone can't just use the
> command line to uncompress the file and then look at the logs, because the
> file is in this weird half-compressed format; and second, now external tools
> that parse these logs also need to deal with this weird format.
> We should fix this before the 1.3 release, because otherwise we'll have to
> add a bunch more backward-compatibility code to handle this weird format!
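
To make the "half-compressed" problem from the issue description concrete, here is a minimal Python sketch of what an external parser has to do when the first line of a log is plain text and the remainder is compressed. It is only an illustration under stated assumptions: gzip stands in for the compression codec (the comment above says only an early scratch version used a Gzip stream), and the header text, event payloads, and the function names `write_half_compressed` and `read_half_compressed` are hypothetical, not Spark's actual format or API.

```python
import gzip
import json


def write_half_compressed(metadata_line, events):
    """Build a log: one uncompressed metadata line, then a compressed JSON-lines body.

    Assumption: gzip is used here only as a stand-in codec.
    """
    body = "\n".join(json.dumps(event) for event in events).encode("utf-8")
    return metadata_line.encode("utf-8") + b"\n" + gzip.compress(body)


def read_half_compressed(raw):
    """Split off the plain-text first line, then decompress and parse the rest."""
    # The metadata line contains no newline, so the first newline byte
    # marks the boundary between the header and the compressed body.
    header, _, compressed = raw.partition(b"\n")
    text = gzip.decompress(compressed).decode("utf-8")
    events = [json.loads(line) for line in text.splitlines() if line]
    return header.decode("utf-8"), events


if __name__ == "__main__":
    # Hypothetical header and events, for illustration only.
    raw = write_half_compressed("HYPOTHETICAL_METADATA", [{"Event": "A"}, {"Event": "B"}])

    # A standard decompressor rejects the file because it does not
    # start with the codec's own header bytes.
    try:
        gzip.decompress(raw)
    except OSError as err:
        print("plain decompression failed:", err)

    header, events = read_half_compressed(raw)
    print(header, events)
```

The point of the sketch is the extra splitting step in `read_half_compressed`: every external tool has to know about it, and a stock command-line decompressor cannot be applied to the file directly. If the whole file were compressed, or the metadata were an ordinary JSON line inside the compressed body, a single decompress-then-parse pass would be enough.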