[ 
https://issues.apache.org/jira/browse/SPARK-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341015#comment-14341015
 ] 

Patrick Wendell commented on SPARK-6066:
----------------------------------------

Hey Marcelo,

I agree having a public library for reading these logs would be nice, 
especially if we decide to do fancier things with the way the logs are encoded 
in the future.

As it stands today though, it is nice that logs can be read by third party 
applications, even those that are not JVM based (Kay's app is actually written 
in Python), because we are using a very standard serialization format (JSON) 
and a very widely supported compression library (GZIP, IIRC?). I think we'll 
get benefit from this as long as we can support such widely used formats. This 
is especially important since Spark does not expose a library for importing 
these things today.
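
To make the point concrete, here is a rough sketch of how little a non-JVM tool needs when the whole file is plain JSON behind a standard codec. It assumes one JSON object per line and gzip compression (the codec is the part I was unsure about above), and it assumes each record names its type in an "Event" field. The helper names read_events and count_event_types are hypothetical and are not part of Spark or of Kay's tool.

```python
import gzip
import json
from collections import Counter


def read_events(path):
    """Yield one dict per non-empty line of a gzip-compressed, line-delimited JSON file."""
    # Assumption: the entire file is gzip-compressed, with no uncompressed header.
    with gzip.open(path, mode="rt", encoding="utf-8") as log:
        for line in log:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_event_types(path, type_field="Event"):
    """Count records by type; the "Event" field name is an assumption."""
    return Counter(
        event.get(type_field, "<unknown>") for event in read_events(path)
    )
```

Only the standard library is involved, which is the benefit of sticking to widely supported formats. A leading line that is not JSON, or one that sits outside the compressed stream, would make both the gzip.open and json.loads steps fail on the first read.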

So making a minor change that allows these logs to be read by many third-party 
systems seems worth it to me.



> Metadata in event log makes it very difficult for external libraries to parse 
> event log
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-6066
>                 URL: https://issues.apache.org/jira/browse/SPARK-6066
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.3.0
>            Reporter: Kay Ousterhout
>            Assignee: Andrew Or
>            Priority: Blocker
>
> The fix for SPARK-2261 added a line at the beginning of the event log that 
> encodes metadata.  This line makes it much more difficult to parse the event 
> logs from external libraries (like 
> https://github.com/kayousterhout/trace-analysis, which is used by folks at 
> Berkeley) because:
> (1) The metadata is not written as JSON, unlike the rest of the file
> (2) More annoyingly, if the file is compressed, the metadata is not 
> compressed.  This has a few side-effects: first, someone can't just use the 
> command line to uncompress the file and then look at the logs, because the 
> file is in this weird half-compressed format; and second, now external tools 
> that parse these logs also need to deal with this weird format.
> We should fix this before the 1.3 release, because otherwise we'll have to 
> add a bunch more backward-compatibility code to handle this weird format!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

