[ 
https://issues.apache.org/jira/browse/SPARK-6270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053512#comment-15053512
 ] 

Steve Loughran commented on SPARK-6270:
---------------------------------------

Replay time itself is going to be steep. Given that the summary metadata 
doesn't really need a full replay, that is a large amount of wasted IO, codec 
CPU and JSON deserialization. Now, if someone were to adopt a protobuf or Avro 
event format, you'd get really good compression in exchange for the suffering 
of developers.

What could boost startup is what comes in the YARN timeline integration: 
extraction of summary data (times, finished flag) without having to do the 
replay. A summary file alongside the main one would work there, perhaps with 
the file length of the real log recorded in the summary so as to prove that 
the summary is in sync with the saved log. (On a mismatch, fall back to 
replay and save a fresh summary for next time.)
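
A minimal sketch of that summary-file check, in Python for brevity. The 
".summary" suffix, the "logLength"/"summary" field names and the replay 
callback are all assumptions made for illustration, not anything Spark or the 
timeline integration defines:

```python
import json
import os
from typing import Callable, Dict


def load_summary(log_path: str, replay: Callable[[str], Dict]) -> Dict:
    """Return summary metadata for an event log, replaying only when needed."""
    summary_path = log_path + ".summary"  # hypothetical naming convention
    log_length = os.path.getsize(log_path)

    try:
        with open(summary_path, "r", encoding="utf-8") as f:
            cached = json.load(f)
        # The summary is trusted only if it records the current log length.
        if isinstance(cached, dict) and cached.get("logLength") == log_length:
            return cached["summary"]
    except (OSError, ValueError, KeyError):
        pass  # missing or corrupt summary: fall through to a full replay

    # Mismatch or no usable summary: do the expensive replay once ...
    summary = replay(log_path)

    # ... and save the result, together with the log length, for next time.
    tmp_path = summary_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"logLength": log_length, "summary": summary}, f)
    os.replace(tmp_path, summary_path)
    return summary
```

A length check is cheap but only catches logs that grew or were truncated; a 
same-length rewrite would go unnoticed, so a checksum could be substituted if 
that matters.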

There's one more thing to consider with those standalone logs: if the 
destination is an object store, should the flush/commit logic be different? 
You'd want to make sure that an s3a destination had multipart upload enabled, 
then have a partial upload trigger on a flush-class event, rather than wait 
until the end of the run. Today you don't get those guarantees and hence run 
the risk that a failed app could lose the history.
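
As a rough illustration of a flush-triggered partial upload, here is a sketch 
in Python. The class name and the upload_part callback are hypothetical 
stand-ins for whatever multipart client is in use; this is not the s3a 
implementation. The 5 MiB default reflects the S3 minimum size for every part 
except the last:

```python
from typing import Callable


class FlushTriggeredUploader:
    """Buffers log bytes and ships them as one part when a flush arrives."""

    def __init__(self,
                 upload_part: Callable[[int, bytes], None],
                 min_part_size: int = 5 * 1024 * 1024):
        self._upload_part = upload_part      # hypothetical client callback
        self._min_part_size = min_part_size
        self._buffer = bytearray()
        self._next_part = 1                  # part numbers start at 1

    def write(self, data: bytes) -> None:
        self._buffer.extend(data)

    def flush(self) -> bool:
        """Upload the buffer as a part if it is large enough.

        Returns True if a part was uploaded, False if the data stays buffered.
        """
        if len(self._buffer) < self._min_part_size:
            return False
        self._send()
        return True

    def close(self) -> None:
        # The final part may be smaller than the minimum part size.
        if self._buffer:
            self._send()

    def _send(self) -> None:
        self._upload_part(self._next_part, bytes(self._buffer))
        self._next_part += 1
        self._buffer.clear()
```

With something like this, a crash loses at most the bytes buffered since the 
last successful part. Whether the parts already uploaded can then be 
committed into a readable log depends on the store and client, which the 
sketch does not address.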

> Standalone Master hangs when streaming job completes and event logging is 
> enabled
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-6270
>                 URL: https://issues.apache.org/jira/browse/SPARK-6270
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, Streaming
>    Affects Versions: 1.2.0, 1.2.1, 1.3.0, 1.5.1
>            Reporter: Tathagata Das
>            Priority: Critical
>
> If the event logging is enabled, the Spark Standalone Master tries to 
> recreate the web UI of a completed Spark application from its event logs. 
> However if this event log is huge (e.g. for a Spark Streaming application), 
> then the master hangs in its attempt to read and recreate the web ui. This 
> hang causes the whole standalone cluster to be unusable. 
> Workaround is to disable the event logging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

