[
https://issues.apache.org/jira/browse/SPARK-13914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196010#comment-15196010
]
Parag Chaudhari commented on SPARK-13914:
-----------------------------------------
The only alternative I know of is to copy the files manually to the desired
backup location. A manual copy does not capture events in real time, which
matters especially for Spark Streaming jobs that can run for a long time.
Copying files after the job completes does not help someone who wants the
events backed up as they happen.
In a cloud environment, this patch gives all users a common way to back up
event logs with existing cloud backup agents: point the backup directory at
a local disk, and the agents pick up the files from there. At the same time,
users can still configure the main event log directory of their own choice.
What do you think?
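
As a rough illustration of the configuration described above, here is a minimal
sketch. It assumes the backup location is exposed as a single property.
`spark.eventLog.enabled` and `spark.eventLog.dir` are existing Spark
properties; the key `spark.eventLog.backup.dir`, the helper names, and the
example paths are hypothetical placeholders, not the names used by the patch.

```python
def event_log_conf(main_dir, backup_dir=None):
    """Build event-log settings as a plain dict of property name -> value."""
    conf = {
        "spark.eventLog.enabled": "true",  # existing Spark property
        "spark.eventLog.dir": main_dir,    # existing Spark property
    }
    if backup_dir is not None:
        # HYPOTHETICAL property name, used here for illustration only.
        # Leaving it unset models the "off by default" behaviour.
        conf["spark.eventLog.backup.dir"] = backup_dir
    return conf


def to_submit_args(conf):
    """Render the settings as repeated `--conf key=value` arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # Main log on HDFS, backup on a local disk watched by a backup agent.
    conf = event_log_conf("hdfs:///spark-events",
                          "file:///var/log/spark-events-backup")
    print(" ".join(to_submit_args(conf)))
```

With no backup directory given, only the two existing properties are emitted,
so users who do not want the feature see no change.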
> Add functionality to back up spark event logs
> ---------------------------------------------
>
> Key: SPARK-13914
> URL: https://issues.apache.org/jira/browse/SPARK-13914
> Project: Spark
> Issue Type: Improvement
> Components: Scheduler
> Affects Versions: 1.6.0, 1.6.2, 2.0.0
> Reporter: Parag Chaudhari
>
> Spark event logs are usually stored in HDFS when running Spark on YARN. In a
> cloud environment, these HDFS files are often stored on the disks of
> ephemeral instances that could go away once the instances are terminated.
> Users may want to persist the event logs as the events happen, for issue
> investigation and performance analysis before and after the cluster is
> terminated. The backup path can be managed by Spark users based on their
> needs. For example, some users may copy the event logs to a cloud storage
> service directly and keep them there forever, while others may want
> to store the event logs on local disks and back them up to a cloud storage
> service from time to time. Still other users will not want the feature, so
> it should be off by default; users enable it if and only if they need it.