[ https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102579#comment-14102579 ]

Hari Shreedharan commented on SPARK-3129:
-----------------------------------------

The way the driver "finds" the executors would be common for all the scheduling 
systems (it should really be independent of the scheduling/deployment). I agree 
about the auth part too. 

[~tdas] mentioned there is something similar already in standalone. I'd like to 
concentrate on YARN - if someone else is interested in Mesos please feel free 
to take it up!

I posted an initial patch for Client mode that simply keeps the executors 
around (it is not yet exposed via SparkSubmit, which we can do once the whole 
series of patches is in). 

For YARN mode, does that mean the method calls have to be via reflection? I'd 
assume so. 
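As a rough illustration of the reflection approach: the core module could look up and invoke YARN-side helpers by name at runtime, avoiding a compile-time dependency on the YARN classes. This is a hypothetical sketch, not Spark's actual code; `ReflectiveCall`, `FakeYarnHelper`, and `keepExecutorsAlive` are invented names.

```scala
// Hypothetical sketch: invoking a YARN-specific helper via reflection so the
// calling module needs no compile-time dependency on it. A real shim would
// also match parameter types, not just the method name.
object ReflectiveCall {
  def invoke(target: AnyRef, methodName: String, args: AnyRef*): AnyRef = {
    val method = target.getClass.getMethods
      .find(_.getName == methodName)
      .getOrElse(throw new NoSuchMethodException(methodName))
    method.invoke(target, args: _*)
  }
}

// Stand-in for a YARN-only class that would live in the yarn module.
class FakeYarnHelper {
  def keepExecutorsAlive(appId: String): String =
    s"keeping executors for $appId"
}
```

For example, `ReflectiveCall.invoke(new FakeYarnHelper, "keepExecutorsAlive", "app-1")` would dispatch to the helper without the caller naming its type.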

The reason I mentioned doing it via HDFS and then pinging the executors is to 
make it independent of YARN/Mesos/Standalone - we can just do it via 
StreamingContext and keep it completely independent of the backend on which 
Spark is running. (I am not even sure this should be a valid option for 
non-streaming cases, as it does not really add any value elsewhere.)
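To make the HDFS idea concrete, here is a minimal sketch of what the driver side could look like: write the executor endpoints to a shared path, then read them back after a restart and ping each one. A local file stands in for HDFS, and the "host:port" endpoint format is purely illustrative - none of this is Spark's actual API.

```scala
import java.nio.file.{Files, Paths}

// Hypothetical sketch: the driver persists its executors' endpoints to a
// shared location; after a restart it reads the list back so it can "find"
// and ping them, independent of the cluster manager. A local file path
// stands in for an HDFS path here.
object ExecutorRegistry {
  def save(path: String, endpoints: Seq[String]): Unit =
    Files.write(Paths.get(path), endpoints.mkString("\n").getBytes("UTF-8"))

  def load(path: String): Seq[String] =
    new String(Files.readAllBytes(Paths.get(path)), "UTF-8")
      .split("\n").toSeq
}
```

In a real implementation the path, format, and write atomicity would need to go through the Hadoop FileSystem API rather than java.nio.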

> Prevent data loss in Spark Streaming
> ------------------------------------
>
>                 Key: SPARK-3129
>                 URL: https://issues.apache.org/jira/browse/SPARK-3129
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>         Attachments: StreamingPreventDataLoss.pdf
>
>
> Spark Streaming can lose small amounts of data when the driver goes down and 
> the sending system cannot re-send the data (or the data has already expired 
> on the sender side). The attached document has more details. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
