[ https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102579#comment-14102579 ]
Hari Shreedharan commented on SPARK-3129:
-----------------------------------------

The way the driver "finds" the executors would be common to all the scheduling systems (it should really be independent of the scheduling/deployment backend). I agree about the auth part too. [~tdas] mentioned there is something similar already in standalone mode. I'd like to concentrate on YARN - if someone else is interested in Mesos, please feel free to take it up!

I posted an initial patch for client mode that simply keeps the executors around (though it is not yet exposed via SparkSubmit, which we can do once the whole series of patches is in). For YARN mode, does that mean the method calls have to go via reflection? I'd assume so.

The reason I suggested doing it via HDFS and then pinging the executors is to make the mechanism independent of YARN/Mesos/Standalone - we can drive it entirely from the StreamingContext and keep it completely independent of the backend Spark is running on. (I am not even sure this should be a valid option for non-streaming cases, as it does not really add any value elsewhere.)

> Prevent data loss in Spark Streaming
> ------------------------------------
>
>                 Key: SPARK-3129
>                 URL: https://issues.apache.org/jira/browse/SPARK-3129
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>         Attachments: StreamingPreventDataLoss.pdf
>
>
> Spark Streaming can lose small amounts of data when the driver goes down and
> the sending system cannot re-send the data (or the data has already expired
> on the sender side). The document attached has more details.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
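For reference, the "method calls via reflection" idea mentioned for YARN mode is usually done so that the core module carries no compile-time dependency on the YARN classes. A minimal sketch (the helper name and fallback behavior are invented here for illustration, not taken from the actual patch) might look like:

```java
import java.lang.reflect.Method;

// Hypothetical sketch: invoke a no-argument method on a class that may or may
// not be on the classpath (e.g. a YARN-specific class), returning null when
// the class is absent so callers can fall back gracefully.
public class ReflectiveCall {
    static Object invokeNoArg(String className, String methodName) {
        try {
            Class<?> clazz = Class.forName(className);
            Object instance = clazz.getDeclaredConstructor().newInstance();
            Method m = clazz.getMethod(methodName);
            return m.invoke(instance);
        } catch (ClassNotFoundException e) {
            return null; // target classes not on the classpath
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A JDK class stands in for a YARN class in this demo.
        Object result = invokeNoArg("java.lang.StringBuilder", "toString");
        System.out.println(result != null);
    }
}
```

The point of the pattern is that the same jar runs both with and without the optional dependency; only code paths that actually need YARN ever trigger the lookup.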