Do we have any high-availability support at the Spark driver level? For example, if a failure happens, can the Spark driver move to another node and continue execution? I can see that RDD checkpointing helps serialize the state of an RDD, and I can imagine loading the checkpoint from another node when an error happens, but it seems we would lose track of all task statuses and even the executor information maintained in the SparkContext. I am not sure whether there is any existing mechanism I can leverage for this. Thanks for any suggestions.

Best Regards

Jun Feng Liu
IBM China Systems & Technology Laboratory in Beijing
Phone: 86-10-82452683
E-mail: [email protected]
BLD 28, ZGC Software Park, No.8 Rd. Dong Bei Wang West, Dist. Haidian, Beijing 100193, China
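For reference, the checkpointing approach mentioned above can be sketched roughly as follows. This is only a minimal sketch: the checkpoint directory path and the recomputation logic are placeholders, and note that a checkpoint preserves only the RDD's data lineage, not the SparkContext's task or executor state, which is exactly the gap described in the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("checkpoint-sketch")
    val sc = new SparkContext(conf)

    // The checkpoint directory must be on reliable shared storage (e.g. HDFS)
    // so that a driver restarted on another node can read it back.
    sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints") // placeholder path

    val rdd = sc.parallelize(1 to 1000).map(_ * 2)

    // Mark the RDD for checkpointing; materialization happens on the next action.
    rdd.checkpoint()
    rdd.count() // triggers the job and writes the checkpoint files

    // After a driver failure, a *new* driver (new SparkContext) could re-read
    // the checkpointed data, but all in-flight task and executor state held by
    // the old SparkContext is gone and must be rebuilt from scratch.
    sc.stop()
  }
}
```

Note that Spark Streaming offers driver recovery via `StreamingContext.getOrCreate` with a checkpoint directory, and cluster managers can restart a failed driver (e.g. `--supervise` in standalone mode), but for core batch jobs there is no built-in driver-state migration.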
