Hi curator-devs,

We run Spark in standalone mode, where Spark uses Curator to manage its ZooKeeper connections and to elect the leader (the active master). Our ZooKeeper is not very stable, and we sometimes see the connection get "suspended and reconnected". The problem is that each of these disconnect/reconnect cycles triggers a leader election, so elections happen quite often, and Spark's reaction to a leadership change can be very costly.
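To make the idea concrete, here is a rough sketch of the behavior I have in mind. Keep leadership while the connection is only suspended. Give it up on a lost session, or if the suspension lasts longer than a grace period. This is plain Java and does not use Curator's API. All of the names (GracefulLeadership, ConnState, onElected, onStateChange, isLeader, graceMillis) are hypothetical, and the state names simply mirror Curator's ConnectionState values. The key assumption is that the grace period is shorter than the ZooKeeper session timeout. Otherwise the server could expire the session and elect another master while this one still believes it is the leader.

```java
/**
 * Hypothetical sketch: decides whether a master keeps leadership across a
 * brief ZooKeeper disconnection. ConnState is defined here for illustration;
 * its values only mirror the names of Curator's ConnectionState.
 */
public class GracefulLeadership {

    enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    private final long graceMillis; // assumed to be shorter than the ZK session timeout
    private boolean leader;
    private long suspendedAt = -1;  // -1 means "not currently suspended"

    GracefulLeadership(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    /** Called when the election grants leadership. */
    void onElected() {
        leader = true;
        suspendedAt = -1;
    }

    /** Feed each connection state change together with the current time. */
    void onStateChange(ConnState state, long nowMillis) {
        switch (state) {
            case SUSPENDED:
                // Don't give up leadership yet; just remember when the connection dropped.
                if (suspendedAt < 0) {
                    suspendedAt = nowMillis;
                }
                break;
            case RECONNECTED:
                // The same session came back: keep leadership only if we are still inside the grace window.
                if (suspendedAt >= 0 && nowMillis - suspendedAt > graceMillis) {
                    leader = false;
                }
                suspendedAt = -1;
                break;
            case LOST:
                // The session is gone (or presumed gone), so leadership must be released.
                leader = false;
                suspendedAt = -1;
                break;
            default:
                break;
        }
    }

    /** Leadership check that also drops leadership once the grace window has run out. */
    boolean isLeader(long nowMillis) {
        if (leader && suspendedAt >= 0 && nowMillis - suspendedAt > graceMillis) {
            leader = false;
        }
        return leader;
    }

    public static void main(String[] args) {
        GracefulLeadership g = new GracefulLeadership(5_000);
        g.onElected();
        // Short blip: suspended at t=1s, reconnected at t=3s, so leadership is kept.
        g.onStateChange(ConnState.SUSPENDED, 1_000);
        g.onStateChange(ConnState.RECONNECTED, 3_000);
        System.out.println("after short blip: " + g.isLeader(3_000));
        // Long outage: suspended at t=10s and still suspended at t=16s, so leadership is dropped.
        g.onStateChange(ConnState.SUSPENDED, 10_000);
        System.out.println("after long outage: " + g.isLeader(16_000));
    }
}
```

Running main prints "after short blip: true" followed by "after long outage: false". In a real master, losing leadership in this model would also mean re-entering the election after reconnecting, which the sketch leaves out.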
So I'm wondering whether it is possible to tolerate this kind of failure when we can reconnect quickly and the session is actually preserved across the reconnection. Does such a requirement make sense to you? Any advice would be appreciated.

Thanks,
Dong Lei
