I wonder if we can add error handling policies to Curator. Currently, the 
policy of all recipes is hard-coded to treat SUSPENDED as a type of lost 
session. We could change this to be injected like the retry policy. To solve 
this particular issue we’d also need to introduce a SESSION_LOST state of some 
type. This is complicated as Curator re-creates connections internally. 

Thoughts?

-Jordan



On August 20, 2015 at 2:10:52 AM, Dong Lei ([email protected]) wrote:

Hi curator-devs:  

We use Spark in standalone mode, in which Spark leverages Curator to manage ZK 
connections and elect a leader. Our ZooKeeper is not always stable, and we 
sometimes get "session suspended and reconnected". The problem is that this kind 
of disconnect and reconnect triggers leader election quite often, and Spark's 
reaction to a leadership switch can be very costly.  

So I'm wondering whether it's possible to tolerate such failures when we 
reconnect quickly and the session is actually kept after the reconnection.  
Does such a requirement make sense to you?  

Any advice will be appreciated.  


Thanks  
Dong Lei  
