That sounds like great follow-on work (clients register ephemerally so the master can tell clients to disconnect, etc.), but I think just having a client that can get a better read on the state of the system is a phenomenal starting point.
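A minimal Python sketch of the idea raised in this thread. It retries an operation on connection failures, but first consults some published cluster state. If that state says the cluster was deliberately shut down, it fails fast with a descriptive message. Otherwise it gives up after a bounded deadline rather than retrying forever. `call_with_state_check`, `read_cluster_state`, and the `"SHUTDOWN"` state value are hypothetical. They are not part of Accumulo's client API, and where the state would actually come from (for example, a goal state kept in ZooKeeper) is left as an assumption.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ClusterShutDownError(RuntimeError):
    """Raised when the (hypothetical) published cluster state says it is shut down."""


class RetryDeadlineExceeded(TimeoutError):
    """Raised when retries run past the overall deadline."""


def call_with_state_check(
    operation: Callable[[], T],
    read_cluster_state: Callable[[], Optional[str]],  # hypothetical state source
    deadline_s: float = 30.0,
    backoff_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Retry `operation` on ConnectionError, fail fast if the cluster is known
    to be shut down, and give up once `deadline_s` would be exceeded."""
    start = clock()
    while True:
        try:
            return operation()
        except ConnectionError as err:
            # Consult the published state before deciding whether to retry.
            state = read_cluster_state()
            if state == "SHUTDOWN":
                raise ClusterShutDownError(
                    "cluster was deliberately shut down; not retrying"
                ) from err
            # Unknown (None) or any other state: treat it as a transient outage,
            # but never retry past the overall deadline.
            if clock() - start + backoff_s > deadline_s:
                raise RetryDeadlineExceeded(
                    f"gave up after {deadline_s}s; last error: {err}"
                ) from err
            sleep(backoff_s)
```

Treating an unknown state as transient keeps the existing fault tolerance for ordinary server outages and migrations; only an explicit shutdown signal short-circuits the retries, and the deadline bounds the wait in every other case.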
On Tue, Jan 26, 2016 at 11:52 AM Keith Turner <[email protected]> wrote:

> On Mon, Jan 25, 2016 at 10:59 AM, John Vines <[email protected]> wrote:
>
> > Of course, it's when I hit send that I realize that we could mitigate by
> > making the client aware of the master state, and if the system is shut down
>
> That's a good idea. Should consider the use case when someone wants to shut
> Accumulo down and bring it back up immediately. We could allow an admin to
> decide what they want clients to do when they shut down Accumulo (clients
> die, wait, anything else?). This could be accomplished with supplemental
> information in ZK or other goal states.
>
> > (which was the case for that ticket), then it can fail quickly with a
> > descriptive message.
> >
> > On Mon, Jan 25, 2016 at 10:58 AM John Vines <[email protected]> wrote:
> >
> > > While we want to be fault tolerant, there's a point where we want to
> > > eventually fail. I know we have a couple of never-ending retry loops that
> > > need to be addressed (https://issues.apache.org/jira/browse/ACCUMULO-1268),
> > > but I'm unsure if queries suffer from this problem.
> > >
> > > Unfortunately, fault tolerance is a bit at odds with instant notification
> > > of system issues, since some of the fault tolerance is temporally oriented.
> > > And that ticket lacks context of it never failing out vs. failing out
> > > eventually (but too long for the user).
> > >
> > > On Sun, Jan 24, 2016 at 7:46 PM Christopher <[email protected]> wrote:
> > >
> > >> I saw this bug report:
> > >> https://bugzilla.redhat.com/show_bug.cgi?id=1300987
> > >>
> > >> As far as I can tell, they are reporting normal, expected, and desired
> > >> behavior of Accumulo as a bug. But, is there something we can do upstream
> > >> to enable fast failures in the case of Accumulo not running to support
> > >> their use case?
> > >>
> > >> Personally, I don't see how we can reliably detect within the client that
> > >> the cluster is down or up, vs. a normal temporary server outage/migration,
> > >> since there is no single point of authority for Accumulo to
> > >> determine its overall operating status if ZooKeeper is running and no
> > >> other servers are. Am I wrong?
