Ok, thanks for the explanation. If we can get a count of how many controllers and shards are expected, then the console can at a minimum give a list of online nodes and alert/indicate if the counts don't match, but it wouldn't be able to tell you which ones were offline.

Chris

On Thu, Jul 17, 2014 at 9:03 AM, Aaron McCurry <[email protected]> wrote:

> Going forward Blur is going to have to support running natively on Yarn.
> The only way this will work is by the server processes binding to random
> ports. This will present the same problem with the console that Tim is
> encountering now, anytime a cluster is restarted the ports and therefore
> the register processes will change. To build on Tim's suggestion of the
> console maintaining a list of recent shard servers (as well as controllers)
> we could also provide a count of the expected number of shard servers and
> maybe controllers as well. Once Blur is running in Yarn we will have an
> application master that contains that information. In the meantime we
> could come up with a configurable solution that could be accessed via
> thrift (or zookeeper). That way even if the console had never seen a shard
> running on a server it would know that more shard servers are expected to
> be running.
>
> Thoughts?
>
> Aaron
>
>
> On Wed, Jul 16, 2014 at 3:28 PM, Tim Williams <[email protected]> wrote:
>
> > On Wed, Jul 16, 2014 at 2:09 PM, Chris Rohr <[email protected]> wrote:
> > > The console has a notion of online and offline to show a status to the
> > > admins so they can be alerted if something goes offline and can take an
> > > action.
> >
> > Typically that'd be a role of nagios or somesuch - I wonder if you
> > could maintain, internally, a list of 'recent' shard servers yourself
> > to provide some clue that there might be a problem? In other words
> > you keep a cache of all the one's that you've seen with a TTL and let
> > them fall out after some period of time?
> >
> > --tim
> >
> >
> > > On Wed, Jul 16, 2014 at 12:52 PM, Tim Williams <[email protected]> wrote:
> > >
> > >> On Wed, Jul 16, 2014 at 12:31 PM, Chris Rohr <[email protected]> wrote:
> > >> > Would this be from the Thrift calls only? (i.e. not the
> > >> > ZookeeperClusterStatus object?) The console uses the
> > >> > ZookeeperClusterStatus object to get online/offline shards and
> > >> > controllers from ZK.
> > >>
> > >> No, it'd be removed everywhere (thrift and zk path). It'd basically
> > >> get rid of the notion of 'offline' shards - your usage is essentially
> > >> the same as the TopCommand I described. The trouble is that in a world
> > >> of random ports a lot of bookkeeping overhead would be necessary to
> > >> reliably maintain the notion of 'offline' or 'registered vs online'
> > >> shards. As I understand it, the need for them was back when the
> > >> layout manager relied on that knowledge but the default layout manager
> > >> is more dynamic now. Do you just display them or is there another
> > >> need in the console for them?
> > >>
> > >> Thanks,
> > >> --tim
> > >>
> > >
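
For reference, the two ideas in this thread (Tim's TTL cache of recently seen servers, and Aaron's expected count) could be combined on the console side roughly as below. This is only a sketch under assumptions: the class name RecentNodeTracker and all of its methods are hypothetical and not part of Blur, node names are assumed to be plain "host:port" strings, and how the online list and the expected count are actually fetched (thrift or zookeeper) is left out.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical console-side helper; illustrative only, not a Blur API. */
class RecentNodeTracker {
  private final long ttlMillis;      // how long a vanished node is remembered
  private final int expectedCount;   // assumed to come from configuration
  private final Map<String, Long> lastSeen = new HashMap<String, Long>();
  private Set<String> online = new HashSet<String>();

  RecentNodeTracker(long ttlMillis, int expectedCount) {
    this.ttlMillis = ttlMillis;
    this.expectedCount = expectedCount;
  }

  /** Record the nodes currently reported online and expire stale entries. */
  void observe(Collection<String> onlineNodes, long nowMillis) {
    online = new HashSet<String>(onlineNodes);
    for (String node : online) {
      lastSeen.put(node, nowMillis);
    }
    // Let nodes fall out of the cache once they have been gone longer than the TTL.
    Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
    while (it.hasNext()) {
      if (nowMillis - it.next().getValue() > ttlMillis) {
        it.remove();
      }
    }
  }

  /** Nodes seen within the TTL that are absent from the latest listing, sorted. */
  List<String> recentlyMissing() {
    List<String> missing = new ArrayList<String>();
    for (String node : lastSeen.keySet()) {
      if (!online.contains(node)) {
        missing.add(node);
      }
    }
    Collections.sort(missing);
    return missing;
  }

  /** True when the number of online nodes differs from the expected count. */
  boolean countMismatch() {
    return online.size() != expectedCount;
  }
}
```

With something like this the console could always raise an alert on a count mismatch, and could additionally name the servers that dropped out, but only for as long as they remain inside the TTL window. After a full restart on new random ports the old entries would simply age out.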
