That's true, not sure what to do about that. Any ideas from anyone?

Aaron

On Thu, Jul 17, 2014 at 9:42 AM, Chris Rohr <[email protected]> wrote:

> Ok, thanks for the explanation. If we can get a count of how many
> controllers and shards are expected, then the console at a minimum can
> give a list of online nodes and alert/indicate if the counts don't match,
> but wouldn't be able to tell you which ones were offline.
>
> Chris
>
>
> On Thu, Jul 17, 2014 at 9:03 AM, Aaron McCurry <[email protected]> wrote:
>
> > Going forward Blur is going to have to support running natively on Yarn.
> > The only way this will work is by the server processes binding to random
> > ports. This will present the same problem with the console that Tim is
> > encountering now: anytime a cluster is restarted the ports, and therefore
> > the registered processes, will change. To build on Tim's suggestion of
> > the console maintaining a list of recent shard servers (as well as
> > controllers), we could also provide a count of the expected number of
> > shard servers and maybe controllers as well. Once Blur is running in Yarn
> > we will have an application master that contains that information. In the
> > meantime we could come up with a configurable solution that could be
> > accessed via thrift (or zookeeper). That way even if the console had
> > never seen a shard running on a server it would know that more shard
> > servers are expected to be running.
> >
> > Thoughts?
> >
> > Aaron
> >
> >
> > On Wed, Jul 16, 2014 at 3:28 PM, Tim Williams <[email protected]> wrote:
> >
> > > On Wed, Jul 16, 2014 at 2:09 PM, Chris Rohr <[email protected]> wrote:
> > > > The console has a notion of online and offline to show a status to
> > > > the admins so they can be alerted if something goes offline and can
> > > > take an action.
> > >
> > > Typically that'd be a role of nagios or somesuch - I wonder if you
> > > could maintain, internally, a list of 'recent' shard servers yourself
> > > to provide some clue that there might be a problem? In other words
> > > you keep a cache of all the ones that you've seen with a TTL and let
> > > them fall out after some period of time?
> > >
> > > --tim
> > >
> > > >
> > > > On Wed, Jul 16, 2014 at 12:52 PM, Tim Williams <[email protected]> wrote:
> > > >
> > > >> On Wed, Jul 16, 2014 at 12:31 PM, Chris Rohr <[email protected]> wrote:
> > > >> > Would this be from the Thrift calls only? (i.e. not the
> > > >> > ZookeeperClusterStatus object?) The console uses the
> > > >> > ZookeeperClusterStatus object to get online/offline shards and
> > > >> > controllers from ZK.
> > > >>
> > > >> No, it'd be removed everywhere (thrift and zk path). It'd basically
> > > >> get rid of the notion of 'offline' shards - your usage is essentially
> > > >> the same as the TopCommand I described. The trouble is that in a
> > > >> world of random ports a lot of bookkeeping overhead would be
> > > >> necessary to reliably maintain the notion of 'offline' or
> > > >> 'registered vs online' shards. As I understand it, the need for them
> > > >> was back when the layout manager relied on that knowledge but the
> > > >> default layout manager is more dynamic now. Do you just display them
> > > >> or is there another need in the console for them?
> > > >>
> > > >> Thanks,
> > > >> --tim
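
For anyone following the thread, here is a rough sketch of the two ideas combined: Tim's TTL cache of recently seen nodes, plus the expected-count check Chris and Aaron describe. It is only an illustration under assumptions. The class name, the "host:port" node ids, and the expected_count setting are all hypothetical and not part of Blur, and how the console would fetch the online list (thrift or zookeeper) is deliberately left out.

```python
import time


class RecentNodeTracker:
    """Hypothetical helper: remembers nodes seen recently so that a node
    which disappears can still be named until its entry expires."""

    def __init__(self, ttl_seconds, expected_count=None, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        # Assumed to come from configuration (or, later, an application master).
        self.expected_count = expected_count
        self.clock = clock
        self.last_seen = {}  # node id, e.g. "host:port" -> time last seen online

    def update(self, online_nodes):
        """Record the current online list and return a status report."""
        now = self.clock()
        for node in online_nodes:
            self.last_seen[node] = now

        # Let entries fall out once they have not been seen for ttl_seconds.
        self.last_seen = {
            node: seen
            for node, seen in self.last_seen.items()
            if now - seen <= self.ttl_seconds
        }

        online = set(online_nodes)
        return {
            "online": sorted(online),
            # Seen within the TTL but not online now: a hint, not proof, of trouble.
            "recently_missing": sorted(set(self.last_seen) - online),
            # Only meaningful when an expected count has been configured.
            "count_mismatch": (
                self.expected_count is not None
                and len(online) != self.expected_count
            ),
        }
```

The count check covers the case Aaron raises, where the console has never seen a node and so cannot name it, while the TTL cache can name nodes that were seen and then vanished. After a restart with random ports the old ids would show up as "recently_missing" until the TTL expires, so the TTL probably wants to be short.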
