I've made sure all of our flume machines are on the same version, but that doesn't seem to help.
Whenever I start a new node, it takes down the master. This happens every single time. I have to restart everything in a very specific order to make sure it starts working again. It's probably something to do with my use of the rpcSource, which causes issues in other areas too.

--
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matt...@foursquare.com (mailto:matt...@foursquare.com) | @rathboma (http://twitter.com/rathboma) | 4sq (http://foursquare.com/rathboma)

On Sunday, August 28, 2011 at 9:36 AM, Bao Thai Ngo wrote:
> Mike,
>
> I had the same problem with flume master. Try to remove flume and its init
> script at Master machine, then re-install flume master again. Just remember
> to save your configuration first.
>
> Good luck.
>
> ~Thai
>
> On Fri, Aug 26, 2011 at 10:55 PM, Mike <mikethe...@gmail.com
> (mailto:mikethe...@gmail.com)> wrote:
> > I'd also ensure that all nodes/masters/collectors/etc are using the
> > precise same build of flume.
> >
> > On Fri, Aug 26, 2011 at 11:53 AM, Matthew Rathbone
> > <matt...@foursquare.com (mailto:matt...@foursquare.com)> wrote:
> > > Ah, I'm seeing this on single-master mode :-/. Anywhere else you think I
> > > could look for useful debugging output?
> > > --
> > > Matthew Rathbone
> > > Foursquare | Software Engineer | Server Engineering Team
> > > matt...@foursquare.com (mailto:matt...@foursquare.com) | @rathboma | 4sq
> > >
> > > On Friday, August 26, 2011 at 10:34 AM, Mike wrote:
> > >
> > > I did - but that was when we were testing multi-master mode, and since
> > > it's not fully matured yet, I've gone back to a single master.
> > >
> > > On Fri, Aug 26, 2011 at 11:32 AM, Matthew Rathbone
> > > <matt...@foursquare.com (mailto:matt...@foursquare.com)> wrote:
> > >
> > > You're right, there's another pid file there, that's crazy.
> > > Have you experienced the unresponsiveness thing too?
> > > --
> > > Matthew Rathbone
> > > Foursquare | Software Engineer | Server Engineering Team
> > > matt...@foursquare.com (mailto:matt...@foursquare.com) | @rathboma | 4sq
> > >
> > > On Friday, August 26, 2011 at 10:17 AM, Mike wrote:
> > >
> > > I recall a similar problem I had with this.
> > >
> > > It ended up being another pid-style file dropped somewhere else.
> > >
> > > /var/run/flume/flume-flume-master.pid
> > > /tmp/flumemaster.pid
> > >
> > > See if those are still around once all the flume procs are dead.
> > >
> > > -M
> > >
> > > On Fri, Aug 26, 2011 at 11:03 AM, Matthew Rathbone
> > > <matt...@foursquare.com (mailto:matt...@foursquare.com)> wrote:
> > >
> > > Hey all,
> > > We're having totally unpredictable issues with the flume master
> > > installation
> > > lately, here's what happened to us last night / today:
> > > YESTERDAY
> > > Yesterday we added 8 new nodes to flume. They got set-up fine, and the
> > > configs were registered.
> > > a few hours later the master totally stops responding to anything
> > > (web/shell/nodes), I don't find out until this morning.
> > > TODAY
> > > I try to stop it using the init script, that doesn't do anything, and it
> > > continues to run, but be unresponsive
> > > I kill -9 the flume processes, and remove the pid file, figuring I can
> > > just
> > > start it again
> > > now the master won't start "master already running on
> > > pid=<non-existent-pid>"
> > > when I finally get it to start (changing the pid directory), it starts
> > > being
> > > unresponsive again
> > > restart it, it does the same
> > > stop all flume-nodes, restart it, looks good, start the flume nodes, it
> > > goes
> > > unresponsive again
> > > restart it, and this time it works
> > >
> > > The only log above an INFO statement that I can see is this:
> > > 2011-08-26 14:38:34,527 WARN com.cloudera.flume.agent.FlumeNode: Unable to
> > > load output format plugin class - Class not found
> > > but I don't think that's causing the issues.
> > >
> > > I do have a flume-node running on the same machine, could there be some
> > > sort
> > > of race condition happening?
> > > Has anyone else seen behavior like this?
> > > Any idea how to fix it?
> > > Hoping someone can shed some light on this, I'm really not sure what's
> > > going
> > > on.
> > > Thanks all
> > > --
> > > Matthew Rathbone
> > > Foursquare | Software Engineer | Server Engineering Team
> > > matt...@foursquare.com (mailto:matt...@foursquare.com) | @rathboma | 4sq
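
For anyone else who hits the "master already running on pid=<non-existent-pid>" message: below is a rough sketch of the stale pid file check Mike suggests in the thread. The two paths are the ones he lists; whether a given flume install writes both of them, or consults both on startup, is an assumption and not something confirmed here. The helper names (`pid_is_alive`, `find_stale_pid_files`) are made up for illustration and are not part of flume. It is POSIX-only and only reports files; it does not delete anything.

```python
import os
from pathlib import Path

# Paths quoted in the thread; adjust for your own install (assumption).
PID_FILES = [
    "/var/run/flume/flume-flume-master.pid",
    "/tmp/flumemaster.pid",
]


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this pid currently exists (POSIX only)."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def find_stale_pid_files(paths):
    """Return the paths that exist but do not name a running process."""
    stale = []
    for path in map(Path, paths):
        try:
            text = path.read_text().strip()
        except OSError:
            continue  # missing or unreadable file: nothing to report
        try:
            pid = int(text)
        except ValueError:
            stale.append(str(path))  # contents are not a pid: treat as stale
            continue
        if not pid_is_alive(pid):
            stale.append(str(path))
    return stale


if __name__ == "__main__":
    for stale_path in find_stale_pid_files(PID_FILES):
        print(f"stale pid file: {stale_path}")
```

Run it after all the flume processes are dead. Any path it prints is a leftover pid file that is worth removing by hand before starting the master again.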