I just grepped a few days of logs and I don't see this error. It seems to be correlated with higher load on the HDFS servers (for example when map/reduce jobs are running). When it happens, the agents fail to connect to the collectors, but I don't see any errors in the collectors' logs. They just hang, while other virtual collectors on the same server continue to work.
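In case it helps, this is roughly the shape of the three-way split, written as a 0.9.x dataflow spec. It's a sketch from memory rather than a paste of the live config: the first collector line is the one quoted in my original mail below; the node names (collector1/2/3, agent1), the extra ports 54002/54003, the hostname flume-c1, the -c2/-c3 file suffixes and the tailed log path are all placeholders made up for this mail.

  collector1 : collectorSource(54001) | collector(600000) {
      escapedFormatDfs("hdfs://hadoop1-m1:8020/raw-events/%Y-%m-%d/",
                       "events-%{rolltag}-f01-c1.snappy", seqfile("SnappyCodec")) };
  collector2 : collectorSource(54002) | collector(600000) {
      escapedFormatDfs("hdfs://hadoop1-m1:8020/raw-events/%Y-%m-%d/",
                       "events-%{rolltag}-f01-c2.snappy", seqfile("SnappyCodec")) };
  collector3 : collectorSource(54003) | collector(600000) {
      escapedFormatDfs("hdfs://hadoop1-m1:8020/raw-events/%Y-%m-%d/",
                       "events-%{rolltag}-f01-c3.snappy", seqfile("SnappyCodec")) };

  agent1 : tail("/var/log/app/events.log")
         | < agentDFOSink("flume-c1", 54001) ? agentDFOSink("flume-c1", 54002) > ;

All three logical collectors are mapped onto the same physical machine, so the failover chain on the agents only buys anything when a single virtual collector hangs (which is exactly the failure mode I'm seeing); it doesn't help if the whole box goes down.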
-eran

On Thu, Oct 27, 2011 at 06:39, Eric Sammer <esam...@cloudera.com> wrote:
> It's almost certainly the issue Mingjie mentioned. There's a race
> condition in the rolling that's plagued a few people. I'm heads down
> on NG, but I think someone (probably Mingjie :)) was working on this.
>
> On Oct 26, 2011, at 1:59 PM, Mingjie Lai <mjla...@gmail.com> wrote:
> >
> > Quite a few people have mentioned on the list recently that the
> > combination of RollSink + escapedCustomDfs causes issues. You may
> > have seen logs like these:
> >
> > 2011-10-17 17:30:07,190 [logicalNode collector0_log_dir-19] INFO
> > com.cloudera.flume.core.connector.DirectDriver - Connector logicalNode
> > collector0_log_dir-19 exited with error: Blocked append interrupted by
> > rotation event
> > java.lang.InterruptedException: Blocked append interrupted by rotation event
> >     at com.cloudera.flume.handlers.rolling.RollSink.append(RollSink.java:209)
> >
> >> 1500-2000 events per second
> >
> > It's not really a huge amount of data. Flume is expected to be able to
> > handle it.
> >
> > Not sure anyone is looking at it. Sorry.
> >
> > Mingjie
> >
> > On 10/23/2011 09:07 AM, Eran Kutner wrote:
> >> Hi,
> >> I'm having a problem where flume collectors occasionally stop working
> >> under heavy load.
> >> I'm writing something like 1500-2000 events per second to my collectors,
> >> and occasionally they will just stop working. Nothing is written to the
> >> log; the only indication that this is happening is that I see 0 messages
> >> being delivered when looking at the flume stats web page, and events
> >> start piling up in the agents. Restarting the service solves the
> >> problem for a while (anything from a few minutes to a few days).
> >> An interesting thing to note is that this seems to be load related. It
> >> used to happen a lot more, but then I split the collector into three
> >> virtual nodes and balanced the traffic on them, and now it happens a lot
> >> less. Also, while one virtual collector stops working, the others, on the
> >> same machine, continue to work fine.
> >>
> >> My collector configuration looks like this:
> >> collectorSource(54001) | collector(600000) {
> >>   escapedFormatDfs("hdfs://hadoop1-m1:8020/raw-events/%Y-%m-%d/",
> >>     "events-%{rolltag}-f01-c1.snappy", seqfile("SnappyCodec")) };
> >>
> >> I'm using a 0.9.5 build I made a few weeks ago.
> >>
> >> Any ideas what can be causing it?
> >>
> >> -eran