I just grepped a few days of logs and I don't see this error. It seems to be correlated with higher load on the HDFS servers (for example when map/reduce jobs are running). When it happens, the agents fail to connect to the collectors, but I don't see any errors in the collectors' logs. They just hang, while other virtual collectors on the same server continue to work.
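In case it helps, this is roughly the shape of the three-way split, written as a 0.9.x dataflow spec. It's a sketch from memory rather than a paste of the live config: the first collector line is the one quoted in my original mail below; the node names (collector1/2/3, agent1), the extra ports 54002/54003, the hostname flume-c1, the -c2/-c3 file suffixes and the tailed log path are all placeholders made up for this mail.

  collector1 : collectorSource(54001) | collector(600000) {
      escapedFormatDfs("hdfs://hadoop1-m1:8020/raw-events/%Y-%m-%d/",
                       "events-%{rolltag}-f01-c1.snappy", seqfile("SnappyCodec")) };
  collector2 : collectorSource(54002) | collector(600000) {
      escapedFormatDfs("hdfs://hadoop1-m1:8020/raw-events/%Y-%m-%d/",
                       "events-%{rolltag}-f01-c2.snappy", seqfile("SnappyCodec")) };
  collector3 : collectorSource(54003) | collector(600000) {
      escapedFormatDfs("hdfs://hadoop1-m1:8020/raw-events/%Y-%m-%d/",
                       "events-%{rolltag}-f01-c3.snappy", seqfile("SnappyCodec")) };

  agent1 : tail("/var/log/app/events.log")
         | < agentDFOSink("flume-c1", 54001) ? agentDFOSink("flume-c1", 54002) > ;

All three logical collectors are mapped onto the same physical machine, so the failover chain on the agents only buys anything when a single virtual collector hangs (which is exactly the failure mode I'm seeing); it doesn't help if the whole box goes down.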
-eran

On Thu, Oct 27, 2011 at 06:39, Eric Sammer <esam...@cloudera.com> wrote:
> It's almost certainly the issue Mingjie mentioned. There's a race
> condition in the rolling that's plagued a few people. I'm heads down
> on NG, but I think someone (probably Mingjie :)) was working on this.
>
> On Oct 26, 2011, at 1:59 PM, Mingjie Lai <mjla...@gmail.com> wrote:
> >
> > Quite a few people have mentioned on the list recently that the
> > combination of RollSink + escapedCustomDfs causes issues. You may
> > have seen logs like these:
> >
> > 2011-10-17 17:30:07,190 [logicalNode collector0_log_dir-19] INFO
> > com.cloudera.flume.core.connector.DirectDriver - Connector logicalNode
> > collector0_log_dir-19 exited with error: Blocked append interrupted by
> > rotation event
> > java.lang.InterruptedException: Blocked append interrupted by rotation event
> >     at com.cloudera.flume.handlers.rolling.RollSink.append(RollSink.java:209)
> >
> >> 1500-2000 events per second
> >
> > It's not really a huge amount of data. Flume is expected to be able to
> > handle it.
> >
> > Not sure anyone is looking at it. Sorry.
> >
> > Mingjie
> >
> > On 10/23/2011 09:07 AM, Eran Kutner wrote:
> >> Hi,
> >> I'm having a problem where flume collectors occasionally stop working
> >> under heavy load.
> >> I'm writing something like 1500-2000 events per second to my collectors,
> >> and occasionally they will just stop working. Nothing is written to the
> >> log; the only indication that this is happening is that I see 0 messages
> >> being delivered when looking at the flume stats web page, and events
> >> start piling up in the agents. Restarting the service solves the
> >> problem for a while (anything from a few minutes to a few days).
> >> An interesting thing to note is that this seems to be load related. It
> >> used to happen a lot more, but then I split the collector into three
> >> virtual nodes and balanced the traffic on them, and now it happens a lot
> >> less. Also, while one virtual collector stops working, the others, on the
> >> same machine, continue to work fine.
> >>
> >> My collector configuration looks like this:
> >> collectorSource(54001) | collector(600000) {
> >>   escapedFormatDfs("hdfs://hadoop1-m1:8020/raw-events/%Y-%m-%d/",
> >>     "events-%{rolltag}-f01-c1.snappy", seqfile("SnappyCodec")) };
> >>
> >> I'm using a 0.9.5 build I made a few weeks ago.
> >>
> >> Any ideas what can be causing it?
> >>
> >> -eran