Bump. Does anyone know how to get diskFailover() to work with a collector dfs sink, or am I just going about this the wrong way? To make the question more concrete, here's my current reading of the rolltag exception from the quoted mail below.
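The stack trace points at EventBaseImpl.set(), which suggests Flume event attributes are set-once. A minimal, self-contained Java sketch of that behavior as I understand it (EventSketch is my own stand-in, NOT Flume's actual class; I'm only guessing at its internals from the exception message):

    import java.util.HashMap;
    import java.util.Map;

    // Stand-in illustrating the set-once attribute rule implied by the
    // "Event already had an event with attribute rolltag" message below.
    public class EventSketch {
        private final Map<String, byte[]> attrs = new HashMap<String, byte[]>();

        public void set(String attr, byte[] value) {
            if (attrs.containsKey(attr)) {
                // Same message as the one RollSink triggers in my logs.
                throw new IllegalArgumentException(
                        "Event already had an event with attribute " + attr);
            }
            attrs.put(attr, value);
        }

        public static void main(String[] args) {
            EventSketch e = new EventSketch();
            // diskFailover()'s internal RollSink tags the event first...
            e.set("rolltag", "tag-1".getBytes());
            // ...then the collector's RollSink tries to tag it again: boom.
            e.set("rolltag", "tag-2".getBytes());
        }
    }

If that reading is right, it would also explain why mask("rolltag") silences the error (the second roller no longer sees the first tag) and why I lose per-roll file naming in the process.

Thanks,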
-Steve

On Thu, Oct 20, 2011 at 3:19 AM, Stephen Layland <stephen.layl...@gmail.com> wrote:
> Hi,
>
> I'm trying to set up a simple disk failover scenario, but we're currently
> not using an agent tier. I did something like:
>
>   exec config machine 'syslogTcp(5140)' 'collector() { diskFailover() escapedCustomDfs("hdfs://...",..) }'
>
> but when writing syslog packets to the flume node, the thread dies with an
> IllegalArgumentException:
>
> java.lang.IllegalArgumentException: Event already had an event with attribute rolltag
>     at com.cloudera.flume.core.EventBaseImpl.set(EventBaseImpl.java:62)
>     at com.cloudera.flume.handlers.rolling.RollSink.synchronousAppend(RollSink.java:231)
>     at com.cloudera.flume.handlers.rolling.RollSink$1.call(RollSink.java:183)
>     at com.cloudera.flume.handlers.rolling.RollSink$1.call(RollSink.java:181)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
> The goog pointed me to several useful threads, including this JIRA:
>
>   https://issues.apache.org/jira/browse/FLUME-735
>
> I tried the mask("rolltag") workaround mentioned in the JIRA and changed my
> config to:
>
>   exec config machine 'syslogTcp(5140)' 'collector() { mask("rolltag") diskFailover() escapedCustomDfs("hdfs://...",..) }'
>
> This took care of the above error (though I'm now always writing to the
> same HDFS file, without a rolltag), but I still can't get the disk failover
> to work as I'd expect.
>
> I'm following the same procedure a previous poster mentioned: running
> hadoop fs -chown -R non-flume /path/to/target to simulate an HDFS failure.
> I'd expect the sequence files in /tmp/flume-flume/agent/dfo* to fill up
> with data, but they're all empty. In addition, the flume node goes into a
> death spiral once it can't write to HDFS. Instead of the
> DiskFailoverDecorator taking over, it just loops continually, spitting the
> following into the log file until I kill the node process:
>
> 2011-10-20 02:58:26,434 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 2 with no progress being made on disk failover subsink
> 2011-10-20 02:58:26,935 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> :
> :
> 2011-10-20 03:02:09,642 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 03:02:09,642 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 448 with no progress being made on disk failover subsink
> 2011-10-20 03:02:09,642 WARN com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: WAL drain thread was not making progress, forcing close
>
> A more complete log excerpt is below [1]. Does anyone know how I can get
> diskFailover() to work with this dfs sink?
>
> Many thanks,
>
> -Steve
>
> [1]
> 2011-10-20 02:58:25,242 INFO com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: opening new file for 20111020-025815237-0700.2276946395587008.00000142
> 2011-10-20 02:58:25,242 INFO com.cloudera.flume.handlers.hdfs.SeqfileEventSink: constructed new seqfile event sink: file=/var/tmp/flume-flume/agent/pdp20.lindenlab.com/dfo_writing/20111020-025825242-0700.2276956400117096.00000142
> 2011-10-20 02:58:25,243 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverSource: end of file com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager$StateChangeDeco@2af4fb44
> 2011-10-20 02:58:25,434 INFO com.cloudera.flume.handlers.rolling.RollSink: closing RollSink 'NaiveFileFailover'
> 2011-10-20 02:58:25,434 INFO com.cloudera.flume.handlers.hdfs.SeqfileEventSink: closed /var/tmp/flume-flume/agent/pdp20.lindenlab.com/dfo_writing/20111020-025825242-0700.2276956400117096.00000142
> 2011-10-20 02:58:25,434 INFO com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: File lives in /var/tmp/flume-flume/agent/pdp20.lindenlab.com/dfo_writing/20111020-025825242-0700.2276956400117096.00000142
> 2011-10-20 02:58:25,434 INFO com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: opening new file for 20111020-025825242-0700.2276956400117096.00000142
> 2011-10-20 02:58:25,435 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverSource: end of file com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager$StateChangeDeco@2eec0962
> 2011-10-20 02:58:25,635 WARN com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: Double close (which is ok)
> 2011-10-20 02:58:25,635 INFO com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink: Closing hdfs://master-hadoop-dfw.lindenlab.com:54310/user/cru/bench/2011-10-20/0200/foo-
> 2011-10-20 02:58:25,635 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink: Closing HDFS file: hdfs://master-hadoop-dfw.lindenlab.com:54310/user/cru/bench/2011-10-20/0200/foo-.snappy.tmp
> 2011-10-20 02:58:25,635 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink: done writing raw file to hdfs
> 2011-10-20 02:58:25,882 ERROR com.cloudera.flume.core.connector.DirectDriver: Closing down due to exception during close calls
> 2011-10-20 02:58:25,883 INFO com.cloudera.flume.core.connector.DirectDriver: Connector FileFailover-143 exited with error: org.apache.hadoop.security.AccessControlException: Permission denied: user=flume, access=WRITE, inode="/user/cru/bench/2011-10-20/0200":cru:supergroup:drwxr-xr-x
> org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=flume, access=WRITE, inode="/user/cru/bench/2011-10-20/0200":cru:supergroup:drwxr-xr-x
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
>     at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
>     at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:626)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:237)
>     at com.cloudera.util.PathManager.close(PathManager.java:158)
>     at com.cloudera.flume.handlers.hdfs.CustomDfsSink.close(CustomDfsSink.java:91)
>     at com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink.close(EscapedCustomDfsSink.java:132)
>     at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>     at com.cloudera.flume.handlers.debug.InsistentOpenDecorator.close(InsistentOpenDecorator.java:175)
>     at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>     at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>     at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>     at com.cloudera.flume.handlers.debug.LazyOpenDecorator.close(LazyOpenDecorator.java:81)
>     at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:126)
> Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: Permission denied: user=flume, access=WRITE, inode="/user/cru/bench/2011-10-20/0200":cru:supergroup:drwxr-xr-x
>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:203)
>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:184)
>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:135)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5073)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkParentAccess(FSNamesystem.java:5042)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal(FSNamesystem.java:1880)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:1855)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.rename(NameNode.java:729)
>     at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:1107)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
>     at $Proxy6.rename(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy6.rename(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:624)
>     ... 11 more
>
> 2011-10-20 02:58:25,883 WARN com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: Close while in closing state, odd
> 2011-10-20 02:58:25,883 INFO com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink: Closing hdfs://master-hadoop-dfw.lindenlab.com:54310/user/cru/bench/2011-10-20/0200/foo-
> 2011-10-20 02:58:25,883 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink: Closing HDFS file: hdfs://master-hadoop-dfw.lindenlab.com:54310/user/cru/bench/2011-10-20/0200/foo-.snappy.tmp
> 2011-10-20 02:58:25,883 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink: done writing raw file to hdfs
> 2011-10-20 02:58:25,934 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:25,934 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 1 with no progress being made on disk failover subsink
> 2011-10-20 02:58:26,434 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:26,434 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 2 with no progress being made on disk failover subsink
> 2011-10-20 02:58:26,935 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:26,935 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 3 with no progress being made on disk failover subsink
> 2011-10-20 02:58:27,435 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:27,435 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 4 with no progress being made on disk failover subsink
> 2011-10-20 02:58:27,935 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:27,935 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 5 with no progress being made on disk failover subsink
> 2011-10-20 02:58:28,435 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:28,435 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 6 with no progress being made on disk failover subsink
> 2011-10-20 02:58:28,935 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:28,935 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 7 with no progress being made on disk failover subsink
> 2011-10-20 02:58:29,436 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:29,436 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 8 with no progress being made on disk failover subsink
> 2011-10-20 02:58:29,936 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:29,936 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 9 with no progress being made on disk failover subsink
> 2011-10-20 02:58:30,436 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:30,436 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 10 with no progress being made on disk failover subsink
> 2011-10-20 02:58:30,436 WARN com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: WAL drain thread was not making progress, forcing close
> :
> :
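P.S. For anyone trying to reproduce this: to check whether the dfo spool files actually contain events, I count records with a small throwaway reader along the following lines. This is only a sketch against the old SequenceFile.Reader API; DfoPeek is my own hypothetical class, and I instantiate the key/value classes reflectively because I haven't confirmed exactly which Writable types Flume writes into its dfo seqfiles (Flume's jars need to be on the classpath for those classes to load).

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    // Counts records in one local sequence file, e.g. a file under
    // /var/tmp/flume-flume/agent/<host>/dfo_writing/
    public class DfoPeek {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.getLocal(conf);
            Path path = new Path(args[0]);
            SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
            // The seqfile header records the key/value classes, so we can
            // instantiate them without knowing Flume's types up front.
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            long records = 0;
            while (reader.next(key, value)) {
                records++;
            }
            reader.close();
            System.out.println(path + ": " + records + " records");
        }
    }

Checking the files this way is how I concluded above that the dfo seqfiles stay empty during the simulated HDFS outage.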