Hi, I'm trying to set up a simple disk failover scenario but we're currently not using an agent tier. I did something like:
exec config machine 'syslogTcp(5140)' 'collector() { diskFailover() escapedCustomDfs("hdfs://...",..) }' but when writing syslog packets to the flume node the thread dies with an IllegalArgumentException: java.lang.IllegalArgumentException: Event already had an event with attribute rolltag at com.cloudera.flume.core.EventBaseImpl.set(EventBaseImpl.java:62) at com.cloudera.flume.handlers.rolling.RollSink.synchronousAppend(RollSink.java:231) at com.cloudera.flume.handlers.rolling.RollSink$1.call(RollSink.java:183) at com.cloudera.flume.handlers.rolling.RollSink$1.call(RollSink.java:181) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) The goog pointed me to several useful threads, including this jira: https://issues.apache.org/jira/browse/FLUME-735 I tried the mask("rolltag") workaround mentioned in the Jira and changed my config to: exec config machine 'syslogTcp(5140)' 'collector() { mask("rolltag") diskFailover() escapedCustomDfs("hdfs://...",..) }' This change seemed to take care of the above error (though I'm always writing to the same HDFS file now without a rolltag) but I still can't seem to get the disk failover to work as I'd expect. I'm following the same procedure that a previous poster mentioned by doing a hadoop fs -chown -R non-flume /path/to/target to simulate an HDFS failure. I'd expect the sequence files in /tmp/flume-flume/agent/dfo* to actually get filled up with data, but they're all empty. In addition, the flume node starts a death spiral after it can't write to HDFS. Instead of the DiskFilaoverDecorator taking over, it instead just continually loops and spits the following into the log file until I kill the node process: 2011-10-20 02:58:26,434 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 2 with no progress being made on disk failover subsink 2011-10-20 02:58:26,935 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING : : 2011-10-20 03:02:09,642 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING 2011-10-20 03:02:09,642 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 448 with no progress being made on disk failover subsink 2011-10-20 03:02:09,642 WARN com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: WAL drain thread was not making progress, forcing close A more complete log excerpt is below[1]. Does anyone know how I can get diskFailover() to work with this dfs sink? Many thanks, -Steve [1] 2011-10-20 02:58:25,242 INFO com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: opening new file for 20111020-025815237-0700.2276946395587008.00000142 2011-10-20 02:58:25,242 INFO com.cloudera.flume.handlers.hdfs.SeqfileEventSink: constructed new seqfile event sink: file=/var/tmp/flume-flume/agent/ pdp20.lindenlab.com/dfo_writing/20111020-025825242-0700.2276956400117096.00000142 2011-10-20 02:58:25,243 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverSource: end of file com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager$StateChangeDeco@2af4fb44 2011-10-20 02:58:25,434 INFO com.cloudera.flume.handlers.rolling.RollSink: closing RollSink 'NaiveFileFailover' 2011-10-20 02:58:25,434 INFO com.cloudera.flume.handlers.hdfs.SeqfileEventSink: closed /var/tmp/flume-flume/agent/ pdp20.lindenlab.com/dfo_writing/20111020-025825242-0700.2276956400117096.00000142 2011-10-20 02:58:25,434 INFO com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: File lives in /var/tmp/flume-flume/agent/ pdp20.lindenlab.com/dfo_writing/20111020-025825242-0700.2276956400117096.00000142 2011-10-20 02:58:25,434 INFO com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: opening new file for 20111020-025825242-0700.2276956400117096.00000142 2011-10-20 02:58:25,435 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverSource: end of file com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager$StateChangeDeco@2eec0962 2011-10-20 02:58:25,635 WARN com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: Double close (which is ok) 2011-10-20 02:58:25,635 INFO com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink: Closing hdfs:// master-hadoop-dfw.lindenlab.com:54310/user/cru/bench/2011-10-20/0200/foo- 2011-10-20 02:58:25,635 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink: Closing HDFS file: hdfs:// master-hadoop-dfw.lindenlab.com:54310/user/cru/bench/2011-10-20/0200/foo-.snappy.tmp 2011-10-20 02:58:25,635 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink: done writing raw file to hdfs 2011-10-20 02:58:25,882 ERROR com.cloudera.flume.core.connector.DirectDriver: Closing down due to exception during close calls 2011-10-20 02:58:25,883 INFO com.cloudera.flume.core.connector.DirectDriver: Connector FileFailover-143 exited with error: org.apache.hadoop.security.AccessControlException: Permission denied: user=flume, access=WRITE, inode="/user/cru/bench/2011-10-20/0200":cru:supergroup:drwxr-xr-x org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=flume, access=WRITE, inode="/user/cru/bench/2011-10-20/0200":cru:supergroup:drwxr-xr-x at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57) at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:626) at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:237) at com.cloudera.util.PathManager.close(PathManager.java:158) at com.cloudera.flume.handlers.hdfs.CustomDfsSink.close(CustomDfsSink.java:91) at com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink.close(EscapedCustomDfsSink.java:132) at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67) at com.cloudera.flume.handlers.debug.InsistentOpenDecorator.close(InsistentOpenDecorator.java:175) at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67) at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67) at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67) at com.cloudera.flume.handlers.debug.LazyOpenDecorator.close(LazyOpenDecorator.java:81) at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:126) Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: Permission denied: user=flume, access=WRITE, inode="/user/cru/bench/2011-10-20/0200":cru:supergroup:drwxr-xr-x at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:203) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:184) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:135) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5073) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkParentAccess(FSNamesystem.java:5042) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal(FSNamesystem.java:1880) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:1855) at org.apache.hadoop.hdfs.server.namenode.NameNode.rename(NameNode.java:729) at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428) at org.apache.hadoop.ipc.Client.call(Client.java:1107) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226) at $Proxy6.rename(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) at $Proxy6.rename(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:624) ... 11 more 2011-10-20 02:58:25,883 WARN com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: Close while in closing state, odd 2011-10-20 02:58:25,883 INFO com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink: Closing hdfs:// master-hadoop-dfw.lindenlab.com:54310/user/cru/bench/2011-10-20/0200/foo- 2011-10-20 02:58:25,883 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink: Closing HDFS file: hdfs:// master-hadoop-dfw.lindenlab.com:54310/user/cru/bench/2011-10-20/0200/foo-.snappy.tmp 2011-10-20 02:58:25,883 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink: done writing raw file to hdfs 2011-10-20 02:58:25,934 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING 2011-10-20 02:58:25,934 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 1 with no progress being made on disk failover subsink 2011-10-20 02:58:26,434 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING 2011-10-20 02:58:26,434 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 2 with no progress being made on disk failover subsink 2011-10-20 02:58:26,935 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING 2011-10-20 02:58:26,935 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 3 with no progress being made on disk failover subsink 2011-10-20 02:58:27,435 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING 2011-10-20 02:58:27,435 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 4 with no progress being made on disk failover subsink 2011-10-20 02:58:27,935 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING 2011-10-20 02:58:27,935 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 5 with no progress being made on disk failover subsink 2011-10-20 02:58:28,435 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING 2011-10-20 02:58:28,435 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 6 with no progress being made on disk failover subsink 2011-10-20 02:58:28,935 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING 2011-10-20 02:58:28,935 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 7 with no progress being made on disk failover subsink 2011-10-20 02:58:29,436 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING 2011-10-20 02:58:29,436 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 8 with no progress being made on disk failover subsink 2011-10-20 02:58:29,936 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING 2011-10-20 02:58:29,936 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 9 with no progress being made on disk failover subsink 2011-10-20 02:58:30,436 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING 2011-10-20 02:58:30,436 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 10 with no progress being made on disk failover subsink 2011-10-20 02:58:30,436 WARN com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: WAL drain thread was not making progress, forcing close : :