Bump.  Anyone know how to get diskFailover() to work with a collector dfs
sink?  Am I just going about this the wrong way?  Thanks,

-Steve

On Thu, Oct 20, 2011 at 3:19 AM, Stephen Layland
<stephen.layl...@gmail.com> wrote:

> Hi,
>
> I'm trying to set up a simple disk failover scenario, but we're currently
> not using an agent tier.  I did something like:
>
>   exec config machine 'syslogTcp(5140)' 'collector() { diskFailover()
> escapedCustomDfs("hdfs://...",..) }'
>
> but when I write syslog packets to the Flume node, the thread dies with an
> IllegalArgumentException:
>
> java.lang.IllegalArgumentException: Event already had an event with attribute rolltag
>     at com.cloudera.flume.core.EventBaseImpl.set(EventBaseImpl.java:62)
>     at com.cloudera.flume.handlers.rolling.RollSink.synchronousAppend(RollSink.java:231)
>     at com.cloudera.flume.handlers.rolling.RollSink$1.call(RollSink.java:183)
>     at com.cloudera.flume.handlers.rolling.RollSink$1.call(RollSink.java:181)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
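> (For reference, the agent-tier setup we're avoiding would normally get
> disk failover through the prepackaged agent sink, something like the
> sketch below; the collector hostname and port here are hypothetical:
>
>   exec config machine 'syslogTcp(5140)' 'agentDFOSink("collectorhost",35853)'
>
> but here I'm trying to fold the DFO behavior into the collector itself.)
>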
> Google pointed me to several useful threads, including this JIRA:
>
> https://issues.apache.org/jira/browse/FLUME-735
>
> I tried the mask("rolltag") workaround mentioned in the JIRA and changed my
> config to:
>
>   exec config machine 'syslogTcp(5140)' 'collector() { mask("rolltag")
> diskFailover() escapedCustomDfs("hdfs://...",..) }'
>
> This change took care of the above error (though without a rolltag I'm now
> always writing to the same HDFS file), but I still can't get the disk
> failover to work as I'd expect.
>
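> One idea I'm considering for the filename problem: since mask("rolltag")
> strips the tag the filename would normally get, put some uniqueness back
> through the escape sequences the escaped*Dfs sinks expand. Untested
> sketch; %{host} and the date escapes stand in for whatever uniqueness is
> needed, and the "..." args are elided as above:
>
>   exec config machine 'syslogTcp(5140)' 'collector() { mask("rolltag")
>   diskFailover() escapedCustomDfs("hdfs://.../%Y-%m-%d/%H00/", "foo-%{host}-") }'
>
> That only buys per-host/per-hour uniqueness rather than per-roll, so it's
> a partial fix at best.
>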
> I'm following the same procedure a previous poster mentioned: doing a
> hadoop fs -chown -R non-flume /path/to/target to simulate an HDFS failure.
> I'd expect the sequence files in /tmp/flume-flume/agent/dfo* to actually
> fill up with data, but they're all empty.  In addition, the flume node
> starts a death spiral once it can't write to HDFS.  Instead of the
> DiskFailoverDecorator taking over, it just loops continually and spits the
> following into the log file until I kill the node process:
>
> 2011-10-20 02:58:26,434 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 2 with no progress being made on disk failover subsink
> 2011-10-20 02:58:26,935 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
>    :
>    :
> 2011-10-20 03:02:09,642 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 03:02:09,642 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 448 with no progress being made on disk failover subsink
> 2011-10-20 03:02:09,642 WARN com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: WAL drain thread was not making progress, forcing close
>
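> In case anyone wants to reproduce this, the "failure" is just a
> permission flip on the target directory, roughly as follows (the user,
> group, and paths here are from my setup; adjust to taste):
>
>   # make the target unwritable by the flume user to simulate an HDFS outage
>   hadoop fs -chown -R nobody:nobody /user/cru/bench
>   # send some syslog traffic, then watch /var/tmp/flume-flume/agent/<host>/dfo_*
>   # restore ownership so the DFO logs can drain
>   hadoop fs -chown -R cru:supergroup /user/cru/bench
>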
> A more complete log excerpt is below[1].  Does anyone know how I can get
> diskFailover() to work with this dfs sink?
>
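> One arrangement I haven't tried yet, in case it sparks ideas: make the
> DFO path an explicit backup in a failover chain rather than an inline
> decorator inside the collector.  This is only a sketch based on my
> reading of the failover (< primary ? backup >) and decorator
> ({ deco => sink }) syntax in the user guide, with the same "..." args
> elided as above:
>
>   exec config machine 'syslogTcp(5140)' '< collector() {
>   escapedCustomDfs("hdfs://...",..) } ? { diskFailover =>
>   escapedCustomDfs("hdfs://...",..) } >'
>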
> Many thanks,
>
> -Steve
>
> [1]
> 2011-10-20 02:58:25,242 INFO com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: opening new file for 20111020-025815237-0700.2276946395587008.00000142
> 2011-10-20 02:58:25,242 INFO com.cloudera.flume.handlers.hdfs.SeqfileEventSink: constructed new seqfile event sink: file=/var/tmp/flume-flume/agent/pdp20.lindenlab.com/dfo_writing/20111020-025825242-0700.2276956400117096.00000142
> 2011-10-20 02:58:25,243 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverSource: end of file com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager$StateChangeDeco@2af4fb44
> 2011-10-20 02:58:25,434 INFO com.cloudera.flume.handlers.rolling.RollSink: closing RollSink 'NaiveFileFailover'
> 2011-10-20 02:58:25,434 INFO com.cloudera.flume.handlers.hdfs.SeqfileEventSink: closed /var/tmp/flume-flume/agent/pdp20.lindenlab.com/dfo_writing/20111020-025825242-0700.2276956400117096.00000142
> 2011-10-20 02:58:25,434 INFO com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: File lives in /var/tmp/flume-flume/agent/pdp20.lindenlab.com/dfo_writing/20111020-025825242-0700.2276956400117096.00000142
> 2011-10-20 02:58:25,434 INFO com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: opening new file for 20111020-025825242-0700.2276956400117096.00000142
> 2011-10-20 02:58:25,435 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverSource: end of file com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager$StateChangeDeco@2eec0962
> 2011-10-20 02:58:25,635 WARN com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: Double close (which is ok)
> 2011-10-20 02:58:25,635 INFO com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink: Closing hdfs://master-hadoop-dfw.lindenlab.com:54310/user/cru/bench/2011-10-20/0200/foo-
> 2011-10-20 02:58:25,635 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink: Closing HDFS file: hdfs://master-hadoop-dfw.lindenlab.com:54310/user/cru/bench/2011-10-20/0200/foo-.snappy.tmp
> 2011-10-20 02:58:25,635 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink: done writing raw file to hdfs
> 2011-10-20 02:58:25,882 ERROR com.cloudera.flume.core.connector.DirectDriver: Closing down due to exception during close calls
> 2011-10-20 02:58:25,883 INFO com.cloudera.flume.core.connector.DirectDriver: Connector FileFailover-143 exited with error: org.apache.hadoop.security.AccessControlException: Permission denied: user=flume, access=WRITE, inode="/user/cru/bench/2011-10-20/0200":cru:supergroup:drwxr-xr-x
>
> org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=flume, access=WRITE, inode="/user/cru/bench/2011-10-20/0200":cru:supergroup:drwxr-xr-x
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
>     at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
>     at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:626)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:237)
>     at com.cloudera.util.PathManager.close(PathManager.java:158)
>     at com.cloudera.flume.handlers.hdfs.CustomDfsSink.close(CustomDfsSink.java:91)
>     at com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink.close(EscapedCustomDfsSink.java:132)
>     at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>     at com.cloudera.flume.handlers.debug.InsistentOpenDecorator.close(InsistentOpenDecorator.java:175)
>     at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>     at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>     at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
>     at com.cloudera.flume.handlers.debug.LazyOpenDecorator.close(LazyOpenDecorator.java:81)
>     at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:126)
> Caused by: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.security.AccessControlException: Permission denied: user=flume, access=WRITE, inode="/user/cru/bench/2011-10-20/0200":cru:supergroup:drwxr-xr-x
>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:203)
>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:184)
>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:135)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5073)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkParentAccess(FSNamesystem.java:5042)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal(FSNamesystem.java:1880)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:1855)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.rename(NameNode.java:729)
>     at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1107)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
>     at $Proxy6.rename(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy6.rename(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:624)
>     ... 11 more
>
> 2011-10-20 02:58:25,883 WARN com.cloudera.flume.agent.diskfailover.NaiveFileFailoverManager: Close while in closing state, odd
> 2011-10-20 02:58:25,883 INFO com.cloudera.flume.handlers.hdfs.EscapedCustomDfsSink: Closing hdfs://master-hadoop-dfw.lindenlab.com:54310/user/cru/bench/2011-10-20/0200/foo-
> 2011-10-20 02:58:25,883 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink: Closing HDFS file: hdfs://master-hadoop-dfw.lindenlab.com:54310/user/cru/bench/2011-10-20/0200/foo-.snappy.tmp
> 2011-10-20 02:58:25,883 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink: done writing raw file to hdfs
> 2011-10-20 02:58:25,934 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:25,934 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 1 with no progress being made on disk failover subsink
> 2011-10-20 02:58:26,434 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:26,434 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 2 with no progress being made on disk failover subsink
> 2011-10-20 02:58:26,935 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:26,935 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 3 with no progress being made on disk failover subsink
> 2011-10-20 02:58:27,435 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:27,435 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 4 with no progress being made on disk failover subsink
> 2011-10-20 02:58:27,935 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:27,935 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 5 with no progress being made on disk failover subsink
> 2011-10-20 02:58:28,435 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:28,435 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 6 with no progress being made on disk failover subsink
> 2011-10-20 02:58:28,935 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:28,935 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 7 with no progress being made on disk failover subsink
> 2011-10-20 02:58:29,436 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:29,436 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 8 with no progress being made on disk failover subsink
> 2011-10-20 02:58:29,936 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:29,936 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 9 with no progress being made on disk failover subsink
> 2011-10-20 02:58:30,436 ERROR com.cloudera.flume.core.connector.DirectDriver: Expected IDLE but timed out in state CLOSING
> 2011-10-20 02:58:30,436 INFO com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: Attempt 10 with no progress being made on disk failover subsink
> 2011-10-20 02:58:30,436 WARN com.cloudera.flume.agent.diskfailover.DiskFailoverDeco: WAL drain thread was not making progress, forcing close
>    :
>    :
