[ https://issues.apache.org/jira/browse/FLUME-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashish Paliwal resolved FLUME-659.
----------------------------------
Resolution: Won't Fix
Won't fix. The 0.x branch is no longer maintained.
> Agent with Thrift rpcSource closes source after receiving new config from
> master
> --------------------------------------------------------------------------------
>
> Key: FLUME-659
> URL: https://issues.apache.org/jira/browse/FLUME-659
> Project: Flume
> Issue Type: Bug
> Components: Master, Node, Sinks+Sources
> Affects Versions: v0.9.3
> Environment: Ubuntu 10.10 Maverick Meerkat
> Reporter: Disabled imported user
> Labels: rpc, thrift
>
> You can reproduce this problem by following these steps:
> Set up:
> * Master
> * Agent: rpcSource(35092) | agent*(...) # agent*Sink and agent*Chain all have
> this problem
> * Collector: collectorSource(...) | collectorSink(...)
> Start sending events to the agent using Thrift. Then use the flume shell on
> master to configure the agent -- you can even use the exact same config as
> the agent had in the first place. Make sure the agent receives this
> configuration while still being sent events. After the agent receives its
> configuration, it will close its source server for some reason and thereafter
> become unresponsive to new configurations. This is the sample output from
> the agent logs:
> 2011-06-15 07:29:04,086 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSink: ThriftEventSink on port
> 35853 closed
> 2011-06-15 07:29:05,088 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port
> 35092...
> 2011-06-15 07:29:05,088 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 4
> elements ...
> And of course, the fact that the server is closed results in lots of the
> following types of errors in the application that's sending events:
> Thrift::TransportException: Broken pipe
> Thrift::TransportException: Could not connect to localhost:35092: Connection
> refused - connect(2)
> Another variation to reproduce this type of error is to bring the master
> down, then bring it back up, at which point it will send its configuration to
> the agent node. Upon receiving the new configuration, the agent closes its
> source server and becomes unresponsive to new configurations. The following
> is output from an agent that was configured with two logical nodes, one that
> was rpcSource(35090) | agentE2EChain(...) and one that was rpcSource(35092) |
> agentBEChain(...)
> 2011-06-15 05:37:46,731 INFO com.cloudera.flume.agent.ThriftMasterRPC:
> Connected to master at flume-master:35872
> 2011-06-15 05:37:51,770 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port
> 35090...
> 2011-06-15 05:37:51,771 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 0
> elements ...
> 2011-06-15 05:37:51,787 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSink: ThriftEventSink on port
> 35853 closed
> 2011-06-15 05:37:51,868 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port
> 35090...
> 2011-06-15 05:37:51,868 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 0
> elements ...
> 2011-06-15 05:37:51,868 WARN
> com.cloudera.flume.handlers.debug.LazyOpenDecorator: Closing a lazy sink that
> was not logically opened
> 2011-06-15 05:37:51,868 INFO com.cloudera.flume.agent.LogicalNode:
> flume-agent: Connector stopped: LazyOpenSource | LazyOpenDecorator
> 2011-06-15 05:37:51,875 INFO com.cloudera.flume.agent.LogicalNode: Node
> config successfully set to com.cloudera.flume.conf.FlumeConfigData@42143753
> 2011-06-15 05:37:51,880 INFO com.cloudera.flume.agent.LogicalNode: Connector
> started: LazyOpenSource | LazyOpenDecorator
> 2011-06-15 05:37:51,881 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking
> thread pool server on port 35090...
> 2011-06-15 05:37:52,788 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port
> 35092...
> 2011-06-15 05:37:52,788 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 6
> elements ...
> I once produced an exception using this master-down/master-up procedure:
> 2011-06-15 04:50:45,543 ERROR com.cloudera.flume.core.connector.DirectDriver:
> Driving src/sink failed! LazyOpenSource | LazyOpenDecorator because
> NaiveFileWALDeco not open for append
> java.lang.IllegalStateException: NaiveFileWALDeco not open for append
> at
> com.google.common.base.Preconditions.checkState(Preconditions.java:145)
> at
> com.cloudera.flume.agent.durability.NaiveFileWALDeco.append(NaiveFileWALDeco.java:133)
> at com.cloudera.flume.core.CompositeSink.append(CompositeSink.java:61)
> at
> com.cloudera.flume.agent.AgentFailChainSink.append(AgentFailChainSink.java:103)
> at
> com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
> at
> com.cloudera.flume.handlers.debug.LazyOpenDecorator.append(LazyOpenDecorator.java:75)
> at
> com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:93)
> 2011-06-15 04:50:45,544 INFO com.cloudera.flume.agent.LogicalNode: Connector
> xxxxxxxx.internal-E2E exited with error NaiveFileWALDeco not open for append
> 2011-06-15 04:50:46,544 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port
> 35090...
> 2011-06-15 04:50:46,545 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 6
> elements ...
> 2011-06-15 04:50:50,443 INFO com.cloudera.flume.agent.AgentFailChainSink:
> Setting e2e failover chain to { ackedWriteAhead => { stubbornAppend => {
> insistentOpen => failChain(" %s
> ","tsink(\"collector1\",35853)","tsink(\"collector2\",35853)") } } }
> 2011-06-15 04:50:50,443 INFO com.cloudera.flume.agent.AgentFailChainSink:
> Setting failover chain to { ackedWriteAhead => { stubbornAppend => {
> insistentOpen => failChain(" %s
> ","tsink(\"collector2\",35853)","tsink(\"collector2\",35853)") } } }
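
The symptom in the quoted agent logs is a ThriftEventSource port whose last
logged event is "Closed server on port N" with no later "Starting blocking
thread pool server on port N". A minimal sketch of a checker for that pattern
follows. It assumes the log message wording shown in this report; the function
name ports_left_closed and the SAMPLE excerpt are illustrative only and are
not part of Flume.

```python
import re

# Message fragments copied from the agent logs quoted in this report.
CLOSED = re.compile(r"ThriftEventSource: Closed server on port (\d+)")
STARTED = re.compile(
    r"ThriftEventSource: Starting blocking thread pool server on port (\d+)"
)


def ports_left_closed(log_text):
    """Return the sorted ports whose last ThriftEventSource event was a close.

    Hypothetical helper: it only inspects the two message types above.
    """
    # Mail wrapping splits one log entry over several lines and may prefix
    # each line with "> ", so strip the quote marker and re-join the text.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())

    events = []  # (position in text, port, server_is_open)
    for match in CLOSED.finditer(flat):
        events.append((match.start(), int(match.group(1)), False))
    for match in STARTED.finditer(flat):
        events.append((match.start(), int(match.group(1)), True))

    # Replay the events in log order, keeping the latest state per port.
    is_open = {}
    for _, port, opened in sorted(events):
        is_open[port] = opened
    return sorted(port for port, opened in is_open.items() if not opened)


# Excerpt of the two-logical-node log from the report, wrapped as in the mail.
SAMPLE = """\
> 2011-06-15 05:37:51,770 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port
> 35090...
> 2011-06-15 05:37:51,881 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking
> thread pool server on port 35090...
> 2011-06-15 05:37:52,788 INFO
> com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port
> 35092...
"""

if __name__ == "__main__":
    print(ports_left_closed(SAMPLE))  # [35092]
```

On this excerpt the sketch reports port 35092, which is closed and never
reopened, while port 35090 is closed and then restarted. It does not explain
why the source was closed; it only flags ports that clients such as the
Thrift sender in the report would find refusing connections.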