[ https://issues.apache.org/jira/browse/FLUME-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082858#comment-13082858 ]
satish commented on FLUME-706:
------------------------------
I was able to fix this by replacing the call to loadNodeDriver() in
LogicalNodeManager.spawn with nd.checkConfig(data). I also overloaded
NodesManager.spawn() as NodesManager.spawn(String name, FlumeConfigData data).
With the above call to checkConfig, the first data flow (which is triggered by
checkLogicalNodes() in the HeartBeatThread) results in lastGoodConfig being set
to a valid config, since checkConfig eventually calls loadConfig. When the
second data flow, which is started from dequeueCheckConfig, reaches
checkConfig, the config is already valid and set, so it does not try to start
the driver again.
I understand that this might be a temporary fix. Clearly Eric's approach of
consolidating the config handling in a single place in the code is better, but
for immediate testing this is a smaller/faster fix, right?
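To make the idea concrete, here is a minimal sketch of the guard described above. It is not the actual Flume code: the class ConfigGuardSketch, the String config, and the equality comparison are all assumptions made for illustration, and only the names checkConfig, loadConfig and lastGoodConfig are borrowed from the comment. The point is just that once the first path has recorded a good config, a second call with the same config becomes a no-op.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a logical node; not the real Flume class. */
class ConfigGuardSketch {
    // Null until a config has been loaded successfully.
    private String lastGoodConfig;
    // Counts how often the (simulated) driver was started.
    private final AtomicInteger driverStarts = new AtomicInteger();

    /** Returns true if the config was loaded, false if it was already current. */
    synchronized boolean checkConfig(String config) {
        if (Objects.equals(config, lastGoodConfig)) {
            return false; // already running this config, do not start again
        }
        loadConfig(config);
        return true;
    }

    // Assumption: loading a config is what starts the driver.
    private void loadConfig(String config) {
        driverStarts.incrementAndGet();
        lastGoodConfig = config;
    }

    int driverStarts() {
        return driverStarts.get();
    }

    public static void main(String[] args) {
        ConfigGuardSketch node = new ConfigGuardSketch();
        String config = "rpcSource(12345) console";
        node.checkConfig(config); // first path, e.g. the heartbeat check
        node.checkConfig(config); // second path, e.g. the dequeued config check
        System.out.println("driver starts: " + node.driverStarts());
    }
}
```

The method is synchronized in this sketch so that two threads reaching it at nearly the same time cannot both pass the comparison; whether the real code paths need that depends on how they are scheduled.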
> Flume nodes launch duplicate logical nodes
> ------------------------------------------
>
> Key: FLUME-706
> URL: https://issues.apache.org/jira/browse/FLUME-706
> Project: Flume
> Issue Type: Bug
> Components: Master, Node
> Affects Versions: v0.9.5
> Reporter: E. Sammer
> Assignee: E. Sammer
> Priority: Critical
> Fix For: v0.9.5
>
> Attachments: FLUME-706.log
>
>
> When submitting a config command to the flume master, it seems as if the
> downstream node attempts to load the config twice.
> In a test case, starting a single master and a single node, I submitted a
> "config node rpcSource(12345) console". The node sees the config change on
> the next heartbeat and updates its config and starts the thrift source on
> port 12345. Immediately after, it logs "Taking another heartbeat" (DEBUG) and
> attempts to create another logical node with the same config. This leads to
> thrift errors in bind() and "Could not create ServerSocket on address ...".
> Looking at the root cause in a debugger (thrift swallows the original
> exception) I can see it's an "Address already in use" IOException.
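The failure mode in the description can be reproduced outside Flume and thrift with plain sockets. The sketch below is only an illustration under assumptions: the class name DoubleBindSketch is made up, it uses java.net.ServerSocket directly rather than the thrift server, and it binds an OS-chosen port on the loopback address instead of 12345. The exact exception message is platform dependent, so it is not checked here.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

/** Hypothetical demo class; unrelated to the Flume or thrift sources. */
class DoubleBindSketch {
    /** Returns true if binding an already-bound port fails with BindException. */
    static boolean secondBindFails() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port, so the demo cannot collide
        // with a service that is already running.
        try (ServerSocket first = new ServerSocket(0, 50, loopback)) {
            int port = first.getLocalPort();
            // Second listener on the same address and port, while the first
            // one is still open: this mirrors starting the source twice.
            try (ServerSocket second = new ServerSocket(port, 50, loopback)) {
                return false;
            } catch (BindException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("second bind failed: " + secondBindFails());
    }
}
```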