[
https://issues.apache.org/jira/browse/FLUME-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086552#comment-13086552
]
[email protected] commented on FLUME-706:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1467/
-----------------------------------------------------------
(Updated 2011-08-17 19:55:38.022879)
Review request for Flume, Arvind Prabhakar and Eric Sammer.
Changes
-------
Updated to clean up exception handling when a spawn fails, and added a real
unit test that exercises the root problem. Look at the diff between #1 and #2
to see the improved exception handling and the added test case.
Summary (updated)
-------
commit 34b0ada18f38d82b8acee4c2ec1a5b6693e524ea
Author: Jonathan Hsieh <[email protected]>
Date: Wed Aug 17 10:35:22 2011 -0700
FLUME-706: Flume nodes launch duplicate logical nodes
When a logical node is being spawned for the first time we attempt to load
the config of the node. Unfortunately, we would subsequently load it
again and spawn a second driver thread because we neglected to update the
last good config version. This fixes the problem by making sure that
value gets updated on the first attempt. We also update error handling so
that a failure to spawn a single logical node only affects that node.
This addresses bug flume-706.
https://issues.apache.org/jira/browse/flume-706
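To illustrate the summary above, here is a minimal sketch of the two ideas it describes: recording the last good config version on the first spawn so that a repeated heartbeat does not spawn a second driver, and catching a spawn failure per node. All names here (DuplicateSpawnSketch, Node, Config, checkConfig, applyAll) are hypothetical and simplified; this is not the code in LogicalNode.java or LogicalNodeManager.java, and a counter stands in for starting a driver thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateSpawnSketch {

    /** Hypothetical stand-in for a versioned logical node configuration. */
    static final class Config {
        final long version;
        final String spec;

        Config(long version, String spec) {
            this.version = version;
            this.spec = spec;
        }
    }

    /** Hypothetical, simplified logical node; not Flume's LogicalNode class. */
    static final class Node {
        private long lastGoodVersion = -1; // no config loaded yet
        private int driversSpawned = 0;

        /** Returns true if a new driver was spawned for cfg. */
        synchronized boolean checkConfig(Config cfg) {
            if (cfg.version <= lastGoodVersion) {
                return false; // this version is already running
            }
            if (cfg.spec.isEmpty()) {
                throw new IllegalArgumentException("empty config spec");
            }
            driversSpawned++;              // stands in for starting a driver thread
            lastGoodVersion = cfg.version; // recorded on the first spawn as well
            return true;
        }

        synchronized int driversSpawned() {
            return driversSpawned;
        }
    }

    /** Applies configs node by node; one failing spawn does not stop the others. */
    static List<String> applyAll(Map<String, Node> nodes, Map<String, Config> configs) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Config> e : configs.entrySet()) {
            Node node = nodes.computeIfAbsent(e.getKey(), k -> new Node());
            try {
                node.checkConfig(e.getValue());
            } catch (RuntimeException ex) {
                failed.add(e.getKey()); // the failure stays local to this node
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<String, Node> nodes = new LinkedHashMap<>();
        Map<String, Config> configs = new LinkedHashMap<>();
        configs.put("node", new Config(1, "rpcSource(12345) console"));
        configs.put("broken", new Config(1, ""));

        applyAll(nodes, configs);                       // first heartbeat
        List<String> failed = applyAll(nodes, configs); // second heartbeat, same configs

        System.out.println("drivers for node: " + nodes.get("node").driversSpawned());
        System.out.println("failed: " + failed);
    }
}
```

In this sketch the second heartbeat finds the version already recorded and skips the spawn, while the node with the bad config is reported as failed without preventing the other node from starting.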
Diffs (updated)
-----
flume-core/src/main/java/com/cloudera/flume/agent/FlumeNode.java b8f2b67
flume-core/src/main/java/com/cloudera/flume/agent/LivenessManager.java c72a626
flume-core/src/main/java/com/cloudera/flume/agent/LogicalNode.java 3f64238
flume-core/src/main/java/com/cloudera/flume/agent/LogicalNodeManager.java b3f96f2
flume-core/src/main/java/com/cloudera/flume/conf/FlumeConfigData.java 9e660cc
flume-core/src/test/java/com/cloudera/flume/agent/TestAgentCloseNoDeadlock.java e1353b8
flume-core/src/test/java/com/cloudera/flume/agent/TestLogicalNodeManager.java 0fd4bc6
flume-core/src/test/java/com/cloudera/flume/agent/diskfailover/TestDiskFailoverBehavior.java 831eca3
flume-core/src/test/java/com/cloudera/flume/shell/TestFlumeShell.java f81b190
Diff: https://reviews.apache.org/r/1467/diff
Testing (updated)
-------
Added a new test; it passes. Currently running the full test suite.
Thanks,
jmhsieh
> Flume nodes launch duplicate logical nodes
> ------------------------------------------
>
> Key: FLUME-706
> URL: https://issues.apache.org/jira/browse/FLUME-706
> Project: Flume
> Issue Type: Bug
> Components: Master, Node
> Affects Versions: v0.9.5
> Reporter: E. Sammer
> Assignee: E. Sammer
> Priority: Critical
> Fix For: v0.9.5
>
> Attachments:
> 0001-FLUME-706-Flume-nodes-launch-duplicate-logical-nodes.patch, FLUME-706.log
>
>
> When submitting a config command to the flume master, it seems as if the
> downstream node attempts to load the config twice.
> In a test case, starting a single master and a single node, I submitted a
> "config node rpcSource(12345) console". The node sees the config change on
> the next heartbeat and updates its config and starts the thrift source on
> port 12345. Immediately after, it logs "Taking another heartbeat" (DEBUG) and
> attempts to create another logical node with the same config. This leads to
> thrift errors in bind() and "Could not create ServerSocket on address ...".
> Looking at the root cause in a debugger (thrift swallows the original
> exception) I can see it's an "Address already in use" IOException.
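
The symptom described above can be reproduced in isolation with plain java.net sockets. The following is a small sketch, not Flume or Thrift code: the class and method names are made up for illustration, and it assumes the operating system rejects a second listener on a port that is already bound, which is what the reported IOException indicates.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

public class DoubleBindSketch {

    /** Returns true if a second listener on an already bound port is rejected. */
    static boolean secondBindFails() throws IOException {
        try (ServerSocket first = new ServerSocket(0)) { // 0 = let the OS pick a free port
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port)) {
                return false; // unexpectedly bound the same port twice
            } catch (BindException e) {
                return true; // typically reported as "Address already in use"
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("second bind rejected: " + secondBindFails());
    }
}
```

A second logical node started with the same rpcSource port would hit the same condition when its source tries to bind.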
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira