[ https://issues.apache.org/jira/browse/FLUME-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086552#comment-13086552 ]

[email protected] commented on FLUME-706:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1467/
-----------------------------------------------------------

(Updated 2011-08-17 19:55:38.022879)


Review request for Flume, Arvind Prabhakar and Eric Sammer.


Changes
-------

Updated to clean up exception handling when a spawn fails, and added a real
unit test that exercises the root problem.  Look at the diff between #1 and
#2 to see the improved exception handling and the added test case.
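
As a rough illustration of the exception-handling change described here, the
sketch below contains a spawn failure to the node that caused it.  It is not
taken from the patch: SpawnIsolationSketch, Spawner and spawnAll are
hypothetical names, and the real logic lives in the files listed under Diffs.

```java
// Hypothetical sketch; none of these names come from the Flume code base.
class SpawnIsolationSketch {
    /** Assumed stand-in for whatever starts one logical node. */
    interface Spawner {
        void spawn(String nodeName) throws Exception;
    }

    /** Tries to spawn every node and returns how many succeeded. */
    static int spawnAll(String[] nodeNames, Spawner spawner) {
        int succeeded = 0;
        for (String nodeName : nodeNames) {
            try {
                spawner.spawn(nodeName);
                succeeded++;
            } catch (Exception e) {
                // Contain the failure to this node and keep going with the rest.
                System.err.println("spawn failed for " + nodeName + ": " + e.getMessage());
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        String[] nodes = {"a", "b", "c"};
        // Node "b" fails to spawn; "a" and "c" are unaffected.
        int ok = spawnAll(nodes, name -> {
            if (name.equals("b")) {
                throw new java.io.IOException("simulated spawn failure");
            }
        });
        System.out.println(ok + " of " + nodes.length + " nodes spawned");
    }
}
```

The try/catch sits inside the loop rather than around it, so one bad node
cannot abort the spawn of the nodes that follow it.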


Summary (updated)
-------

commit 34b0ada18f38d82b8acee4c2ec1a5b6693e524ea
Author: Jonathan Hsieh <[email protected]>
Date:   Wed Aug 17 10:35:22 2011 -0700

    FLUME-706: Flume nodes launch duplicate logical nodes
    
    When a logical node is being spawned for the first time, we attempt to
    load the config of the node.  Unfortunately, we would subsequently load
    it again and spawn a second driver thread because we neglected to update
    the last good config version.  This fixes the problem by making sure that
    value gets updated on the first attempt.  We also update error handling
    so that a failure of a single logical node spawn only affects that node.
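
The version bookkeeping described in the commit message can be modelled with
a small, self-contained sketch.  This is a simplified assumption about the
mechanism, not the actual Flume code: SpawnVersionSketch, onHeartbeat and
lastGoodVersion are hypothetical names, and a counter stands in for real
driver threads.

```java
// Hypothetical model of the bug; names are illustrative, not Flume's API.
class SpawnVersionSketch {
    // Version of the last config this node acted on; -1 means "none yet".
    private long lastGoodVersion = -1;
    // Number of driver threads started (a counter stands in for real threads).
    private int driversSpawned = 0;
    // false models the bug, true models the fix.
    private final boolean recordVersionOnFirstSpawn;

    SpawnVersionSketch(boolean recordVersionOnFirstSpawn) {
        this.recordVersionOnFirstSpawn = recordVersionOnFirstSpawn;
    }

    /** Called on each heartbeat with the config version the master reports. */
    void onHeartbeat(long masterVersion) {
        if (masterVersion <= lastGoodVersion) {
            return; // this config is already loaded
        }
        boolean firstSpawn = (driversSpawned == 0);
        driversSpawned++; // load the config and start a driver
        if (!firstSpawn || recordVersionOnFirstSpawn) {
            lastGoodVersion = masterVersion;
        }
    }

    int driversSpawned() {
        return driversSpawned;
    }

    public static void main(String[] args) {
        SpawnVersionSketch buggy = new SpawnVersionSketch(false);
        buggy.onHeartbeat(1);
        buggy.onHeartbeat(1); // same version again, yet a second driver starts
        System.out.println("buggy: " + buggy.driversSpawned() + " drivers"); // 2

        SpawnVersionSketch fixed = new SpawnVersionSketch(true);
        fixed.onHeartbeat(1);
        fixed.onHeartbeat(1); // ignored: version 1 was recorded on first spawn
        System.out.println("fixed: " + fixed.driversSpawned() + " drivers"); // 1
    }
}
```

In this model the only difference between the two runs is whether the first
spawn records the version it loaded, which is the change the commit message
describes.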


This addresses bug flume-706.
    https://issues.apache.org/jira/browse/flume-706


Diffs (updated)
-----

  flume-core/src/main/java/com/cloudera/flume/agent/FlumeNode.java b8f2b67
  flume-core/src/main/java/com/cloudera/flume/agent/LivenessManager.java c72a626
  flume-core/src/main/java/com/cloudera/flume/agent/LogicalNode.java 3f64238
  flume-core/src/main/java/com/cloudera/flume/agent/LogicalNodeManager.java b3f96f2
  flume-core/src/main/java/com/cloudera/flume/conf/FlumeConfigData.java 9e660cc
  flume-core/src/test/java/com/cloudera/flume/agent/TestAgentCloseNoDeadlock.java e1353b8
  flume-core/src/test/java/com/cloudera/flume/agent/TestLogicalNodeManager.java 0fd4bc6
  flume-core/src/test/java/com/cloudera/flume/agent/diskfailover/TestDiskFailoverBehavior.java 831eca3
  flume-core/src/test/java/com/cloudera/flume/shell/TestFlumeShell.java f81b190

Diff: https://reviews.apache.org/r/1467/diff


Testing (updated)
-------

Added a new test; it passes.  The full test suite is currently running.


Thanks,

jmhsieh



> Flume nodes launch duplicate logical nodes
> ------------------------------------------
>
>                 Key: FLUME-706
>                 URL: https://issues.apache.org/jira/browse/FLUME-706
>             Project: Flume
>          Issue Type: Bug
>          Components: Master, Node
>    Affects Versions: v0.9.5
>            Reporter: E. Sammer
>            Assignee: E. Sammer
>            Priority: Critical
>             Fix For: v0.9.5
>
>         Attachments: 
> 0001-FLUME-706-Flume-nodes-launch-duplicate-logical-nodes.patch, FLUME-706.log
>
>
> When submitting a config command to the flume master, it seems as if the 
> downstream node attempts to load the config twice.
> In a test case, starting a single master and a single node, I submitted a 
> "config node rpcSource(12345) console". The node sees the config change on 
> the next heartbeat and updates its config and starts the thrift source on 
> port 12345. Immediately after, it logs "Taking another heartbeat" (DEBUG) and 
> attempts to create another logical node with the same config. This leads to 
> thrift errors in bind() and "Could not create ServerSocket on address ...". 
> Looking at the root cause in a debugger (thrift swallows the original 
> exception) I can see it's an "Address already in use" IOException.
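
The symptom reported above can be reproduced outside Flume and Thrift with
plain java.net sockets: a second listener on a port that already has one
fails to bind.  The sketch below is only an illustration under that
assumption; DoubleBindSketch is a hypothetical name, and an ephemeral port is
used instead of 12345 so the example does not depend on that port being free.

```java
// Hypothetical demo; uses only standard java.net / java.io classes.
class DoubleBindSketch {
    /** Returns true if a second listener on an already-bound port is rejected. */
    static boolean secondBindFails() {
        // Port 0 asks the OS for any free port.
        try (java.net.ServerSocket first = new java.net.ServerSocket(0)) {
            int port = first.getLocalPort();
            // Second listener on the same port, like the duplicate logical node.
            try (java.net.ServerSocket second = new java.net.ServerSocket(port)) {
                return false; // unexpected: both listeners bound
            } catch (java.net.BindException e) {
                // The message is platform dependent, typically
                // "Address already in use".
                System.err.println("second bind failed: " + e.getMessage());
                return true;
            }
        } catch (java.io.IOException e) {
            throw new java.io.UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second bind rejected: " + secondBindFails());
    }
}
```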

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
