[ 
https://issues.apache.org/jira/browse/FLUME-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185722#comment-13185722
 ] 

[email protected] commented on FLUME-927:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3487/
-----------------------------------------------------------

Review request for Mingjie Lai and jmhsieh.


Summary
-------

When the WAL decorator starts its subsink, it waits one second for the subsink 
to become active. If the subsink doesn't start in that interval, the decorator 
goes ahead and marks it for stop, leaving the agent idle.
The agent sinks contain a retry sink which keeps retrying the open until it 
succeeds. The WAL decorator forcing a close after one second makes this retry 
mechanism useless and forces the user to restart the agent.
The patch waits for the subsink to become active; only an exception in the 
subsink aborts the wait.
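
As a rough illustration of the intended behavior (not the code in the diff), 
the wait can be sketched as a polling loop with no deadline that exits only 
when the subsink reports active or records a failure. The names 
SubsinkWaitSketch, DriverState, and awaitActive are hypothetical and are not 
part of Flume's API; the polling interval is likewise an assumption.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

/** Illustrative sketch only; these types are hypothetical, not Flume classes. */
final class SubsinkWaitSketch {

    /** Hypothetical stand-in for the state a subsink driver would expose. */
    static final class DriverState {
        final AtomicBoolean active = new AtomicBoolean(false);
        final AtomicReference<Exception> error = new AtomicReference<>(null);
    }

    /**
     * Blocks until the subsink is active. There is no timeout: a slow open
     * (for example, a retry sink still trying to reach a collector) keeps
     * the wait going. Only a recorded failure or an interrupt ends it early.
     */
    static void awaitActive(DriverState state, long pollMillis) {
        while (!state.active.get()) {
            Exception failure = state.error.get();
            if (failure != null) {
                // The subsink itself failed, so waiting longer cannot help.
                throw new IllegalStateException("subsink failed to open", failure);
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException ie) {
                // Preserve the interrupt flag for callers, then stop waiting.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", ie);
            }
        }
    }
}
```

With the old one-second cutoff, a subsink that took longer than a second to 
open would be marked for stop; in this sketch it simply keeps being polled 
until it becomes active or reports an error.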


This addresses bug FLUME-927.
    https://issues.apache.org/jira/browse/FLUME-927


Diffs
-----

  flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALDeco.java 3a88ab8 
  flume-core/src/main/java/com/cloudera/flume/handlers/debug/DelayDecorator.java 15a9066 
  flume-core/src/test/java/com/cloudera/flume/agent/durability/TestNaiveFileWALDeco.java 8dd45fa 

Diff: https://reviews.apache.org/r/3487/diff


Testing
-------

Added a new test case. Will run the full regression test suite.


Thanks,

Prasad


                
> A Flume agent started before collectors in E2E mode could fail to connect to 
> the collector
> ------------------------------------------------------------------------------------------
>
>                 Key: FLUME-927
>                 URL: https://issues.apache.org/jira/browse/FLUME-927
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v0.9.4, v0.9.5
>            Reporter: Prasad Mujumdar
>            Assignee: Prasad Mujumdar
>             Fix For: v0.9.5
>
>
> The write-ahead log (WAL) mechanism expects the agent sink to be active within 
> 1 second. After that, it assumes that the agent couldn't connect to the 
> collector and shuts it down. The AgentSink has a retry mechanism that handles 
> network problems, an unavailable collector, etc. for a configurable amount of 
> time. The hardcoded 1-second timeout in the WAL decorator defeats this retry 
> mechanism. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
