[
https://issues.apache.org/jira/browse/FLUME-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185722#comment-13185722
]
[email protected] commented on FLUME-927:
-----------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3487/
-----------------------------------------------------------
Review request for Mingjie Lai and jmhsieh.
Summary
-------
When the WAL decorator starts its subsink, it waits one second for the subsink
to become active. If the subsink doesn't start within that interval, the
decorator goes ahead and marks it for stop, leaving the agent idle.
The agent sink contains a retry sink that keeps retrying the open until it
succeeds. Because the WAL decorator forces the subsink to close after one
second, this retry mechanism is useless and the user has to restart the agent.
The patch makes the decorator wait for the subsink to become active; only an
exception in the subsink aborts the wait.
This addresses bug FLUME-927.
https://issues.apache.org/jira/browse/FLUME-927
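For reviewers who want the idea without opening the diff, here is a rough,
self-contained sketch of the change in waiting behavior. It is not the code in
NaiveFileWALDeco.java: the class SubsinkStartWait and the methods markActive,
markFailed, awaitWithTimeout and awaitActive are hypothetical names made up for
illustration, and the latch-based signalling is an assumption about one way to
do it.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical illustration only; not the actual WAL decorator code. */
public class SubsinkStartWait {
    // Released when the subsink either becomes active or fails to open.
    private final CountDownLatch done = new CountDownLatch(1);
    private final AtomicReference<IOException> failure = new AtomicReference<>();

    /** Called from the subsink thread once its open() has succeeded. */
    public void markActive() {
        done.countDown();
    }

    /** Called from the subsink thread if its open() threw. */
    public void markFailed(IOException e) {
        failure.set(e);
        done.countDown();
    }

    /** Old behavior: give up after a fixed interval. Returns false on timeout. */
    public boolean awaitWithTimeout(long millis)
            throws InterruptedException, IOException {
        boolean finished = done.await(millis, TimeUnit.MILLISECONDS);
        rethrowFailure();
        return finished;
    }

    /** Patched behavior: wait as long as needed; only an exception aborts. */
    public void awaitActive() throws InterruptedException, IOException {
        done.await();
        rethrowFailure();
    }

    private void rethrowFailure() throws IOException {
        IOException e = failure.get();
        if (e != null) {
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        SubsinkStartWait wait = new SubsinkStartWait();
        // Simulated subsink whose open takes longer than one second,
        // e.g. because the collector is not up yet and the open is retried.
        Thread subsink = new Thread(() -> {
            try {
                Thread.sleep(1500);
            } catch (InterruptedException ie) {
                return;
            }
            wait.markActive();
        });
        subsink.start();

        System.out.println("fixed 1s wait saw subsink active: "
                + wait.awaitWithTimeout(1000));
        wait.awaitActive();
        System.out.println("unbounded wait returned once subsink was active");
        subsink.join();
    }
}
```

With the fixed timeout the caller has to decide what to do while the retry is
still in progress, which is where the old code marked the subsink for stop.
With the unbounded wait the retry sink is left alone until it either succeeds
or throws.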
Diffs
-----
flume-core/src/main/java/com/cloudera/flume/agent/durability/NaiveFileWALDeco.java
3a88ab8
flume-core/src/main/java/com/cloudera/flume/handlers/debug/DelayDecorator.java
15a9066
flume-core/src/test/java/com/cloudera/flume/agent/durability/TestNaiveFileWALDeco.java
8dd45fa
Diff: https://reviews.apache.org/r/3487/diff
Testing
-------
Added a new test case. Will run the full regression test suite.
Thanks,
Prasad
> A Flume agent started before collectors in E2E mode could fail to connect to
> the collector
> ------------------------------------------------------------------------------------------
>
> Key: FLUME-927
> URL: https://issues.apache.org/jira/browse/FLUME-927
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v0.9.4, v0.9.5
> Reporter: Prasad Mujumdar
> Assignee: Prasad Mujumdar
> Fix For: v0.9.5
>
>
> The write-ahead log (WAL) mechanism expects the agent sink to be active within
> 1 second. After that, it assumes that the agent couldn't connect to the
> collector and shuts it down. The AgentSink has a retry mechanism that handles
> network problems, an unavailable collector, etc. for a configurable amount of
> time. The hardcoded 1-second timeout in the WAL decorator invalidates this
> retry mechanism.