[jira] Commented: (CHUKWA-369) proposed reliability mechanism

Ari Rabkin (JIRA) Wed, 02 Sep 2009 14:27:59 -0700

    [ 
https://issues.apache.org/jira/browse/CHUKWA-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750672#action_12750672
 ]


Ari Rabkin commented on CHUKWA-369:
-----------------------------------

- You do not need to get an acknowledgment from the same collector you sent to. 
 The "ack" is really just a confirmation that the file in question rotated OK 
and was of sufficient length when it rotated.

- Collectors don't need to do anything special on rotation.

- There's no long-running TCP connection between agent and collector.  But my 
current implementation does assume that an agent will continue to use a single 
collector until it gets an IOException.  For now, I'm not using timeouts; 
instead, the agent relies on getting an IOException from a down collector.  
This is simpler, but would require modification if we started doing dynamic 
load-balancing across collectors.
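
To make the last point concrete, here is a minimal sketch of the "stick with 
one collector until an IOException" behavior. It is not taken from 
delayedAcks.patch: the class StickyCollectorSender, the Collector interface, 
and the method names are all hypothetical, and the real agent code may be 
structured quite differently.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch only; these are not Chukwa's actual classes. */
class StickyCollectorSender {

    /** Hypothetical stand-in for posting one chunk to one collector. */
    interface Collector {
        void post(String chunk) throws IOException;
    }

    private final List<Collector> collectors;
    private int current = 0; // index of the collector currently in use

    StickyCollectorSender(List<Collector> collectors) {
        if (collectors.isEmpty()) {
            throw new IllegalArgumentException("need at least one collector");
        }
        this.collectors = collectors;
    }

    int currentIndex() {
        return current;
    }

    /**
     * Sends a chunk to the current collector. No timeouts are used: the only
     * failure signal is an IOException, which moves the agent on to the next
     * collector. If every collector fails once, the last exception is rethrown.
     */
    void send(String chunk) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < collectors.size(); attempt++) {
            try {
                collectors.get(current).post(chunk);
                return; // success: keep using this collector
            } catch (IOException e) {
                last = e;
                current = (current + 1) % collectors.size();
            }
        }
        throw last;
    }
}
```

Because the sender only switches on failure, all chunks go to one collector 
while it stays up, which is why dynamic load-balancing would need a different 
selection policy than this sketch uses.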

> proposed reliability mechanism
> ------------------------------
>
>                 Key: CHUKWA-369
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-369
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: data collection
>    Affects Versions: 0.3.0
>            Reporter: Ari Rabkin
>            Assignee: Ari Rabkin
>             Fix For: 0.3.0
>
>         Attachments: delayedAcks.patch
>
>
> We like to say that Chukwa is a system for reliable log collection. It isn't, 
> quite, since we don't handle collector crashes.  Here's a proposed 
> reliability mechanism.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
