[ 
https://issues.apache.org/jira/browse/CHUKWA-533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham updated CHUKWA-533:
-------------------------------

    Attachment: CHUKWA-533-1.patch

Here's a first pass at a patch for review. I've changed the {{rotate}} and 
{{add}} methods to be more fault-tolerant (i.e. to be able to survive a 
temporary HDFS outage). The {{init}} method still requires HDFS, so HDFS must 
be running for the collector to start. We can revisit this decision if people 
see the need.

I changed {{add}} to return {{COMMIT_FAIL}} if the chunks couldn't be added to 
the sequence file and I don't update the {{dataSize}} and {{bytesThisRotate}} 
unless the sequence file append succeeds. The {{ServletCollector}} returns a 
503 if this method returns {{COMMIT_FAIL}}.
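
To make the {{add}} change concrete, here is a minimal Java sketch of the idea. It is not the code in the patch: the {{ChunkSink}} interface, the {{COMMIT_OK}} value, the {{httpStatusFor}} helper and the class names are hypothetical stand-ins, and only {{add}}, {{COMMIT_FAIL}}, {{dataSize}}, {{bytesThisRotate}} and the 503 response come from the description above.

```java
import java.io.IOException;
import java.util.List;

/** Illustrative sketch only; not the actual Chukwa writer or servlet. */
class FaultTolerantAddSketch {

    /** COMMIT_OK is an assumed counterpart to COMMIT_FAIL. */
    enum CommitStatus { COMMIT_OK, COMMIT_FAIL }

    /** Hypothetical stand-in for the HDFS sequence file; append may fail during an outage. */
    interface ChunkSink {
        void append(byte[] chunk) throws IOException;
    }

    static class Writer {
        private final ChunkSink sink;
        long dataSize = 0;
        long bytesThisRotate = 0;

        Writer(ChunkSink sink) {
            this.sink = sink;
        }

        CommitStatus add(List<byte[]> chunks) {
            long appended = 0;
            try {
                for (byte[] chunk : chunks) {
                    sink.append(chunk);
                    appended += chunk.length;
                }
            } catch (IOException e) {
                // Report the failure to the caller instead of letting it kill the process.
                System.err.println("append failed: " + e.getMessage());
                return CommitStatus.COMMIT_FAIL;
            }
            // Counters are touched only after every append has succeeded.
            dataSize += appended;
            bytesThisRotate += appended;
            return CommitStatus.COMMIT_OK;
        }
    }

    /** Maps the writer result to an HTTP status, as a servlet might (200 is an assumption). */
    static int httpStatusFor(CommitStatus status) {
        return status == CommitStatus.COMMIT_FAIL ? 503 : 200;
    }
}
```

With a sink that always throws, {{add}} returns {{COMMIT_FAIL}}, both counters stay at zero, and the caller would answer with a 503 so the agent can resend later.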

I changed {{rotate}} to log the error and swallow it.
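
As a rough illustration of the log-and-swallow behavior in {{rotate}}, the sketch below catches the I/O error, records it, and returns normally. The {{FileRotator}} interface, the two counters and the class names are hypothetical and exist only to make the example self-contained; the patch itself may be structured differently.

```java
import java.io.IOException;

/** Illustrative sketch only; not the actual Chukwa writer. */
class SwallowingRotateSketch {

    /** Hypothetical stand-in for closing the current file and opening the next one on HDFS. */
    interface FileRotator {
        void closeAndOpenNext() throws IOException;
    }

    static class Writer {
        private final FileRotator rotator;
        int completedRotations = 0;
        int failedRotations = 0;

        Writer(FileRotator rotator) {
            this.rotator = rotator;
        }

        void rotate() {
            try {
                rotator.closeAndOpenNext();
                completedRotations++;
            } catch (IOException e) {
                // Log and swallow: the caller never sees the exception.
                failedRotations++;
                System.err.println("rotate failed: " + e.getMessage());
            }
        }
    }
}
```

Because the exception never leaves {{rotate}}, whatever invokes it keeps running, and a later call can succeed once HDFS is reachable again.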

I changed {{ServletCollector}} to not update stats if it gets a {{COMMIT_FAIL}} 
response.

The only issue that I see with this approach is that if the agent sends chunks 
and gets back commit-pending acks for those chunks, HDFS can still go down, in 
which case the file will not be rotated. This is the same as the current 
behavior, except that now the collector won't die. If guaranteed writes are 
desired, then the {{AsyncAckSender}} should be used.

> Improve fault-tolerance of collectors.
> --------------------------------------
>
>                 Key: CHUKWA-533
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-533
>             Project: Chukwa
>          Issue Type: Improvement
>          Components: data collection
>            Reporter: Bill Graham
>         Attachments: CHUKWA-533-1.patch
>
>
> There are currently a number of ways that a collector can die, typically due 
> to errors on a DN or a NN that's being restarted. A collector should have 
> some combination of retry logic followed by failing back to the agent, but 
> the collector process should not die.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
