[
https://issues.apache.org/jira/browse/CHUKWA-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739607#action_12739607
]
Eric Yang commented on CHUKWA-369:
----------------------------------
There was another attempt to address this issue, and I think we could learn
something from that past attempt. The main lesson was that *flushing on every
write while holding the client connection yields poor collector performance.*
LocalWriter was designed to handle this problem. Instead of writing to HDFS
directly, it writes to the local file system and then puts the file onto HDFS.
The main idea was to decouple the synchronization between the agent, the
collector, and the data node while improving data reliability. When the
collector crashes, it resumes processing from its local disk.
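To make the flow concrete, here is a minimal sketch of the LocalWriter idea,
not the actual Chukwa implementation: chunks are appended to a local spool
file, and completed files are shipped to HDFS out of band, so the client
connection is never held across an HDFS flush. The class name, file naming,
and rotation policy are all hypothetical.
{code:java}
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical LocalWriter-style spooler: append chunks locally,
// rotate the spool file on a timer, ship closed files to HDFS.
public class LocalSpoolWriter {
  private final File spoolDir;
  private final Path hdfsDir;
  private final FileSystem hdfs;
  private File currentFile;
  private OutputStream out;

  public LocalSpoolWriter(File spoolDir, Path hdfsDir) throws IOException {
    this.spoolDir = spoolDir;
    this.hdfsDir = hdfsDir;
    this.hdfs = FileSystem.get(new Configuration());
    rotate();
  }

  // Append a chunk's bytes to the local spool file; no HDFS round
  // trip here, so the collector can release the client quickly.
  public synchronized void add(byte[] chunkData) throws IOException {
    out.write(chunkData);
  }

  // Close the current spool file, copy it to HDFS (deleting the
  // local copy on success), then open a fresh spool file. Called
  // periodically from a timer thread and on shutdown.
  public synchronized void rotate() throws IOException {
    if (out != null) {
      out.close();
      hdfs.copyFromLocalFile(true /* delete local source */,
          new Path(currentFile.getAbsolutePath()),
          new Path(hdfsDir, currentFile.getName()));
    }
    currentFile = new File(spoolDir,
        "chukwa-" + System.currentTimeMillis() + ".done");
    out = new BufferedOutputStream(new FileOutputStream(currentFile));
  }
}
{code}
On restart, any leftover files in the spool directory could be shipped the
same way, which is how the resume-from-local-disk behavior would fall out.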
The LocalWriter was never finished, but it shows a promising approach to the
reliability problem. The major flaw was that writing to local disk is faster
than writing to HDFS, so the collector's disk frequently filled up. The
implementation could be improved by limiting local disk usage and refusing
additional chunks once the disk queue reaches its quota. This should improve
collector reliability without using the synchronized pipeline.
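A rough sketch of that quota idea, again hypothetical rather than Chukwa's
actual code: before accepting a chunk, the collector checks the spool
directory against a byte quota and rejects the request when full, so the
agent can back off or retry against another collector.
{code:java}
import java.io.File;
import java.io.IOException;

// Hypothetical disk-quota guard for the local spool directory.
public class SpoolQuota {
  private final File spoolDir;
  private final long quotaBytes;

  public SpoolQuota(File spoolDir, long quotaBytes) {
    this.spoolDir = spoolDir;
    this.quotaBytes = quotaBytes;
  }

  // Sum the sizes of queued spool files (flat directory assumed).
  private long usedBytes() {
    long used = 0;
    File[] files = spoolDir.listFiles();
    if (files != null) {
      for (File f : files) {
        used += f.length();
      }
    }
    return used;
  }

  // Called per incoming chunk: throws when the queue is over quota
  // so the servlet can answer e.g. HTTP 503 and the agent backs off.
  public void checkOrReject(int chunkSize) throws IOException {
    if (usedBytes() + chunkSize > quotaBytes) {
      throw new IOException("local spool over quota; rejecting chunk");
    }
  }
}
{code}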
> proposed reliability mechanism
> ------------------------------
>
> Key: CHUKWA-369
> URL: https://issues.apache.org/jira/browse/CHUKWA-369
> Project: Hadoop Chukwa
> Issue Type: New Feature
> Components: data collection
> Affects Versions: 0.3.0
> Reporter: Ari Rabkin
> Fix For: 0.3.0
>
>
> We like to say that Chukwa is a system for reliable log collection. It isn't,
> quite, since we don't handle collector crashes. Here's a proposed
> reliability mechanism.