[
https://issues.apache.org/jira/browse/CHUKWA-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739607#action_12739607
]
Eric Yang commented on CHUKWA-369:
----------------------------------
There was another attempt to address this issue, and I think we could learn
something from that past attempt. The main lesson was that *flushing on every
write while holding the client connection yields poor collector performance.*
LocalWriter was designed to handle this problem. Instead of writing to HDFS
directly, it writes to the local file system and then puts the file onto HDFS.
The main idea was to decouple the synchronization between the agent, the
collector, and the data node while improving data reliability. When the
collector crashes, it resumes processing from its local disk.
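To make the flow concrete, here is a minimal sketch of the LocalWriter idea,
not the actual Chukwa implementation: chunks are appended to a local spool
file, and completed files are shipped to HDFS out of band, so the client
connection is never held across an HDFS flush. The class name, file naming,
and rotation policy are all hypothetical.
{code:java}
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical LocalWriter-style spooler: append chunks locally,
// rotate the spool file on a timer, ship closed files to HDFS.
public class LocalSpoolWriter {
  private final File spoolDir;
  private final Path hdfsDir;
  private final FileSystem hdfs;
  private File currentFile;
  private OutputStream out;

  public LocalSpoolWriter(File spoolDir, Path hdfsDir) throws IOException {
    this.spoolDir = spoolDir;
    this.hdfsDir = hdfsDir;
    this.hdfs = FileSystem.get(new Configuration());
    rotate();
  }

  // Append a chunk's bytes to the local spool file; no HDFS round
  // trip here, so the collector can release the client quickly.
  public synchronized void add(byte[] chunkData) throws IOException {
    out.write(chunkData);
  }

  // Close the current spool file, copy it to HDFS (deleting the
  // local copy on success), then open a fresh spool file. Called
  // periodically from a timer thread and on shutdown.
  public synchronized void rotate() throws IOException {
    if (out != null) {
      out.close();
      hdfs.copyFromLocalFile(true /* delete local source */,
          new Path(currentFile.getAbsolutePath()),
          new Path(hdfsDir, currentFile.getName()));
    }
    currentFile = new File(spoolDir,
        "chukwa-" + System.currentTimeMillis() + ".done");
    out = new BufferedOutputStream(new FileOutputStream(currentFile));
  }
}
{code}
On restart, any leftover files in the spool directory could be shipped the
same way, which is how the resume-from-local-disk behavior would fall out.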
The LocalWriter was never finished, but it shows a promising approach to the
reliability problem. The major flaw was that writing to local disk is faster
than writing to HDFS, so the collector's disk frequently filled up. The
implementation could be improved by limiting local disk usage and refusing
additional chunks once the disk queue reaches its quota. This should improve
collector reliability without using the synchronized pipeline.
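A rough sketch of that quota idea, again hypothetical rather than Chukwa's
actual code: before accepting a chunk, the collector checks the spool
directory against a byte quota and rejects the request when full, so the
agent can back off or retry against another collector.
{code:java}
import java.io.File;
import java.io.IOException;

// Hypothetical disk-quota guard for the local spool directory.
public class SpoolQuota {
  private final File spoolDir;
  private final long quotaBytes;

  public SpoolQuota(File spoolDir, long quotaBytes) {
    this.spoolDir = spoolDir;
    this.quotaBytes = quotaBytes;
  }

  // Sum the sizes of queued spool files (flat directory assumed).
  private long usedBytes() {
    long used = 0;
    File[] files = spoolDir.listFiles();
    if (files != null) {
      for (File f : files) {
        used += f.length();
      }
    }
    return used;
  }

  // Called per incoming chunk: throws when the queue is over quota
  // so the servlet can answer e.g. HTTP 503 and the agent backs off.
  public void checkOrReject(int chunkSize) throws IOException {
    if (usedBytes() + chunkSize > quotaBytes) {
      throw new IOException("local spool over quota; rejecting chunk");
    }
  }
}
{code}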
> proposed reliability mechanism
> ------------------------------
>
> Key: CHUKWA-369
> URL: https://issues.apache.org/jira/browse/CHUKWA-369
> Project: Hadoop Chukwa
> Issue Type: New Feature
> Components: data collection
> Affects Versions: 0.3.0
> Reporter: Ari Rabkin
> Fix For: 0.3.0
>
>
> We like to say that Chukwa is a system for reliable log collection. It isn't,
> quite, since we don't handle collector crashes. Here's a proposed
> reliability mechanism.