[ 
https://issues.apache.org/jira/browse/CHUKWA-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709458#action_12709458
 ] 

Jerome Boulon commented on CHUKWA-30:
-------------------------------------

Unit testing the collector is difficult since we don't have end-to-end 
testing tools yet, but this is something we are going to work on.
That being said, this code has been running for one week now, collecting 
System Metrics from 3700 machines.

What do you mean by "add the necessary/optional conf options to 
chukwa-collector-conf.xml.template"?
Should I activate the new Writer in place of the current one, or just add all 
the properties but comment out the XML block?

We had to remove the 10-second lock (HDFS flush) for performance reasons.
We write to the local file system first because it tends to be more reliable 
than writing across the network, and because we have a use case where people 
want to use the DataCollection pipeline without HDFS at all.
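
As a rough illustration of the local-first idea, here is a minimal sketch. It 
is not the code in the attached patch: the class name LocalFirstSink, the 
".tmp"/".done" suffixes, and the rotate() method are all hypothetical, and it 
only uses java.nio so it runs without HDFS.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; not the writer from CHUKWA-30.patch.
class LocalFirstSink implements AutoCloseable {
    private final Path dir;      // local spool directory (assumed layout)
    private Path current;        // file currently being written
    private OutputStream out;
    private int sequence = 0;

    LocalFirstSink(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
        open();
    }

    // Start a new in-progress file named sink-<n>.tmp.
    private void open() throws IOException {
        current = dir.resolve("sink-" + sequence++ + ".tmp");
        out = new BufferedOutputStream(Files.newOutputStream(current));
    }

    // Append one record locally; nothing is flushed to a remote file system here.
    synchronized void add(String record) throws IOException {
        out.write(record.getBytes(StandardCharsets.UTF_8));
        out.write('\n');
    }

    // Close the current file, rename it to .done, and open the next one.
    // A separate process could later ship the .done files elsewhere.
    synchronized Path rotate() throws IOException {
        out.close();
        String name = current.getFileName().toString().replace(".tmp", ".done");
        Path done = Files.move(current, current.resolveSibling(name),
                StandardCopyOption.ATOMIC_MOVE);
        open();
        return done;
    }

    // Closes the in-progress file without renaming it.
    @Override
    public synchronized void close() throws IOException {
        out.close();
    }
}
```

In this sketch the collector thread would only call add() and, on a timer or 
size threshold, rotate(). Anything that wants the data remotely only ever 
looks at completed ".done" files, so no remote connection is held open while 
data is being received.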

This gives me a 10X improvement compared to the default writer. With the 
default writer, I had to have 5 collectors running in order to collect System 
Metrics from 3700 machines, and data was still late.
With the new writer, I've been able to handle all System Metrics for all 
machines from a single collector instance.
Also, Demux is more efficient since I have fewer and bigger dataSink files.



> Remove HDFS flush & connection holding (Collector)
> --------------------------------------------------
>
>                 Key: CHUKWA-30
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-30
>             Project: Hadoop Chukwa
>          Issue Type: Improvement
>          Components: data collection
>            Reporter: Jerome Boulon
>            Assignee: Jerome Boulon
>         Attachments: CHUKWA-30.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
