[ 
https://issues.apache.org/jira/browse/CHUKWA-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757594#action_12757594
 ] 

Ari Rabkin commented on CHUKWA-391:
-----------------------------------

In particular:

If agents write too fast, a collector might not be able to respond to each
post before the agents time out. When they time out, they'll retransmit,
wastefully.  Ideally, backpressure from the collector would throttle the agent
send rate.

Let n be the number of agents per collector and w be the collector write rate.
Imagine that all n agents post data at the same time. For backpressure to
work, the maximum post size needs to be small enough that the collector can
respond to each post before any of them time out. So max post size should be
less than w * timeout / n to be useful here.
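
To make that bound concrete, here is a rough back-of-the-envelope sketch. The
write rate comes from the numbers further down; the timeout and fan-in values
are illustrative assumptions, not Chukwa defaults.

// Back-of-the-envelope check of the bound above: maxPostSize < w * timeout / n.
// The timeout and fan-in values are illustrative assumptions, not Chukwa defaults.
public class PostSizeBound {
  public static void main(String[] args) {
    double writeRateMBps = 20.0;   // w: collector write rate (MB/sec)
    double postTimeoutSec = 15.0;  // HTTP post timeout (sec) -- assumed for illustration
    int agentsPerCollector = 300;  // n: fan-in -- assumed for illustration

    double maxUsefulPostMB = writeRateMBps * postTimeoutSec / agentsPerCollector;
    System.out.printf("max useful post size ~= %.2f MB%n", maxUsefulPostMB);
    // 20 * 15 / 300 = 1.0 MB, already below the current 2 MB default.
  }
}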

Another way to think of it: we don't do admission control on the collector's
write queue, but we do at the agent.  So the agent buffer should fill up and
block before the collector's queue does.

Currently the default max post size is 2 MB, and a typical collector writes at
20 MB/sec, so backpressure doesn't really work at high fan-in.  I think max post
size should be much smaller; maybe only a few hundred KB.  This may require
some modification to the queue classes to make sure jumbo chunks work correctly.
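
Flipping the same formula around gives a sketch of the fan-in at which the
current 2 MB default stops giving useful backpressure; again, the timeout is an
assumed value for illustration only.

// Same bound, solved for fan-in: timeouts start roughly when n > w * timeout / maxPostSize.
// The timeout value is an assumption for illustration, not a Chukwa default.
public class FanInLimit {
  public static void main(String[] args) {
    double maxPostSizeMB = 2.0;   // current default max post size
    double writeRateMBps = 20.0;  // typical collector write rate, per the numbers above
    double timeoutSec = 15.0;     // HTTP post timeout -- assumed for illustration

    double secondsPerPost = maxPostSizeMB / writeRateMBps;  // ~0.1 s to drain one 2 MB post
    double maxFanIn = timeoutSec / secondsPerPost;          // last agent must be answered within the timeout
    System.out.printf("fan-in above ~%.0f agents starts timing out%n", maxFanIn);
    // 15 / 0.1 = 150 agents; beyond that we get wasteful retransmits instead of backpressure.
  }
}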

Thoughts and comments?

> tuning timeouts and post size
> -----------------------------
>
>                 Key: CHUKWA-391
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-391
>             Project: Hadoop Chukwa
>          Issue Type: Improvement
>          Components: data collection
>            Reporter: Ari Rabkin
>
> The maximum post size, HTTP post timeout, and collector fanin are all 
> related.  We should at least document this, and ideally autotune.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
