[
https://issues.apache.org/jira/browse/CHUKWA-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757594#action_12757594
]
Ari Rabkin commented on CHUKWA-391:
-----------------------------------
In particular:
If agents write too fast, a collector might not be able to respond to each
post before the agents time out. When they time out, they retransmit,
wastefully. Ideally, backpressure from the collector would throttle the agent
send rate.
Let n be the number of agents per collector and w be the collector write rate.
Imagine that all n agents post data at the same time. For backpressure to
work, the maximum post size needs to be small enough that the collector can
write out all n posts, and respond to each of them, before any of them times
out. So the max post size should be less than w * timeout / n to be useful here.
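A back-of-the-envelope sketch of that bound (the variable names and values
below are purely illustrative, not Chukwa defaults or config keys):

  // Illustrative only: the timeout and fan-in below are assumptions.
  public class PostSizeBound {
    public static void main(String[] args) {
      long writeRateBytesPerSec = 20L * 1024 * 1024; // w: collector write rate, ~20 MB/sec
      long postTimeoutMs = 15000;                    // assumed HTTP post timeout
      int agentsPerCollector = 500;                  // n: assumed fan-in

      // If all n agents post at once, the collector must drain n * postSize
      // bytes within one timeout window, so each post can be at most:
      long maxUsefulPostBytes =
          writeRateBytesPerSec * postTimeoutMs / 1000 / agentsPerCollector;

      System.out.printf("max useful post size ~ %d KB%n", maxUsefulPostBytes / 1024);
    }
  }

With those (made-up) numbers the bound comes out to roughly 600 KB, well under
the current 2 MB default.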
Another way to think of it: we don't do admission control on the collector's
write queue, but we do at the agent. So the agent buffer should fill up and
block before the collector's does.
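Roughly what I mean by agent-side admission control, as a sketch (this is not
the existing queue classes, just an illustration of a bounded buffer that
blocks the producer when it fills):

  import java.util.concurrent.ArrayBlockingQueue;
  import java.util.concurrent.BlockingQueue;

  // Sketch only: a bounded agent-side buffer. When the collector is slow, the
  // buffer fills and put() blocks the producer, which puts the backpressure
  // at the agent rather than at the collector.
  public class BoundedAgentBuffer {
    private final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<byte[]>(64);

    public void add(byte[] chunk) throws InterruptedException {
      queue.put(chunk);    // blocks while the buffer is full
    }

    public byte[] next() throws InterruptedException {
      return queue.take(); // the HTTP sender drains from here
    }
  }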
Currently the default max post size is 2 MB, and a typical collector writes at
around 20 MB/sec. So backpressure doesn't really work at high fan-in. I think
the max post size should be much smaller, maybe only a few hundred KB. This may
require some modification to the queue classes to make sure jumbo chunks still
work correctly.
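To make that concrete with an assumed 30-second post timeout (the real timeout
may differ): the bound w * timeout / n works out to 20 MB/sec * 30 sec / n =
600 MB / n, which drops below the 2 MB default once n passes about 300 agents
per collector. A limit of a few hundred KB keeps the bound satisfied out to a
few thousand agents.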
Thoughts and comments?
> tuning timeouts and post size
> -----------------------------
>
> Key: CHUKWA-391
> URL: https://issues.apache.org/jira/browse/CHUKWA-391
> Project: Hadoop Chukwa
> Issue Type: Improvement
> Components: data collection
> Reporter: Ari Rabkin
>
> The maximum post size, HTTP post timeout, and collector fanin are all
> related. We should at least document this, and ideally autotune.