[ https://issues.apache.org/jira/browse/KAFKA-656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102552#comment-14102552 ]
Abhinav Anand commented on KAFKA-656:
-------------------------------------

Hi [~jkreps],

Thanks for the inputs. I am planning to start with the basic use case of per-topic bytes written. A few questions that would help me make the code conform to the standards:

1. As the number of partitions changes, is the partition count maintained as a state variable on all brokers, or should we figure this out from ZooKeeper?
2. The Kafka server uses Yammer metrics, though the clients have their own metrics library (with SampledStat etc.). Is there any specific reason for this? I would avoid changing the metrics design on the broker side.

Regards,
Abhinav

> Add Quotas to Kafka
> -------------------
>
>                 Key: KAFKA-656
>                 URL: https://issues.apache.org/jira/browse/KAFKA-656
>             Project: Kafka
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 0.8.1
>            Reporter: Jay Kreps
>              Labels: project
>
> It would be nice to implement a quota system in Kafka to improve our support
> for highly multi-tenant usage. The goal of this system would be to prevent
> one naughty user from accidentally overloading the whole cluster.
> There are several quantities we would want to track:
> 1. Requests per second
> 2. Bytes written per second
> 3. Bytes read per second
> There are two reasonable groupings we would want to aggregate and enforce
> these thresholds at:
> 1. Topic level
> 2. Client level (e.g. by client id from the request)
> When a request hits one of these limits we will simply reject it with a
> QUOTA_EXCEEDED exception.
> To avoid suddenly breaking things without warning, we should ideally support
> two thresholds: a soft threshold at which we produce some kind of warning and
> a hard threshold at which we give the error. The soft threshold could just be
> defined as 80% (or whatever) of the hard threshold.
> There are nuances to getting this right. If you measure second-by-second a
> single burst may exceed the threshold, so we need a sustained measurement
> over a period of time.
> Likewise when do we stop giving this error? To make this work right we likely
> need to charge against the quota for request *attempts* not just successful
> requests. Otherwise a client that is overloading the server will just flap on
> and off--i.e. we would disable them for a period of time but when we
> re-enabled them they would likely still be abusing us.
> It would be good to have a wiki design on how this would all work as a starting
> point for discussion.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
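
As a starting point for discussion, the sustained-measurement idea from the issue description could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not Kafka code: `WindowedQuota`, `Verdict`, `record`, and all constructor parameters are hypothetical names, and the caller passes timestamps in explicitly so the behaviour is deterministic. The sketch averages usage over a sliding window, applies a soft threshold as a fraction of the hard one, and charges every attempt (including rejected ones) against the quota.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of a windowed quota check; not part of Kafka. */
final class WindowedQuota {
    enum Verdict { OK, WARN, QUOTA_EXCEEDED }

    private final long windowMs;           // length of the measurement window
    private final double hardLimitPerSec;  // hard threshold, units per second
    private final double softFraction;     // e.g. 0.8 for "80% of the hard threshold"
    private final Deque<long[]> samples = new ArrayDeque<>(); // {timestampMs, amount}
    private long total = 0;                // sum of amounts currently inside the window

    WindowedQuota(long windowMs, double hardLimitPerSec, double softFraction) {
        this.windowMs = windowMs;
        this.hardLimitPerSec = hardLimitPerSec;
        this.softFraction = softFraction;
    }

    /** Charges one request attempt of the given size and returns the resulting verdict. */
    Verdict record(long nowMs, long amount) {
        // Drop samples that have fallen out of the window.
        while (!samples.isEmpty() && samples.peekFirst()[0] <= nowMs - windowMs) {
            total -= samples.pollFirst()[1];
        }
        // Charge the attempt before deciding, so rejected requests still count.
        samples.addLast(new long[] {nowMs, amount});
        total += amount;

        // Average rate over the whole window, so a single burst is smoothed out.
        double ratePerSec = total / (windowMs / 1000.0);
        if (ratePerSec > hardLimitPerSec) {
            return Verdict.QUOTA_EXCEEDED;
        }
        if (ratePerSec > softFraction * hardLimitPerSec) {
            return Verdict.WARN;
        }
        return Verdict.OK;
    }
}
```

With a 10-second window and a hard limit of 100 bytes per second, a single 500-byte write in the first second is accepted because the windowed average is only 50 bytes per second. A client that keeps sending after being rejected stays over the limit until its earlier attempts age out of the window, which is the anti-flapping behaviour the description asks for. One such instance per topic or per client id would be needed; how brokers would share or configure these is left open here.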