Hey David,

Thanks for updating the wiki.
1. I was actually thinking of letting every broker just consume the
__consumer_offsets topic. But that seems less efficient if only a few topics
are configured for committed-offset-based retention, so querying the committed
offsets seems reasonable. From the wiki it is not clear whether the
committed-offset query happens synchronously or asynchronously. It is probably
better to do it asynchronously, i.e. in a thread other than the log-deletion
thread. Otherwise querying the committed offsets may slow down, or even block,
the log deletion if a remote call fails.

2. Using the new consumer does not necessarily introduce a new group unless we
use Kafka-based group management. But using KafkaConsumer directly to query
the committed offsets may not work in this case, because by default it uses
the consumer group in the ConsumerConfig. We could use NetworkClient and see
whether we can reuse some of the code in the new consumer. Since a lot of
effort has been spent on deprecating the SimpleConsumer, we probably want to
avoid introducing any new usage of it. Anyway, this is an implementation
detail and we can figure it out when writing the patch.

3. What I am thinking is that we should consider whether to allow multiple
retention policies to be set at the same time, and if so, which policy takes
precedence. Otherwise it might be confusing for users who have multiple
retention policies set.

In addition to the above, it seems we need some way to configure the set of
consumer groups a topic's retention should depend on. If it is done through a
topic config, it would be good to document the configuration name and the
format of its value in the wiki as well.

Thanks,

Jiangjie (Becket) Qin

On Sun, Oct 9, 2016 at 7:14 AM, 东方甲乙 <254479...@qq.com> wrote:

> Hi Becket,
>
> This is David, thanks for the comments. I have updated some info in the
> wiki. Almost all of the changes are described in the workflow.
>
> Answers to the comments:
>
> 1. Every broker only has some of the groups' committed offsets, which are
> stored in the __consumer_offsets topic; it still has to query the other
> coordinators (other brokers) for the remaining groups' committed offsets.
> So we use the OffsetFetchRequest to query one group's committed offset.
>
> 2. Using the new consumer to query the committed offsets would introduce
> a new group, but if we use the OffsetFetchRequest to query (like the
> consumer-offset-checker tool: first find the coordinator and build a
> channel to query), we will not introduce a new group.
>
> 3. I think KIP-47's functionality is a little different from this KIP,
> although both modify the log retention.
>
> Thanks,
> David.
>
> ------------------ Original Message ------------------
> From: "Becket Qin" <becket....@gmail.com>
> Sent: Sunday, October 9, 2016, 1:00 PM
> To: "dev" <dev@kafka.apache.org>
> Subject: Re: [DISCUSS] KIP-68 Add a consumed log retention before log
> retention
>
> Hi David,
>
> Thanks for the explanation. Could you update the KIP-68 wiki to include
> the changes that need to be made?
>
> I have a few more comments below:
>
> 1. We already have an internal topic __consumer_offsets to store all the
> committed offsets, so the brokers can probably just consume from that to
> get the committed offsets for all the partitions of each group.
>
> 2. It is probably better to use o.a.k.clients.consumer.KafkaConsumer
> instead of SimpleConsumer. It handles all the leader movements and
> potential failures.
>
> 3. KIP-47 also has a proposal for a new time-based log retention policy
> and proposes a new configuration for log retention. It may be worth
> thinking about the two behaviors together.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Sat, Oct 8, 2016 at 2:15 AM, Pengwei (L) <pengwei...@huawei.com> wrote:
>
> > Hi Becket,
> >
> > Thanks for the feedback:
> > 1. We use the simple consumer API to query the committed offset, so we
> > don't need to specify the consumer group.
> > 2. Every broker uses the simple consumer API (OffsetFetchKey) to query
> > the committed offset in the log retention process. The client can
> > commit offsets or not.
> > 3. There is no need to distinguish follower brokers from leader
> > brokers; every broker can query.
> > 4. We don't need to change the protocols; we mainly change the log
> > retention process in the log manager.
> >
> > One concern is that querying the min offset needs O(partitions * groups)
> > time; an alternative is to build an internal topic to store every
> > partition's min offset, which can reduce it to O(1).
> > I will update the wiki with more details.
> >
> > Thanks,
> > David
> >
> > > Hi Pengwei,
> > >
> > > Thanks for the KIP proposal. It is a very useful KIP. At a high
> > > level, the proposed behavior looks reasonable to me.
> > >
> > > However, it seems that some of the details are not mentioned in the
> > > KIP. For example:
> > >
> > > 1. How will the expected consumer group be specified? Is it through
> > > a per-topic dynamic configuration?
> > > 2. How do the brokers detect the consumer offsets? Is it required
> > > for a consumer to commit offsets?
> > > 3. How do all the replicas know about the committed offsets? e.g.
> > > 1) non-coordinator brokers which do not have the committed offsets,
> > > 2) follower brokers which do not have consumers directly consuming
> > > from them.
> > > 4. Are there any other changes that need to be made (e.g. new
> > > protocols) in addition to the configuration change?
> > >
> > > It would be great if you can update the wiki to have more details.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Wed, Sep 7, 2016 at 2:26 AM, Pengwei (L) <pengwei...@huawei.com>
> > > wrote:
> > > >
> > > > Hi All,
> > > > I have made a KIP to enhance the log retention, details as follows:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-68+Add+a+consumed+log+retention+before+log+retention
> > > > Now starting a discussion thread for this KIP, looking forward to
> > > > the feedback.
> > > >
> > > > Thanks,
> > > > David
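For illustration only, here is a minimal sketch of the "consumed retention"
check discussed above: take the minimum committed offset of a partition
across the configured consumer groups, and only allow deletion of data below
that offset. This is not from the KIP or the patch; the class and method
names are made up, and it uses one short-lived KafkaConsumer per group purely
for simplicity, which is exactly the approach point 2 above questions (the
real broker-side code would more likely use OffsetFetchRequest/NetworkClient
and run asynchronously, per points 1 and 2).

import java.util.Collection;
import java.util.OptionalLong;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

/**
 * Illustration only (hypothetical class, not broker code): compute the
 * minimum committed offset of a partition across a configured set of
 * consumer groups. A log segment whose last offset is below this value
 * would be eligible for consumed-offset-based retention; everything at or
 * above it must be kept regardless of the time/size retention settings.
 */
public class ConsumedRetentionCheck {

    // Returns empty if the group set is empty or any group has no committed
    // offset for the partition, in which case consumed retention should not
    // delete anything for that partition.
    public static OptionalLong minCommittedOffset(String bootstrapServers,
                                                  Collection<String> groupIds,
                                                  TopicPartition tp) {
        long min = Long.MAX_VALUE;
        for (String groupId : groupIds) {
            Properties props = new Properties();
            props.put("bootstrap.servers", bootstrapServers);
            props.put("group.id", groupId);
            props.put("enable.auto.commit", "false");
            // One short-lived consumer per group, used only to read that
            // group's committed offset; it never subscribes or polls, so it
            // does not join the group or affect group membership.
            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(
                    props, new ByteArrayDeserializer(), new ByteArrayDeserializer())) {
                OffsetAndMetadata committed = consumer.committed(tp);
                if (committed == null)
                    return OptionalLong.empty(); // group has not consumed this partition yet
                min = Math.min(min, committed.offset());
            }
        }
        return groupIds.isEmpty() ? OptionalLong.empty() : OptionalLong.of(min);
    }
}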