I love it and also have a few use cases. I agree with Harsha that we need a KIP, both to cover edge cases and to clearly define the expected behavior, whether it will be implemented in the admin client or in a tool, etc.

Gwen

On Mon, Jul 8, 2019 at 1:52 PM Harsha <ka...@harsha.io> wrote:
>
> Hi Carlos,
> This is a really useful feature and we would like to have it as well.
> I think high_watermark == log_start_offset is a good starting point to
> consider, but we may also have a case where the topic is empty and the
> clients producing to it are offline, so we might end up garbage
> collecting a topic which is still active. Having a configurable time
> period after which an empty topic can be deleted will help in this case.
> Also, we should check whether there are any consumers still reading
> from the topics, etc.
> It would be good to have a KIP around this and add some edge-case
> handling.
>
> Thanks,
> Harsha
>
>
> On Sun, Jun 23, 2019, at 9:40 PM, Carlos Manuel Duclos-Vergara wrote:
> > Hi,
> > Thanks for the answer. Looking at the high water mark, the logic
> > would be to flag the partitions that have
> >
> > high_watermark == log_start_offset
> >
> > In addition, I'm thinking that having the leader fulfill that
> > criterion is enough to flag a partition, and maybe check the replicas
> > only if requested by the user.
> >
> >
> > On Fri, Jun 21, 2019, 23:35, Colin McCabe <cmcc...@apache.org> wrote:
> >
> > > I don't think this requires a change in the protocol. It seems like
> > > you should be able to use the high water mark to figure something
> > > out here?
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Fri, Jun 21, 2019, at 04:56, Carlos Manuel Duclos-Vergara wrote:
> > > > Hi,
> > > >
> > > > This is an ancient task, but I feel it is still current today
> > > > (especially since, as somebody who deals with a Kafka cluster, I
> > > > know that this happens more often than not).
> > > >
> > > > The task is about garbage collection of topics in a sort of
> > > > automated way. After some consideration I started a prototype
> > > > implementation based on a manual process:
> > > >
> > > > 1. Using the cli, I can use the --describe-topic to get a list of
> > > >    topics that have size 0
> > > > 2. Massage that list into something that can then be fed into the
> > > >    cli and remove the topics that have size 0.
> > > >
> > > > The guiding principle here is the assumption that abandoned topics
> > > > will eventually have size 0, because all records will expire. This
> > > > is not true for all topics, but it covers a large portion of them,
> > > > and having something like this would help admins to find
> > > > "suspicious" topics at least.
> > > >
> > > > I started implementing this change and I realized that it would
> > > > require a change in the protocol, because the sizes are never sent
> > > > over the wire. Funnily enough, we collect the sizes of the log
> > > > files, but we do not send them.
> > > >
> > > > I think this kind of change will require a KIP, but I wanted to
> > > > ask what others think about this.
> > > >
> > > > The in-progress implementation of this can be found here:
> > > > https://github.com/carlosduclos/kafka/commit/0dffe5e131c3bd32b77f56b9be8eded89a96df54
> > > >
> > > > Comments?
> > > >
> > > > --
> > > > Carlos Manuel Duclos Vergara
> > > > Backend Software Developer
> > > >
> > >
> >

--
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog
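
For illustration only, the criteria discussed in this thread (high_watermark == log_start_offset on the leader, a configurable grace period for empty topics, and a check for consumers still reading) could be combined roughly as in the sketch below. This is not the prototype linked above and not a Kafka API: PartitionState, empty_since_ms, topics_with_consumers and grace_period_ms are all hypothetical names, and the sketch assumes the offsets and consumer information have already been collected by some other means.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class PartitionState:
    """Hypothetical snapshot of one partition leader (not a Kafka type)."""
    topic: str
    partition: int
    log_start_offset: int
    high_watermark: int
    # Assumption: time (ms) at which the partition was first observed empty.
    empty_since_ms: int


def is_empty(p: PartitionState) -> bool:
    # Criterion from the thread: no records remain between the log start
    # offset and the high water mark.
    return p.high_watermark == p.log_start_offset


def deletion_candidates(
    partitions: Iterable[PartitionState],
    topics_with_consumers: Set[str],
    now_ms: int,
    grace_period_ms: int,
) -> List[str]:
    """Return topics that look abandoned; it only flags, it deletes nothing."""
    by_topic: Dict[str, List[PartitionState]] = {}
    for p in partitions:
        by_topic.setdefault(p.topic, []).append(p)

    candidates = []
    for topic, parts in by_topic.items():
        if topic in topics_with_consumers:
            continue  # someone may still be reading from it
        if not all(is_empty(p) for p in parts):
            continue  # at least one partition still holds records
        # Every partition must have been empty for the whole grace period,
        # so producers that are only temporarily offline are not penalized.
        most_recently_emptied = max(p.empty_since_ms for p in parts)
        if now_ms - most_recently_emptied < grace_period_ms:
            continue
        candidates.append(topic)
    return sorted(candidates)
```

The function deliberately returns a list of "suspicious" topics rather than deleting anything, which matches the idea of helping admins find candidates; whether deletion should be automatic is one of the points a KIP would need to settle.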