I love it and also have a few use-cases.

I agree with Harsha that we need a KIP, both to cover edge cases and to
clearly define the expected behavior, whether it will be implemented
in the admin client or in a tool, etc.

Gwen

On Mon, Jul 8, 2019 at 1:52 PM Harsha <ka...@harsha.io> wrote:
>
> Hi Carlos,
> This is a really useful feature and we would like to have it as
> well. I think high_watermark == log_start_offset is a good starting point to
> consider, but we may also have a case where the topic is empty and the clients
> producing to it are offline, so we might end up garbage collecting a topic that
> is still active. Having a configurable time period after which an empty topic
> can be deleted will help in this case. Also, we should check whether any
> consumers are still reading from the topic.
> It would be good to have a KIP around this that also covers the handling of
> some edge cases.
>
> Thanks,
> Harsha
>
>
> On Sun, Jun 23, 2019, at 9:40 PM, Carlos Manuel Duclos-Vergara wrote:
> > Hi,
> > Thanks for the answer. Using the high water mark, the logic would be
> > to flag the partitions that have
> >
> > high_watermark == log_start_offset
> >
> > In addition, I'm thinking that having the leader fulfill that criterion is
> > enough to flag a partition, and that the replicas would be checked only if
> > requested by the user.
> >
> >
> > On Fri, Jun 21, 2019 at 23:35, Colin McCabe <cmcc...@apache.org> wrote:
> >
> > > I don't think this requires a change in the protocol.  It seems like you
> > > should be able to use the high water mark to figure something out here?
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Fri, Jun 21, 2019, at 04:56, Carlos Manuel Duclos-Vergara wrote:
> > > > Hi,
> > > >
> > > > This is an ancient task, but I feel it is still relevant today (especially
> > > > since, as somebody who deals with a Kafka cluster, I know that this happens
> > > > more often than not).
> > > >
> > > > The task is about garbage collection of topics in a sort of automated way.
> > > > After some consideration I started a prototype implementation based on a
> > > > manual process:
> > > >
> > > > 1. Using the cli, I can use the --describe-topic to get a list of topics
> > > > that have size 0
> > > > 2. Massage that list into something that can then be fed into the cli to
> > > > remove the topics that have size 0.
> > > >
> > > > The guiding principle here is the assumption that abandoned topics will
> > > > eventually have size 0, because all records will expire. This is not true
> > > > for all topics, but it covers a large portion of them, and having something
> > > > like this would at least help admins find "suspicious" topics.
> > > >
> > > > I started implementing this change and I realized that it would require a
> > > > change in the protocol, because the sizes are never sent over the wire.
> > > > Funnily enough, we collect the sizes of the log files, but we do not send
> > > > them.
> > > >
> > > > I think this kind of change will require a KIP, but I wanted to ask what
> > > > others think about this.
> > > >
> > > > The in-progress implementation of this can be found here:
> > > >
> > > > https://github.com/carlosduclos/kafka/commit/0dffe5e131c3bd32b77f56b9be8eded89a96df54
> > > >
> > > > Comments?
> > > >
> > > > --
> > > > Carlos Manuel Duclos Vergara
> > > > Backend Software Developer
> > > >
> > >
> >
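
For illustration only, the criteria discussed in the quoted thread
(high_watermark == log_start_offset on the leader, plus a configurable
period during which an empty topic is left alone) could be sketched
roughly as below. This is a minimal sketch, not the prototype from the
linked commit: PartitionOffsets, deletion_candidates, empty_since and
grace_seconds are hypothetical names, and how the offsets are fetched
from the brokers is deliberately left out. It also does not check for
active consumers, which the thread lists as a separate concern.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PartitionOffsets:
    """Offsets reported by a partition leader (hypothetical container)."""
    log_start_offset: int
    high_watermark: int


def is_empty(p: PartitionOffsets) -> bool:
    # Criterion from the thread: no readable records remain on the leader.
    return p.high_watermark == p.log_start_offset


def deletion_candidates(
    topics: Dict[str, List[PartitionOffsets]],
    empty_since: Dict[str, float],
    now: float,
    grace_seconds: float,
) -> List[str]:
    """Return topics whose partitions are all empty and have been
    observed empty for at least grace_seconds (assumed bookkeeping)."""
    candidates = []
    for name, partitions in sorted(topics.items()):
        # A topic qualifies only if every one of its partitions is empty.
        if not partitions or not all(is_empty(p) for p in partitions):
            continue
        first_seen = empty_since.get(name)
        # Topics never recorded as empty before are skipped for now.
        if first_seen is not None and now - first_seen >= grace_seconds:
            candidates.append(name)
    return candidates
```

A tool built along these lines would only report candidates; whether
deletion happens automatically is exactly the kind of behavior a KIP
would need to define.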



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog
