On Wed, Jan 15, 2020, at 03:54, Dhruvil Shah wrote:
> Hi Colin,
>
> We could add a configuration to disable stray partition deletion if needed,
> but I wasn't sure if an operator would really want to disable it. Perhaps
> if the implementation were buggy, the configuration could be used to
> disable the feature until a bug fix is made. Is that the kind of use case
> you were thinking of?
>
> I was thinking that there would not be any delay between detection and
> deletion of stray logs. We would schedule an async task to do the actual
> deletion though.

Based on my experience in HDFS, immediately deleting data that looks out of place can cause severe issues when a bug occurs. See https://issues.apache.org/jira/browse/HDFS-6186 for details. So I really do think there should be a delay, and a metric + log message in the meantime to alert the operators to what is about to happen.

best,
Colin

>
> Thanks,
> Dhruvil
>
> On Tue, Jan 14, 2020 at 11:04 PM Colin McCabe <cmcc...@apache.org> wrote:
>
> > Hi Dhruvil,
> >
> > Thanks for the KIP. I think there should be some way to turn this off, in
> > case that becomes necessary. I'm also curious how long we intend to wait
> > between detecting the duplication and deleting the extra logs. The KIP
> > says "scheduled for deletion" but doesn't give a time frame -- is it
> > assumed to be immediate?
> >
> > best,
> > Colin
> >
> >
> > On Tue, Jan 14, 2020, at 05:56, Dhruvil Shah wrote:
> > > If there are no more questions or concerns, I will start a vote thread
> > > tomorrow.
> > >
> > > Thanks,
> > > Dhruvil
> > >
> > > On Mon, Jan 13, 2020 at 6:59 PM Dhruvil Shah <dhru...@confluent.io> wrote:
> > >
> > > > Hi Nikhil,
> > > >
> > > > Thanks for looking at the KIP. The kind of race condition you mention is
> > > > not possible as stray partition detection is done synchronously while
> > > > handling the LeaderAndIsrRequest. In other words, we atomically evaluate
> > > > the partitions the broker must host and the extra partitions it is hosting
> > > > and schedule deletions based on that.
> > > >
> > > > One possible shortcoming of the KIP is that we do not have the ability to
> > > > detect a stray partition if the topic has been recreated since. We will
> > > > have the ability to disambiguate between different generations of a
> > > > partition with KIP-516.
> > > >
> > > > Thanks,
> > > > Dhruvil
> > > >
> > > > On Sat, Jan 11, 2020 at 11:40 AM Nikhil Bhatia <nik...@confluent.io>
> > > > wrote:
> > > >
> > > >> Thanks Dhruvil, the proposal looks reasonable to me.
> > > >>
> > > >> is there a potential of a race between a new topic being assigned to the
> > > >> same node that is still performing a cleanup of the stray partition ? Topic
> > > >> ID will definitely solve this issue.
> > > >>
> > > >> Thanks
> > > >> Nikhil
> > > >>
> > > >> On 2020/01/06 04:30:20, Dhruvil Shah <d...@confluent.io> wrote:
> > > >> > Here is the link to the KIP:
> > > >> >
> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-550%3A+Mechanism+to+Delete+Stray+Partitions+on+Broker
> > > >> >
> > > >> > On Mon, Jan 6, 2020 at 9:59 AM Dhruvil Shah <dh...@confluent.io> wrote:
> > > >> >
> > > >> > > Hi all, I would like to kick off discussion for KIP-550 which proposes a
> > > >> > > mechanism to detect and delete stray partitions on a broker. Suggestions
> > > >> > > and feedback are welcome.
> > > >> > >
> > > >> > > - Dhruvil
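
For reference, the delay-before-delete behavior argued for at the top of this thread could look roughly like the following. This is a minimal Python sketch, not code from KIP-550 or Kafka: the class `StrayPartitionTracker`, its `grace_period_s` and `enabled` settings, and the `pending_count` metric are all hypothetical names chosen for illustration. It assumes the broker can compare the set of partitions it hosts against the set it has been assigned.

```python
import logging
from dataclasses import dataclass, field
from typing import Dict, List, Set

log = logging.getLogger("stray-partitions")


@dataclass
class StrayPartitionTracker:
    """Hypothetical helper that delays stray-partition deletion by a grace period."""

    grace_period_s: float  # assumed setting; the KIP does not define a delay
    enabled: bool = True   # assumed off switch for the feature
    _detected_at: Dict[str, float] = field(default_factory=dict, init=False)

    def observe(self, hosted: Set[str], assigned: Set[str], now: float) -> None:
        """Record newly detected strays and forget ones that are assigned again."""
        strays = hosted - assigned
        for tp in list(self._detected_at):
            if tp not in strays:
                del self._detected_at[tp]
        for tp in sorted(strays):
            if tp not in self._detected_at:
                self._detected_at[tp] = now
                # Warn operators before anything is removed.
                log.warning(
                    "Stray partition %s will be deleted in %.0fs",
                    tp,
                    self.grace_period_s,
                )

    def pending_count(self) -> int:
        """Number of strays awaiting deletion; a metric could report this value."""
        return len(self._detected_at)

    def due_for_deletion(self, now: float) -> List[str]:
        """Return strays whose grace period has elapsed; nothing if disabled."""
        if not self.enabled:
            return []
        due = sorted(
            tp
            for tp, seen in self._detected_at.items()
            if now - seen >= self.grace_period_s
        )
        for tp in due:
            del self._detected_at[tp]
        return due
```

The clock is passed in as `now` so the grace period can be exercised without sleeping. A caller would invoke `observe` whenever the assignment is re-evaluated and hand the result of `due_for_deletion` to whatever async task performs the actual removal.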