On Wed, Jan 15, 2020, at 03:54, Dhruvil Shah wrote:
> Hi Colin,
> 
> We could add a configuration to disable stray partition deletion if needed,
> but I wasn't sure if an operator would really want to disable it. Perhaps
> if the implementation were buggy, the configuration could be used to
> disable the feature until a bug fix is made. Is that the kind of use case
> you were thinking of?
> 
> I was thinking that there would not be any delay between detection and
> deletion of stray logs. We would schedule an async task to do the actual
> deletion though.

Based on my experience in HDFS, immediately deleting data that looks out of 
place can cause severe issues when a bug occurs.  See 
https://issues.apache.org/jira/browse/HDFS-6186 for details.  So I really do 
think there should be a delay, and a metric + log message in the meantime to 
alert the operators to what is about to happen.
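
To make the suggestion concrete, here is a minimal sketch of a delay between 
detection and deletion. It is an illustration only, not what the KIP or Kafka 
implements: the class StrayPartitionTracker, its methods, and the grace-period 
parameter are all hypothetical names. Detection is a plain set difference, 
marking a stray logs a warning and deletes nothing, and pendingCount() is the 
value a gauge metric could expose while the grace period runs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

/** Hypothetical helper; none of these names come from the KIP or from Kafka. */
class StrayPartitionTracker {
    private static final Logger LOG = Logger.getLogger("StrayPartitionTracker");

    private final long gracePeriodMs;
    // Partition name -> time (ms) at which it was first detected as stray.
    private final Map<String, Long> detectedAtMs = new LinkedHashMap<>();

    StrayPartitionTracker(long gracePeriodMs) {
        this.gracePeriodMs = gracePeriodMs;
    }

    /** Partitions hosted on this broker that are not in the assigned set. */
    static Set<String> findStrays(Set<String> hosted, Set<String> assigned) {
        Set<String> strays = new HashSet<>(hosted);
        strays.removeAll(assigned);
        return strays;
    }

    /** Record newly detected strays and warn the operator; nothing is deleted here. */
    synchronized void markStray(Set<String> strays, long nowMs) {
        for (String partition : strays) {
            if (detectedAtMs.putIfAbsent(partition, nowMs) == null) {
                LOG.warning("Stray partition " + partition
                        + " will be deleted in " + gracePeriodMs + " ms");
            }
        }
    }

    /** Drop a pending deletion, e.g. if the partition is assigned to this broker again. */
    synchronized void cancel(String partition) {
        detectedAtMs.remove(partition);
    }

    /** Number of strays still waiting out the grace period (candidate gauge metric). */
    synchronized int pendingCount() {
        return detectedAtMs.size();
    }

    /** Remove and return the strays whose grace period has elapsed. */
    synchronized List<String> pollExpired(long nowMs) {
        List<String> due = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = detectedAtMs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() >= gracePeriodMs) {
                due.add(entry.getKey());
                it.remove();
            }
        }
        return due;
    }
}
```

A periodic task would call pollExpired() and hand the result to the async 
deletion path, so an operator who sees the warning or the metric has the whole 
grace period to step in.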

best,
Colin

> 
> Thanks,
> Dhruvil
> 
> On Tue, Jan 14, 2020 at 11:04 PM Colin McCabe <cmcc...@apache.org> wrote:
> 
> > Hi Dhruvil,
> >
> > Thanks for the KIP.  I think there should be some way to turn this off, in
> > case that becomes necessary.  I'm also curious how long we intend to wait
> > between detecting the duplication and deleting the extra logs.  The KIP
> > says "scheduled for deletion" but doesn't give a time frame -- is it
> > assumed to be immediate?
> >
> > best,
> > Colin
> >
> >
> > On Tue, Jan 14, 2020, at 05:56, Dhruvil Shah wrote:
> > > If there are no more questions or concerns, I will start a vote thread
> > > tomorrow.
> > >
> > > Thanks,
> > > Dhruvil
> > >
> > > On Mon, Jan 13, 2020 at 6:59 PM Dhruvil Shah <dhru...@confluent.io> wrote:
> > >
> > > > Hi Nikhil,
> > > >
> > > > Thanks for looking at the KIP. The kind of race condition you mention is
> > > > not possible as stray partition detection is done synchronously while
> > > > handling the LeaderAndIsrRequest. In other words, we atomically evaluate
> > > > the partitions the broker must host and the extra partitions it is hosting
> > > > and schedule deletions based on that.
> > > >
> > > > One possible shortcoming of the KIP is that we do not have the ability to
> > > > detect a stray partition if the topic has been recreated since. We will
> > > > have the ability to disambiguate between different generations of a
> > > > partition with KIP-516.
> > > >
> > > > Thanks,
> > > > Dhruvil
> > > >
> > > > On Sat, Jan 11, 2020 at 11:40 AM Nikhil Bhatia <nik...@confluent.io>
> > > > wrote:
> > > >
> > > >> Thanks Dhruvil, the proposal looks reasonable to me.
> > > >>
> > > > >> Is there potential for a race between a new topic being assigned to the
> > > > >> same node that is still performing a cleanup of the stray partition?
> > > > >> Topic ID will definitely solve this issue.
> > > >>
> > > >> Thanks
> > > >> Nikhil
> > > >>
> > > >> On 2020/01/06 04:30:20, Dhruvil Shah <d...@confluent.io> wrote:
> > > > >> > Here is the link to the KIP:
> > > > >> >
> > > > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-550%3A+Mechanism+to+Delete+Stray+Partitions+on+Broker
> > > > >> >
> > > > >> > On Mon, Jan 6, 2020 at 9:59 AM Dhruvil Shah <dh...@confluent.io> wrote:
> > > > >> >
> > > > >> > > Hi all, I would like to kick off discussion for KIP-550 which proposes a
> > > > >> > > mechanism to detect and delete stray partitions on a broker. Suggestions
> > > > >> > > and feedback are welcome.
> > > > >> > >
> > > > >> > > - Dhruvil
> > > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>
