+1 (non-binding)

On Tue, Nov 29, 2016 at 8:08 AM, <gharatmayures...@gmail.com> wrote:
> +1 (non-binding)
>
> Thanks,
>
> Mayuresh
>
> > On Nov 29, 2016, at 3:18 AM, Michael Pearce <michael.pea...@ig.com> wrote:
> >
> > Hi All,
> >
> > We have been discussing this in the thread below, and final changes have been made to the KIP wiki based on these discussions.
> >
> > We would now like to put the following KIP to a vote:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-87+-+Add+Compaction+Tombstone+Flag
> >
> > This KIP is for having a distinct compaction attribute “tombstone” flag instead of relying on a null value, allowing delete messages with a non-null value.
> >
> > Many thanks,
> > Michael
> >
> > On 22/11/2016, 15:52, "Michael Pearce" <michael.pea...@ig.com> wrote:
> >
> > Hi Mayuresh,
> >
> > LGTM. I've just made one small adjustment, updating the wire protocol to show the magic byte bump.
> >
> > Do we think we’re good to put this to a vote? Are there any other bits needing discussion?
> >
> > Cheers
> > Mike
> >
> > On 21/11/2016, 18:26, "Mayuresh Gharat" <gharatmayures...@gmail.com> wrote:
> >
> > Hi Michael,
> >
> > I have updated the migration section of the KIP. Can you please take a look?
> >
> > Thanks,
> >
> > Mayuresh
> >
> > On Fri, Nov 18, 2016 at 9:07 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> >
> >> Hi Michael,
> >>
> >> That whilst sending a tombstone and a non-null value, the consumer can expect to receive the non-null message only in step (3), is this correct?
> >> ---> I do agree with you here.
> >>
> >> Becket, Ismael: can you guys review the migration plan listed above, using the magic byte?
> >>
> >> Thanks,
> >>
> >> Mayuresh
> >>
> >> On Fri, Nov 18, 2016 at 8:58 AM, Michael Pearce <michael.pea...@ig.com> wrote:
> >>
> >>> Many thanks for this Mayuresh. I don't have any objections.
> >>>
> >>> I assume we should state:
> >>>
> >>> That whilst sending a tombstone and a non-null value, the consumer can expect to receive the non-null message only in step (3), is this correct?
> >>>
> >>> Cheers
> >>> Mike
> >>>
> >>> ________________________________________
> >>> From: Mayuresh Gharat <gharatmayures...@gmail.com>
> >>> Sent: Thursday, November 17, 2016 5:18:41 PM
> >>> To: dev@kafka.apache.org
> >>> Subject: Re: [DISCUSS] KIP-87 - Add Compaction Tombstone Flag
> >>>
> >>> Hi Ismael,
> >>>
> >>> Thanks for the explanation.
> >>> I especially like the part where you mentioned that we can get rid of the older null-value support for log compaction later on:
> >>> We can't change semantics of the message format without having a long transition period. And we can't rely on people reading documentation or acting on a warning for something so fundamental. As such, my take is that we need to bump the magic byte. The good news is that we don't have to support all versions forever. We have said that we will support direct upgrades for 2 years. That means that message format version n could, in theory, be removed 2 years after it's introduced.
> >>>
> >>> Just a heads up: I would like to mention that even without bumping the magic byte, we will *NOT* lose zero copy, as the client(x+1) in my explanation above will automatically and internally convert a null value to have the tombstone bit set, and a set tombstone bit to have a null value, and by the time we move to version (x+2), the clients would have upgraded. Obviously if we support a request from consumer(x), we will lose zero copy, but that is the same case with the magic byte.
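The client(x+1) behaviour described in the paragraph above can be sketched as follows. This is a minimal illustration under stated assumptions, not Kafka code: `Record` and `normalize_for_client_x1` are hypothetical names, and a record is reduced to a key, an optional value and a tombstone flag. It shows that both delete markers are always kept in agreement, so a reader that checks only for a null value and a reader that checks only the bit reach the same answer.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified record: only the fields relevant to compaction."""
    key: bytes
    value: Optional[bytes]
    tombstone: bool = False


def normalize_for_client_x1(record: Record) -> Record:
    """Sketch of the transitional client(x+1) rule (hypothetical helper).

    A null value implies the tombstone bit, and a set tombstone bit implies
    a null value, so old (null-based) and new (bit-based) readers agree.
    """
    if record.value is None or record.tombstone:
        # Either marker present: emit a record carrying both markers.
        return replace(record, value=None, tombstone=True)
    # Ordinary record with a value: nothing to change.
    return record
```

One consequence of this transitional rule is that a tombstone carrying a non-null value cannot be expressed; that combination only becomes possible once the null-value convention is dropped.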
> >>>
> >>> But if the magic byte bump makes the transition easier for the reasons you explained above, I am OK with it, since we are going to meet the end goal down the road :)
> >>>
> >>> On a side note, can we update the doc here on the magic byte to say that "*it should be bumped whenever the message format is changed or the interpretation of the message format (usage of the reserved bits as well) is changed*"?
> >>>
> >>> Hi Michael,
> >>>
> >>> Here is the updated plan that we discussed offline yesterday:
> >>>
> >>> Currently the magic-byte, which corresponds to "message.format.version", is set to 1.
> >>>
> >>> 1) On the broker it will be set to 1 initially.
> >>>
> >>> 2) When a producer client sends a message with magic-byte = 2, since the broker is on magic-byte = 1, we will down-convert it, which means that if the tombstone bit is set, the value will be set to null. A consumer understanding magic-byte = 1 will still work with this. A consumer working with magic-byte = 2 will also be able to understand this, since it understands the tombstone.
> >>> Now there is still the question of supporting a non-tombstone, null-value message from a producer client with magic-byte = 2. *(I am not sure if we should support this. Ismael/Becket can comment here.)*
> >>>
> >>> 3) When almost all the clients have upgraded, the message.format.version on the broker can be changed to 2, at which point the down-conversion in the above step will not happen. If at this point we get a request from an older consumer, we might have to down-convert, in which case we lose zero copy, but these cases should be rare.
> >>>
> >>> Becket, can you review this plan and add more details if I have missed or gotten something wrong, before we put it in the KIP?
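Step 2 of the plan above (down-converting a magic-byte = 2 message for a broker still on magic-byte = 1) might look roughly like the sketch below. `RecordV1`, `RecordV2` and `down_convert` are hypothetical names for illustration only, and raising an error for the non-tombstone + null value case is just one assumed policy, since the thread leaves that question open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordV2:
    """Hypothetical magic-byte = 2 record: carries an explicit tombstone bit."""
    key: bytes
    value: Optional[bytes]
    tombstone: bool


@dataclass(frozen=True)
class RecordV1:
    """Hypothetical magic-byte = 1 record: a null value is the only delete marker."""
    key: bytes
    value: Optional[bytes]


def down_convert(record: RecordV2) -> RecordV1:
    """Sketch of step 2: make a magic-byte = 2 record readable by magic-byte = 1 readers."""
    if record.tombstone:
        # Tombstone bit set -> value set to null, which v1 readers treat as a delete.
        return RecordV1(record.key, None)
    if record.value is None:
        # Non-tombstone + null value has no v1 representation (a v1 reader would
        # treat it as a delete). Rejecting it is an assumed policy, not a decision
        # taken in the thread.
        raise ValueError("non-tombstone record with null value cannot be down-converted")
    # Ordinary record: copy key and value unchanged.
    return RecordV1(record.key, record.value)
```

In this sketch a tombstone that carries a value loses the value when down-converted, which is consistent with the point raised in the thread that consumers would only see the non-null value once the broker is on message.format.version 2 (step 3). The same conversion would be needed in step 3 whenever an older consumer fetches, which is where zero copy is lost.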
> >>>
> >>> Thanks,
> >>>
> >>> Mayuresh
> >>>
> >>> On Wed, Nov 16, 2016 at 11:07 PM, Michael Pearce <michael.pea...@ig.com> wrote:
> >>>
> >>>> Thanks guys, for discussing this offline and getting some consensus.
> >>>>
> >>>> So that it's clear to me and others what is now proposed (I think I understand, but want to make sure):
> >>>>
> >>>> Could I ask you to either update the KIP directly to detail the migration strategy, or (re-)state in this thread the migration strategy based on a magic byte that you discussed and agreed offline?
> >>>>
> >>>> The main original driver for the KIP was to support compaction where the value isn't null, based on the discussions in the KIP-82 thread.
> >>>>
> >>>> We should be able to support non-tombstone + null value by the completion of the KIP; as we noted when discussing this KIP, having logic based on a null value isn't very clean, and a distinct flag also separates the concerns.
> >>>>
> >>>> As already discussed, though, we can split this into KIP-87a and KIP-87b,
> >>>>
> >>>> where we look to deliver KIP-87a on a compacted topic (to address the immediate issues):
> >>>> * tombstone + null value
> >>>> * tombstone + non-null value
> >>>> * non-tombstone + non-null value
> >>>>
> >>>> Then, once KIP-87a is completed, we can discuss options for how we support the second part, KIP-87b, to deliver:
> >>>> * non-tombstone + null value
> >>>>
> >>>> Cheers
> >>>> Mike
> >>>>
> >>>> ________________________________________
> >>>> From: Becket Qin <becket....@gmail.com>
> >>>> Sent: Thursday, November 17, 2016 1:43 AM
> >>>> To: dev@kafka.apache.org
> >>>> Subject: Re: [DISCUSS] KIP-87 - Add Compaction Tombstone Flag
> >>>>
> >>>> Renu, Mayuresh and I had an offline discussion; the following is a brief summary.
> >>>>
> >>>> 1. We agreed that not bumping up the magic value may result in losing zero copy during migration.
> >>>> 2. Given that bumping up the magic value is almost free and has the benefit of avoiding a potential performance issue, it is probably worth doing.
> >>>>
> >>>> One issue we still need to think about is whether we want to support a non-tombstone message with a null value.
> >>>> Currently it is not supported by Kafka. If we allow a non-tombstone, null-value message to exist after KIP-87, the problem is that such a message will not be supported by consumers prior to KIP-87, because a null value will always be interpreted as a tombstone.
> >>>>
> >>>> One option is to keep the current behavior, i.e. not support such messages. It would be good to know if there is a concrete use case for such a message. If there is not, we can probably just not support it.
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Jiangjie (Becket) Qin
> >>>>
> >>>> On Wed, Nov 16, 2016 at 1:28 PM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> >>>>
> >>>>> Hi Ismael,
> >>>>>
> >>>>> This is something I can think of for the migration plan, with up-conversion:
> >>>>>
> >>>>> 1) Currently, let's say we have the broker at version x.
> >>>>> 2) Currently we have clients at version x.
> >>>>> 3) a) We move the broker to version Broker(x+1): it supports both the tombstone and null for log compaction.
> >>>>>    b) We upgrade the client to version client(x+1): if in the producer client(x+1) the value is set to null, we will automatically set the tombstone bit internally. If the producer client(x+1) sets the tombstone itself, well and good. For producer client(x), the broker will up-convert to have the tombstone bit. Broker(x+1) supports both. Consumer client(x+1) will be aware of this and should be able to handle this.
> >>> For > >>>>> consumer client(x) we will down convert the message on the broker > >>> side. > >>>>> c) At this point we will have to specify a warning or clearly > >>> specify > >>>>> in docs that this behavior is about to be changed for log compaction. > >>>>> 4) a) In next release of the Broker(x+2), we say that only Tombstone > >>> is > >>>>> used for log compaction on the Broker side. Clients(x+1) still is > >>>>> supported. > >>>>> b) We upgrade the client to version client(x+2) : if value is set > >>> to > >>>>> null, tombstone will not be set automatically. The client will have > to > >>>> call > >>>>> setTombstone() to actually set the tombstone. > >>>>> > >>>>> We should compare this migration plan with the migration plan for > >>> magic > >>>>> byte bump and do whatever looks good. > >>>>> I am just worried that if we go down magic byte route, unless I am > >>>> missing > >>>>> something, it sounds like kafka will be stuck with supporting both > >>> null > >>>>> value and tombstone bit for log compaction for life long, which does > >>> not > >>>>> look like a good end state. > >>>>> > >>>>> Thanks, > >>>>> > >>>>> Mayuresh > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> On Wed, Nov 16, 2016 at 9:32 AM, Mayuresh Gharat < > >>>>> gharatmayures...@gmail.com > >>>>>> wrote: > >>>>> > >>>>>> Hi Ismael, > >>>>>> > >>>>>> That's a very good point which I might have not considered earlier. > >>>>>> > >>>>>> Here is a plan that I can think of: > >>>>>> > >>>>>> Stage 1) The broker from now on, up converts the message to have the > >>>>>> tombstone marker. The log compaction thread does log compaction > >>> based > >>>> on > >>>>>> both null and tombstone marker. This is our transition period. > >>>>>> Stage 2) The next release we only say that log compaction is based > >>> on > >>>>>> tombstone marker. (Open source kafka makes this as a policy). 
> >>>>>> By this time, organizations moving to this release will be sure that they have gone through the entire transition period.
> >>>>>>
> >>>>>> My only goal in doing this is that Kafka clearly specifies the end state of what log compaction means (is it a null value or a tombstone marker, but not both).
> >>>>>>
> >>>>>> What do you think?
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Mayuresh
> >>>>>>
> >>>>>> On Wed, Nov 16, 2016 at 9:17 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> >>>>>>
> >>>>>>> One comment below.
> >>>>>>>
> >>>>>>> On Wed, Nov 16, 2016 at 5:08 PM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> - If we don't bump up the magic byte, the broker will always have to look at both the tombstone bit and the value when doing compaction. Assuming we do not bump up the magic byte, imagine the broker sees a message which does not have the tombstone bit set. The broker does not know when the message was produced (i.e. whether the message has been up-converted or not), so it has to take a further look at the value to see whether it is null in order to determine if it is a tombstone. The same logic has to be put in the consumer as well, because the consumer does not know whether the message has been up-converted.
> >>>>>>>> - If we up-convert while appending, this is not the case, right?
> >>>>>>>
> >>>>>>> If I understand you correctly, this is not sufficient because the log may have messages appended before it was upgraded to include KIP-87.
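The difference between the two approaches debated above can be made concrete with a small sketch of the tombstone check. `StoredRecord` and both functions are hypothetical, and treating magic 2 as the first format with the tombstone bit follows the plan proposed in this thread rather than any released format. Without a magic byte bump, a record lacking the bit may have been appended before the upgrade, so a null value must still be treated as a delete; with a bump, the record's own format version selects the rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoredRecord:
    """Hypothetical view of a record as read back from the log."""
    magic: int                # format version the record was written with
    value: Optional[bytes]
    tombstone_bit: bool       # False for records written before the change


def is_tombstone_without_bump(record: StoredRecord) -> bool:
    # No magic byte bump: a record without the bit may predate the upgrade,
    # so the broker (and the consumer) must also check every value for null.
    return record.tombstone_bit or record.value is None


def is_tombstone_with_bump(record: StoredRecord) -> bool:
    # Magic byte bump: the format version says which rule applies.
    # Assumption: magic >= 2 is the format that carries the tombstone bit.
    if record.magic >= 2:
        return record.tombstone_bit
    return record.value is None
```

For a magic-2 record with a null value and no tombstone bit (the KIP-87b case), `is_tombstone_without_bump` returns True while `is_tombstone_with_bump` returns False, which is why only the magic-byte scheme can tell that combination apart from a delete.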
> >>>>>>>
> >>>>>>> Ismael
> >>>>>>
> >>>>>> --
> >>>>>> -Regards,
> >>>>>> Mayuresh R. Gharat
> >>>>>> (862) 250-7125
> >>>>
> >>>> The information contained in this email is strictly confidential and for the use of the addressee only, unless otherwise indicated. If you are not the intended recipient, please do not read, copy, use or disclose to others this message or any attachment. Please also notify the sender by replying to this email or by telephone (+44(020 7896 0011) and then delete the email and any copies of it. Opinions, conclusion (etc) that do not relate to the official business of this company shall be understood as neither given nor endorsed by it. IG is a trading name of IG Markets Limited (a company registered in England and Wales, company number 04008957) and IG Index Limited (a company registered in England and Wales, company number 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill, London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG Index Limited (register number 114059) are authorised and regulated by the Financial Conduct Authority.