+1 (non-binding)

Thanks,
Mayuresh

> On Nov 29, 2016, at 3:18 AM, Michael Pearce <michael.pea...@ig.com> wrote:
>
> Hi All,
>
> We have been discussing in the below thread and final changes have been made
> to the KIP wiki based on these discussions.
>
> We would now like to put to the vote the following KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-87+-+Add+Compaction+Tombstone+Flag
>
> This KIP is for having a distinct compaction attribute “tombstone” flag
> instead of relying on null value, allowing non-null value delete messages.
>
> Many thanks,
> Michael
>
>
> On 22/11/2016, 15:52, "Michael Pearce" <michael.pea...@ig.com> wrote:
>
> Hi Mayuresh,
>
> LGTM. I've just made one small adjustment updating the wire protocol to
> show the magic byte bump.
>
> Do we think we’re good to put to a vote? Are there any other bits needing
> discussion?
>
> Cheers
> Mike
>
> On 21/11/2016, 18:26, "Mayuresh Gharat" <gharatmayures...@gmail.com> wrote:
>
> Hi Michael,
>
> I have updated the migration section of the KIP. Can you please take a
> look?
>
> Thanks,
>
> Mayuresh
>
> On Fri, Nov 18, 2016 at 9:07 AM, Mayuresh Gharat <gharatmayures...@gmail.com>
> wrote:
>
>> Hi Michael,
>>
>> That whilst sending tombstone and non null value, the consumer can expect
>> only to receive the non-null message only in step (3) is this correct?
>> ---> I do agree with you here.
>>
>> Becket, Ismael : can you guys review the migration plan listed above using
>> magic byte?
>>
>> Thanks,
>>
>> Mayuresh
>>
>> On Fri, Nov 18, 2016 at 8:58 AM, Michael Pearce <michael.pea...@ig.com>
>> wrote:
>>
>>> Many thanks for this Mayuresh. I don't have any objections.
>>>
>>> I assume we should state:
>>>
>>> That whilst sending tombstone and non null value, the consumer can expect
>>> only to receive the non-null message only in step (3) is this correct?
>>>
>>> Cheers
>>> Mike
>>>
>>>
>>> Sent using OWA for iPhone
>>> ________________________________________
>>> From: Mayuresh Gharat <gharatmayures...@gmail.com>
>>> Sent: Thursday, November 17, 2016 5:18:41 PM
>>> To: dev@kafka.apache.org
>>> Subject: Re: [DISCUSS] KIP-87 - Add Compaction Tombstone Flag
>>>
>>> Hi Ismael,
>>>
>>> Thanks for the explanation.
>>> I especially like the part where you mentioned that we can get rid of the
>>> older null value support for log compaction later on, here:
>>> "We can't change semantics of the message format without having a long
>>> transition period. And we can't rely on people reading documentation or
>>> acting on a warning for something so fundamental. As such, my take is that
>>> we need to bump the magic byte. The good news is that we don't have to
>>> support all versions forever. We have said that we will support direct
>>> upgrades for 2 years. That means that message format version n could, in
>>> theory, be removed 2 years after it's introduced."
>>>
>>> Just a heads up, I would like to mention that even without bumping the magic
>>> byte, we will *NOT* lose zero copy, because client(x+1) in my explanation
>>> above will internally convert a null value to have the tombstone bit set,
>>> and a set tombstone bit to have a null value, and by the time we move to
>>> version (x+2), the clients would have upgraded.
>>> Obviously if we support a request from consumer(x), we will lose zero copy,
>>> but that is the same case with the magic byte.
>>>
>>> But if the magic byte bump makes the transition easier for the reasons
>>> that you explained, I am OK with it since we are going to meet the
>>> end goal down the road :)
>>>
>>> On a side note can we update the doc here on magic byte to say that "*it
>>> should be bumped whenever the message format is changed or the
>>> interpretation of message format (usage of the reserved bits as well) is
>>> changed*".
>>>
>>>
>>> Hi Michael,
>>>
>>> Here is the update plan that we discussed offline yesterday:
>>>
>>> Currently the magic-byte which corresponds to the "message.format.version"
>>> is set to 1.
>>>
>>> 1) On the broker it will be set to 1 initially.
>>>
>>> 2) When a producer client sends a message with magic-byte = 2, since the
>>> broker is on magic-byte = 1, we will down convert it, which means if the
>>> tombstone bit is set, the value will be set to null. A consumer
>>> understanding magic-byte = 1 will still work with this. A consumer
>>> working with magic-byte = 2 will also be able to understand this, since it
>>> understands the tombstone.
>>> Now there is still the question of supporting a non-tombstone and null
>>> value from a producer client with magic-byte = 2. *(I am not sure if we
>>> should support this. Ismael/Becket can comment here)*
>>>
>>> 3) When almost all the clients have upgraded, the message.format.version
>>> on the broker can be changed to 2, at which point the down conversion in
>>> the above step will not happen. If at this point we get a consumer request
>>> from an older consumer, we might have to down convert, in which case we
>>> lose zero copy, but these cases should be rare.
>>>
>>> Becket, can you review this plan and add more details if I have
>>> missed something or got something wrong, before we put it on the KIP?
>>>
>>> Thanks,
>>>
>>> Mayuresh
>>>
>>> On Wed, Nov 16, 2016 at 11:07 PM, Michael Pearce <michael.pea...@ig.com>
>>> wrote:
>>>
>>>> Thanks guys, for discussing this offline and getting some consensus.
>>>>
>>>> So it's clear for myself and others what is proposed now (I think I
>>>> understand, but want to make sure):
>>>>
>>>> Could I ask you to either directly update the KIP to detail the migration
>>>> strategy, or (re-)state in this thread the migration strategy based on a
>>>> magic byte that you discussed and agreed offline?
>>>>
>>>>
>>>> The main original driver for the KIP was to support compaction where the
>>>> value isn't null, based off the discussions on the KIP-82 thread.
>>>>
>>>> We should be able to support non-tombstone + null value by the completion
>>>> of the KIP. As we noted when discussing this KIP, having logic based on a
>>>> null value isn't very clean, and a separate flag also separates the concerns.
>>>>
>>>> As discussed already, though, we can split this into KIP-87a and KIP-87b.
>>>>
>>>> We look to deliver KIP-87a on a compacted topic (to address the
>>>> immediate issues):
>>>> * tombstone + null value
>>>> * tombstone + non-null value
>>>> * non-tombstone + non-null value
>>>>
>>>> Then, once KIP-87a is completed, we can discuss options for how we
>>>> support the second part, KIP-87b, to deliver:
>>>> * non-tombstone + null value
>>>>
>>>> Cheers
>>>> Mike
>>>>
>>>>
>>>> ________________________________________
>>>> From: Becket Qin <becket....@gmail.com>
>>>> Sent: Thursday, November 17, 2016 1:43 AM
>>>> To: dev@kafka.apache.org
>>>> Subject: Re: [DISCUSS] KIP-87 - Add Compaction Tombstone Flag
>>>>
>>>> Renu, Mayuresh and I had an offline discussion, and following is a brief
>>>> summary.
>>>>
>>>> 1. We agreed that not bumping up the magic value may result in losing zero
>>>> copy during migration.
>>>> 2. Given that bumping up the magic value is almost free and has the benefit
>>>> of avoiding a potential performance issue, it is probably worth doing.
>>>>
>>>> One issue we still need to think about is whether we want to support a
>>>> non-tombstone message with null value.
>>>> Currently it is not supported by Kafka. If we allow a non-tombstone null
>>>> value message to exist after KIP-87, the problem is that such a message
>>>> will not be supported by the consumers prior to KIP-87, because a null
>>>> value will always be interpreted as a tombstone.
>>>>
>>>> One option is that we keep the current way, i.e. do not support such
>>>> message.
>>>> It would be good to know if there is a concrete use case for such
>>>> message. If there is not, we can probably just not support it.
>>>>
>>>> Thanks,
>>>>
>>>> Jiangjie (Becket) Qin
>>>>
>>>>
>>>> On Wed, Nov 16, 2016 at 1:28 PM, Mayuresh Gharat <gharatmayures...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Ismael,
>>>>>
>>>>> This is something I can think of for migration plan:
>>>>> So the migration plan can look something like this, with up conversion:
>>>>>
>>>>> 1) Currently let's say we have Broker at version x.
>>>>> 2) Currently we have clients at version x.
>>>>> 3) a) We move the version to Broker(x+1) : supports both tombstone and
>>>>> null for log compaction.
>>>>> b) We upgrade the client to version client(x+1) : if in the producer
>>>>> client(x+1) the value is set to null, we will automatically set the
>>>>> Tombstone bit internally. If the producer client(x+1) sets the tombstone
>>>>> itself, well and good. For producer client(x), the broker will up convert
>>>>> to have the tombstone bit. Broker(x+1) is supporting both. Consumer
>>>>> client(x+1) will be aware of this and should be able to handle this. For
>>>>> consumer client(x) we will down convert the message on the broker side.
>>>>> c) At this point we will have to specify a warning or clearly specify
>>>>> in docs that this behavior is about to be changed for log compaction.
>>>>> 4) a) In next release of the Broker(x+2), we say that only Tombstone is
>>>>> used for log compaction on the Broker side. Clients(x+1) still is
>>>>> supported.
>>>>> b) We upgrade the client to version client(x+2) : if value is set to
>>>>> null, tombstone will not be set automatically. The client will have to
>>>>> call setTombstone() to actually set the tombstone.
>>>>>
>>>>> We should compare this migration plan with the migration plan for magic
>>>>> byte bump and do whatever looks good.
>>>>> I am just worried that if we go down magic byte route, unless I am
>>>>> missing something, it sounds like kafka will be stuck with supporting both
>>>>> null value and tombstone bit for log compaction for life long, which does
>>>>> not look like a good end state.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Mayuresh
>>>>>
>>>>>
>>>>> On Wed, Nov 16, 2016 at 9:32 AM, Mayuresh Gharat <gharatmayures...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Ismael,
>>>>>>
>>>>>> That's a very good point which I might have not considered earlier.
>>>>>>
>>>>>> Here is a plan that I can think of:
>>>>>>
>>>>>> Stage 1) The broker from now on, up converts the message to have the
>>>>>> tombstone marker. The log compaction thread does log compaction based on
>>>>>> both null and tombstone marker. This is our transition period.
>>>>>> Stage 2) The next release we only say that log compaction is based on
>>>>>> tombstone marker. (Open source kafka makes this as a policy). By this
>>>>>> time, the organization which is moving to this release will be sure that
>>>>>> they have gone through the entire transition period.
>>>>>>
>>>>>> My only goal of doing this is that Kafka clearly specifies the end state
>>>>>> about what log compaction means (is it null value or a tombstone marker,
>>>>>> but not both).
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Mayuresh
>>>>>>
>>>>>> On Wed, Nov 16, 2016 at 9:17 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>>>>>>
>>>>>>> One comment below.
>>>>>>>
>>>>>>> On Wed, Nov 16, 2016 at 5:08 PM, Mayuresh Gharat <gharatmayures...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> - If we don't bump up the magic byte, on the broker side, the broker
>>>>>>>> will always have to look at both tombstone bit and the value when do
>>>>>>>> the compaction.
>>>>>>>> Assuming we do not bump up the magic byte,
>>>>>>>> imagine the broker sees a message which does not have a tombstone bit
>>>>>>>> set. The broker does not know when the message was produced (i.e.
>>>>>>>> whether the message has been up converted or not), so it has to take a
>>>>>>>> further look at the value to see if it is null or not in order to
>>>>>>>> determine if it is a tombstone. The same logic has to be put on the
>>>>>>>> consumer as well because the consumer does not know if the message has
>>>>>>>> been up converted or not.
>>>>>>>> - If we upconvert while appending, this is not the case, right?
>>>>>>>
>>>>>>>
>>>>>>> If I understand you correctly, this is not sufficient because the log may
>>>>>>> have messages appended before it was upgraded to include KIP-87.
>>>>>>>
>>>>>>> Ismael
>>>>>>
>>>>>>
>>>>>> --
>>>>>> -Regards,
>>>>>> Mayuresh R. Gharat
>>>>>> (862) 250-7125
>>>>>
>>>>>
>>>>> --
>>>>> -Regards,
>>>>> Mayuresh R. Gharat
>>>>> (862) 250-7125
>>>> The information contained in this email is strictly confidential and for
>>>> the use of the addressee only, unless otherwise indicated. If you are not
>>>> the intended recipient, please do not read, copy, use or disclose to others
>>>> this message or any attachment. Please also notify the sender by replying
>>>> to this email or by telephone (+44(020 7896 0011) and then delete the email
>>>> and any copies of it. Opinions, conclusion (etc) that do not relate to the
>>>> official business of this company shall be understood as neither given nor
>>>> endorsed by it. IG is a trading name of IG Markets Limited (a company
>>>> registered in England and Wales, company number 04008957) and IG Index
>>>> Limited (a company registered in England and Wales, company number
>>>> 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill,
>>>> London EC4R 2YA.
>>>> Both IG Markets Limited (register number 195355) and IG
>>>> Index Limited (register number 114059) are authorised and regulated by the
>>>> Financial Conduct Authority.
>>>
>>>
>>> --
>>> -Regards,
>>> Mayuresh R. Gharat
>>> (862) 250-7125
>>
>>
>> --
>> -Regards,
>> Mayuresh R. Gharat
>> (862) 250-7125
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
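
For readers following the thread, the conversion rules discussed above can be sketched roughly as follows: step 2 of the magic-byte plan (a set tombstone bit becomes a null value for magic-byte 1), and the up conversion of old null-value messages to carry the tombstone bit. This is only an illustration in Python, not Kafka code. Record, MAGIC_V1, MAGIC_V2, is_delete_marker, down_convert and up_convert are all hypothetical names, and rejecting "non-tombstone + null value" on down conversion is an assumption, since the thread leaves that case open.

```python
from dataclasses import dataclass
from typing import Optional

MAGIC_V1 = 1  # format where a null value is the only delete marker
MAGIC_V2 = 2  # format with an explicit tombstone bit, as proposed in the thread


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified stand-in for a message on a compacted topic."""
    magic: int
    key: bytes
    value: Optional[bytes]
    tombstone: bool = False  # only meaningful when magic == MAGIC_V2


def is_delete_marker(record: Record) -> bool:
    """Decide whether compaction should treat the record as a delete."""
    if record.magic >= MAGIC_V2:
        return record.tombstone      # new format: only the bit matters
    return record.value is None      # old format: null value means delete


def down_convert(record: Record) -> Record:
    """Rewrite a magic-2 record so that a magic-1 reader can interpret it."""
    if record.magic < MAGIC_V2:
        return record
    if not record.tombstone and record.value is None:
        # Assumption: the open "non-tombstone + null value" case is rejected,
        # because a magic-1 reader would misread it as a delete.
        raise ValueError("non-tombstone + null value has no magic-1 equivalent")
    # A set tombstone bit becomes a null value; any payload is dropped.
    value = None if record.tombstone else record.value
    return Record(magic=MAGIC_V1, key=record.key, value=value)


def up_convert(record: Record) -> Record:
    """Rewrite a magic-1 record so that a null value carries the tombstone bit."""
    if record.magic >= MAGIC_V2:
        return record
    return Record(magic=MAGIC_V2, key=record.key, value=record.value,
                  tombstone=record.value is None)
```

Under these assumptions, a tombstone with a non-null value loses its payload when it is down converted, which is consistent with the point made in the thread that a consumer can only expect to see such a message once the broker is on the new format.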