Hi Michael,

5) I believe this was in the discussion thread, @Mayuresh is this something we’ve overlooked? I thought we would down convert and remove the value so the old consumer had existing behavior, or is there something we haven’t thought about?

---> Yes, we would have to down convert and remove the value, so that the old consumer retains its existing behavior.
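
As a rough illustration of the down conversion described above, here is a minimal sketch. It is not Kafka's implementation: the `Record` type, the `TOMBSTONE_BIT` position and the `down_convert` helper are hypothetical names made up for this example, and the only behavior taken from the thread is "if the tombstone bit is set, the value is set to null" when going from magic 2 to an older format.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical bit position, chosen only for this sketch; the real
# attribute layout is whatever the KIP wire format defines.
TOMBSTONE_BIT = 0x20


@dataclass(frozen=True)
class Record:
    magic: int
    attributes: int
    key: bytes
    value: Optional[bytes]


def is_tombstone(record: Record) -> bool:
    """Magic 2 relies on the tombstone bit; older formats rely on a null value."""
    if record.magic >= 2:
        return bool(record.attributes & TOMBSTONE_BIT)
    return record.value is None


def down_convert(record: Record, target_magic: int = 1) -> Record:
    """Rewrite a record so that a consumer on an older format can read it."""
    if record.magic <= target_magic:
        return record  # already readable by the older consumer
    # A tombstone loses its value, so the old consumer sees the null-value
    # delete marker it already understands.
    value = None if is_tombstone(record) else record.value
    # Note: a non-tombstone record with a null value stays null here, and an
    # old consumer would read it as a delete. That case is the open question
    # raised further down the thread.
    return replace(
        record,
        magic=target_magic,
        attributes=record.attributes & ~TOMBSTONE_BIT,
        value=value,
    )
```

For example, a magic-2 record with the tombstone bit set and a non-null value comes out as a magic-1 record with a null value, while a magic-2 record without the bit keeps its value unchanged.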

Thanks,

Mayuresh

On Fri, Dec 2, 2016 at 5:55 AM, Michael Pearce <michael.pea...@ig.com> wrote:

> *Hi Jun,
>
> So sorry for the typo/mistake.
>
> On 02/12/2016, 11:19, "Michael Pearce" <michael.pea...@ig.com> wrote:
>
> Hi Jao
>
> Thanks for the response. Sorry for the slow reply, both with personal sickness and also battling some critical issues encountered since upgrading to 0.10.1.0
>
> 1) Thanks for spotting, document error where we branched this KIP from KIP-82, will get that removed.
> 2) Intent is to do this just at the record message level.
> 3) Thanks for spotting, will ensure this is corrected.
> 4) As per discussion thread we will support tombstone + null value, tombstone + non null value, no tombstone + null value.
> 5) I believe this was in the discussion thread, @Mayuresh is this something we’ve overlooked? I thought we would down convert and remove the value so the old consumer had existing behavior, or is there something we haven’t thought about?
>
> Cheers
> Mike
>
> On 30/11/2016, 18:12, "Jun Rao" <j...@confluent.io> wrote:
>
> Hi, Michael,
>
> Thanks for the KIP. A few comments below.
>
> 1. The message format change contains "HeadersLength Headers". Is that intended?
>
> 2. For compressed messageset, is the tombstone bit only set at the shallow level? Do we always leave that bit in the wrapper message unset? An alternative is to set the tombstone bit in the wrapper if at least one inner message has the tombstone bit set. This makes things a bit more complicated, but we could potentially exploit that for optimizing down conversion. For example, we only need to convert messages with magic 2 to magic 1 if the wrapper's tombstone bit is set (conversion is always needed from magic 2 to magic 0). Not sure if the optimization is worth the complexity though.
>
> 3. The referencing of the new version of ProducerRequest/FetchRequest is inconsistent (v4 vs v3).
> Since our convention starts at version 0, I think the new version would be 3.
>
> 4. "If the magic byte on message is 2, the broker should use the tombstone bit for log compaction." What about null value? My understanding is that null value will be treated the same as setting the tombstone bit.
>
> 5. For the migration path, it would be useful to describe the down conversion path to consumers (i.e., brokers on message format 0.10.2 and consumers on older version).
>
> Thanks,
>
> Jun
>
> On Tue, Nov 29, 2016 at 3:18 AM, Michael Pearce <michael.pea...@ig.com> wrote:
>
> > Hi All,
> >
> > We have been discussing in the below thread and final changes have been made to the KIP wiki based on these discussions.
> >
> > We would now like to put to the vote the following KIP:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-87+-+Add+Compaction+Tombstone+Flag
> >
> > This kip is for having a distinct compaction attribute “tombstone” flag instead of relying on null value, allowing non-null value delete messages.
> >
> > Many thanks,
> > Michael
> >
> > On 22/11/2016, 15:52, "Michael Pearce" <michael.pea...@ig.com> wrote:
> >
> > Hi Mayuresh,
> >
> > LGTM. I've just made one small adjustment updating the wire protocol to show the magic byte bump.
> >
> > Do we think we’re good to put to a vote? Are there any other bits needing discussion?
> >
> > Cheers
> > Mike
> >
> > On 21/11/2016, 18:26, "Mayuresh Gharat" <gharatmayures...@gmail.com> wrote:
> >
> > Hi Michael,
> >
> > I have updated the migration section of the KIP. Can you please take a look?
> >
> > Thanks,
> >
> > Mayuresh
> >
> > On Fri, Nov 18, 2016 at 9:07 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> >
> > > Hi Michael,
> > >
> > > That whilst sending tombstone and non null value, the consumer can expect only to receive the non-null message only in step (3) is this correct?
> > > ---> I do agree with you here.
> > >
> > > Becket, Ismael : can you guys review the migration plan listed above using magic byte?
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > > On Fri, Nov 18, 2016 at 8:58 AM, Michael Pearce <michael.pea...@ig.com> wrote:
> > >
> > >> Many thanks for this Mayuresh. I don't have any objections.
> > >>
> > >> I assume we should state:
> > >>
> > >> That whilst sending tombstone and non null value, the consumer can expect only to receive the non-null message only in step (3) is this correct?
> > >>
> > >> Cheers
> > >> Mike
> > >>
> > >> Sent using OWA for iPhone
> > >> ________________________________________
> > >> From: Mayuresh Gharat <gharatmayures...@gmail.com>
> > >> Sent: Thursday, November 17, 2016 5:18:41 PM
> > >> To: dev@kafka.apache.org
> > >> Subject: Re: [DISCUSS] KIP-87 - Add Compaction Tombstone Flag
> > >>
> > >> Hi Ismael,
> > >>
> > >> Thanks for the explanation.
> > >> I especially like this part, where you mentioned we can get rid of the older null value support for log compaction later on, here :
> > >> We can't change semantics of the message format without having a long transition period. And we can't rely on people reading documentation or acting on a warning for something so fundamental. As such, my take is that we need to bump the magic byte. The good news is that we don't have to support all versions forever. We have said that we will support direct upgrades for 2 years. That means that message format version n could, in theory, be removed 2 years after it's introduced.
> > >>
> > >> Just a heads up, I would like to mention that even without bumping magic byte, we will *NOT* lose zero copy, as the client(x+1) in my explanation above will internally convert a null value to have a tombstone bit set, and a tombstone bit set to have a null value, automatically, and by the time we move to version (x+2), the clients would have upgraded.
> > >> Obviously if we support a request from consumer(x), we will lose zero copy, but that is the same case with magic byte.
> > >>
> > >> But if magic byte bump makes life easier for transition for the above reasons that you explained, I am OK with it since we are going to meet the end goal down the road :)
> > >>
> > >> On a side note can we update the doc here on magic byte to say that "*it should be bumped whenever the message format is changed or the interpretation of message format (usage of the reserved bits as well) is changed*".
> > >>
> > >>
> > >> Hi Michael,
> > >>
> > >> Here is the update plan that we discussed offline yesterday :
> > >>
> > >> Currently the magic-byte which corresponds to the "message.format.version" is set to 1.
> > >>
> > >> 1) On broker it will be set to 1 initially.
> > >>
> > >> 2) When a producer client sends a message with magic-byte = 2, since the broker is on magic-byte = 1, we will down convert it, which means if the tombstone bit is set, the value will be set to null. A consumer understanding magic-byte = 1 will still work with this. A consumer working with magic-byte = 2 will also be able to understand this, since it understands the tombstone.
> > >> Now there is still the question of supporting a non-tombstone and null value from producer client with magic-byte = 2. *(I am not sure if we should support this. Ismael/Becket can comment here)*
> > >>
> > >> 3) When almost all the clients have upgraded, the message.format.version on the broker can be changed to 2, wherein the down conversion in the above step will not happen. If at this point we get a consumer request from an older consumer, we might have to down convert, wherein we lose zero copy, but these cases should be rare.
> > >>
> > >> Becket can you review this plan and add more details if I have missed or got something wrong, before we put it on KIP.
> > >>
> > >> Thanks,
> > >>
> > >> Mayuresh
> > >>
> > >> On Wed, Nov 16, 2016 at 11:07 PM, Michael Pearce <michael.pea...@ig.com> wrote:
> > >>
> > >> > Thanks guys, for discussing this offline and getting some consensus.
> > >> >
> > >> > So it's clear for myself and others what is proposed now (I think I understand, but want to make sure)
> > >> >
> > >> > Could I ask you to either directly update the KIP to detail the migration strategy, or (re-)state your offline discussed and agreed migration strategy based on a magic byte in this thread.
> > >> >
> > >> >
> > >> > The main original driver for the KIP was to support compaction where value isn't null, based off the discussions on KIP-82 thread.
> > >> >
> > >> > We should be able to support non-tombstone + null value by the completion of the KIP, as we noted when discussing this kip, having logic based on a null value isn't very clean and also separates the concerns.
> > >> >
> > >> > As discussed already though we can split this into KIP-87a and KIP-87b
> > >> >
> > >> > Where we look to deliver KIP-87a on a compacted topic (to address the immediate issues)
> > >> > * tombstone + null value
> > >> > * tombstone + non-null value
> > >> > * non-tombstone + non-null value
> > >> >
> > >> > Then we can discuss once KIP-87a is completed options later and how we support the second part KIP-87b to deliver:
> > >> > * non-tombstone + null value
> > >> >
> > >> > Cheers
> > >> > Mike
> > >> >
> > >> > ________________________________________
> > >> > From: Becket Qin <becket....@gmail.com>
> > >> > Sent: Thursday, November 17, 2016 1:43 AM
> > >> > To: dev@kafka.apache.org
> > >> > Subject: Re: [DISCUSS] KIP-87 - Add Compaction Tombstone Flag
> > >> >
> > >> > Renu, Mayuresh and I had an offline discussion, and following is a brief summary.
> > >> >
> > >> > 1. We agreed that not bumping up magic value may result in losing zero copy during migration.
> > >> > 2. Given that bumping up the magic value is almost free and has the benefit of avoiding a potential performance issue, it is probably worth doing.
> > >> >
> > >> > One issue we still need to think about is whether we want to support a non-tombstone message with null value.
> > >> > Currently it is not supported by Kafka. If we allow a non-tombstone null value message to exist after KIP-87, the problem is that such a message will not be supported by consumers prior to KIP-87, because a null value will always be interpreted as a tombstone.
> > >> >
> > >> > One option is that we keep the current way, i.e. do not support such message. It would be good to know if there is a concrete use case for such message. If there is not, we can probably just not support it.
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Jiangjie (Becket) Qin
> > >> >
> > >> > On Wed, Nov 16, 2016 at 1:28 PM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> > >> >
> > >> > > Hi Ismael,
> > >> > >
> > >> > > This is something I can think of for migration plan:
> > >> > > So the migration plan can look something like this, with up conversion :
> > >> > >
> > >> > > 1) Currently let's say we have Broker at version x.
> > >> > > 2) Currently we have clients at version x.
> > >> > > 3) a) We move the version to Broker(x+1) : supports both tombstone and null for log compaction.
> > >> > >    b) We upgrade the client to version client(x+1) : if in the producer client(x+1) the value is set to null, we will automatically set the Tombstone bit internally. If the producer client(x+1) sets the tombstone itself, well and good. For producer client(x), the broker will up convert to have the tombstone bit. Broker(x+1) is supporting both. Consumer client(x+1) will be aware of this and should be able to handle this. For consumer client(x) we will down convert the message on the broker side.
> > >> > >    c) At this point we will have to specify a warning or clearly specify in docs that this behavior is about to be changed for log compaction.
> > >> > > 4) a) In next release of the Broker(x+2), we say that only Tombstone is used for log compaction on the Broker side. Clients(x+1) still is supported.
> > >> > >    b) We upgrade the client to version client(x+2) : if value is set to null, tombstone will not be set automatically. The client will have to call setTombstone() to actually set the tombstone.
> > >> > >
> > >> > > We should compare this migration plan with the migration plan for magic byte bump and do whatever looks good.
> > >> > > I am just worried that if we go down magic byte route, unless I am missing something, it sounds like kafka will be stuck with supporting both null value and tombstone bit for log compaction for life, which does not look like a good end state.
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > Mayuresh
> > >> > >
> > >> > > On Wed, Nov 16, 2016 at 9:32 AM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> > >> > >
> > >> > > > Hi Ismael,
> > >> > > >
> > >> > > > That's a very good point which I might not have considered earlier.
> > >> > > >
> > >> > > > Here is a plan that I can think of:
> > >> > > >
> > >> > > > Stage 1) The broker from now on, up converts the message to have the tombstone marker. The log compaction thread does log compaction based on both null and tombstone marker. This is our transition period.
> > >> > > > Stage 2) The next release we only say that log compaction is based on tombstone marker. (Open source kafka makes this as a policy). By this time, the organization which is moving to this release will be sure that they have gone through the entire transition period.
> > >> > > >
> > >> > > > My only goal of doing this is that Kafka clearly specifies the end state about what log compaction means (is it null value or a tombstone marker, but not both).
> > >> > > >
> > >> > > > What do you think?
> > >> > > >
> > >> > > > Thanks,
> > >> > > >
> > >> > > > Mayuresh
> > >> > > >
> > >> > > > On Wed, Nov 16, 2016 at 9:17 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> > >> > > >
> > >> > > >> One comment below.
> > >> > > >>
> > >> > > >> On Wed, Nov 16, 2016 at 5:08 PM, Mayuresh Gharat <gharatmayures...@gmail.com> wrote:
> > >> > > >>
> > >> > > >> > - If we don't bump up the magic byte, on the broker side, the broker will always have to look at both tombstone bit and the value when doing the compaction. Assuming we do not bump up the magic byte, imagine the broker sees a message which does not have a tombstone bit set. The broker does not know when the message was produced (i.e. whether the message has been up converted or not), it has to take a further look at the value to see if it is null or not in order to determine if it is a tombstone. The same logic has to be put on the consumer as well because the consumer does not know if the message has been up converted or not.
> > >> > > >> >    - If we upconvert while appending, this is not the case, right?
> > >> > > >>
> > >> > > >> If I understand you correctly, this is not sufficient because the log may have messages appended before it was upgraded to include KIP-87.
> > >> > > >>
> > >> > > >> Ismael
> > >> > > >
> > >> > > > --
> > >> > > > -Regards,
> > >> > > > Mayuresh R. Gharat
> > >> > > > (862) 250-7125
> > >> > >
> > >> > > --
> > >> > > -Regards,
> > >> > > Mayuresh R. Gharat
> > >> > > (862) 250-7125
> > >> >
> > >> > The information contained in this email is strictly confidential and for the use of the addressee only, unless otherwise indicated. If you are not the intended recipient, please do not read, copy, use or disclose to others this message or any attachment. Please also notify the sender by replying to this email or by telephone (+44(020 7896 0011) and then delete the email and any copies of it. Opinions, conclusion (etc) that do not relate to the official business of this company shall be understood as neither given nor endorsed by it. IG is a trading name of IG Markets Limited (a company registered in England and Wales, company number 04008957) and IG Index Limited (a company registered in England and Wales, company number 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill, London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG Index Limited (register number 114059) are authorised and regulated by the Financial Conduct Authority.
> > >>
> > >> --
> > >> -Regards,
> > >> Mayuresh R. Gharat
> > >> (862) 250-7125
> > >
> > > --
> > > -Regards,
> > > Mayuresh R. Gharat
> > > (862) 250-7125
> >
> > --
> > -Regards,
> > Mayuresh R. Gharat
> > (862) 250-7125

--
-Regards,
Mayuresh R. Gharat
(862) 250-7125