I updated the KIP: Using acks > 1 in version 0 will log a WARN message in the broker about client using deprecated behavior (suggested by Joe in the JIRA, and I think it makes sense).
Gwen On Fri, Jan 16, 2015 at 10:40 AM, Gwen Shapira <gshap...@cloudera.com> wrote: > How about we continue the discussion on this thread, so we won't lose > the context of this discussion, and put it up for VOTE when this has > been finalized? > > On Fri, Jan 16, 2015 at 10:22 AM, Neha Narkhede <n...@confluent.io> wrote: >> Gwen, >> >> KIP write-up looks good. According to the rest of the KIP process proposal, >> would you like to start a DISCUSS/VOTE thread for it? >> >> Thanks, >> Neha >> >> On Fri, Jan 16, 2015 at 9:37 AM, Ewen Cheslack-Postava <e...@confluent.io> >> wrote: >> >>> Gwen -- KIP write up looks good. Deprecation schedule probably needs to be >>> more specific, but I think that discussion probably needs to happen after a >>> solution is agreed upon. >>> >>> Jay -- I think "older clients will get a bad error message instead of a >>> good one" isn't what would be happening with this change. Previously they >>> wouldn't have received an error and they would have been able to produce >>> messages. After the change they'll just receive this new error message >>> which their clients can't possibly handle gracefully since it didn't exist >>> when the client was written. Whether the acks > 1 setting was actually >>> accomplishing what they thought doesn't matter. Someone could have >>> reasonably read the docs on 0.8.1.1, thought acks = 2 is an ok setting for >>> their applications, set it as a default across a bunch of apps, then follow >>> the recommended upgrade path of updating brokers to 0.8.2 and all their >>> apps will start failing on produce requests. >>> >>> >>> On Thu, Jan 15, 2015 at 8:27 PM, Jay Kreps <jay.kr...@gmail.com> wrote: >>> >>> > This is a good case to discuss. >>> > >>> > Let's figure the general case of how we want to handle errors and get >>> that >>> > documented in the protocol. The problem right now is that we give no >>> > guidance on this. I actually thought Gwen's suggestion made sense on the >>> > guidance we should have given which is that we will enumerate a set of >>> > errors and their meaning for each API but it is possible that other >>> errors >>> > will occur and they should be handled (maybe poorly) in the same way >>> > UNKNOWN_ERROR is handled which is our normal escape hatch for things like >>> > OOMException. >>> > >>> > I really do think we shouldn't be dogmatic here: In considering a change >>> > to errors we should consider the potential ill-effect vs the complexity >>> of >>> > yet another protocol version. >>> > >>> > In this case I actually am not sure we need to bump the protocol because >>> > the whole point of the change was to make a setting we think doesn't make >>> > sense break, right? Well this will break it. It seems like the only >>> > downside is that older clients will get a bad error message instead of a >>> > good one. But it isn't like we will have rendered a client unusable, it >>> is >>> > just that they will need to change their config. >>> > >>> > -Jay >>> > >>> > On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira <gshap...@cloudera.com> >>> > wrote: >>> > >>> >> I created a KIP for this suggestion: >>> >> >>> >> >>> >> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks >>> >> >>> >> Basically documenting what was already discussed here. Comments will >>> >> be awesome! >>> >> >>> >> Gwen >>> >> >>> >> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira <gshap...@cloudera.com> >>> >> wrote: >>> >> > The errors are part of the KIP process now, so I think the clients are >>> >> safe :) >>> >> > >>> >> > >>> >> >>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals >>> >> > >>> >> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin <steve.mo...@gmail.com> >>> >> wrote: >>> >> >> Agree errors should be part of the protocol >>> >> >> >>> >> >>> On Jan 15, 2015, at 17:59, Gwen Shapira <gshap...@cloudera.com> >>> >> wrote: >>> >> >>> >>> >> >>> Hi, >>> >> >>> >>> >> >>> I got convinced by Joe and Dana that errors are indeed part of the >>> >> >>> protocol and can't be randomly added. >>> >> >>> >>> >> >>> So, it looks like we need to bump version of ProduceRequest in the >>> >> >>> following way: >>> >> >>> Version 0 -> accept acks >1. I think we should keep the existing >>> >> >>> behavior too (i.e. not replace it with -1) to avoid surprising >>> >> >>> clients, but I'm willing to hear other opinions. >>> >> >>> Version 1 -> do not accept acks >1 and return an error. >>> >> >>> Are we ok with the error I added in KAFKA-1697? We can use something >>> >> >>> less specific like InvalidRequestParameter. This error can be reused >>> >> >>> in the future and reduce the need to add errors, but will also be >>> less >>> >> >>> clear to the client and its users. Maybe even add the error message >>> >> >>> string to the protocol in addition to the error code? (since we are >>> >> >>> bumping versions....) >>> >> >>> >>> >> >>> I think maintaining the old version throughout 0.8.X makes sense. >>> IMO >>> >> >>> dropping it for 0.9 is feasible, but I'll let client owners help >>> make >>> >> >>> that call. >>> >> >>> >>> >> >>> Am I missing anything? Should I start a KIP? It seems like a >>> KIP-type >>> >> >>> discussion :) >>> >> >>> >>> >> >>> Gwen >>> >> >>> >>> >> >>> >>> >> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava >>> >> >>> <e...@confluent.io> wrote: >>> >> >>>> Gwen, >>> >> >>>> >>> >> >>>> I think the only option that wouldn't require a protocol version >>> >> change is >>> >> >>>> the one where acks > 1 is converted to acks = -1 since it's the >>> only >>> >> one >>> >> >>>> that doesn't potentially break older clients. The protocol guide >>> >> says that >>> >> >>>> the expected upgrade path is servers first, then clients, so old >>> >> clients, >>> >> >>>> including non-java clients, that may be using acks > 1 should be >>> >> able to >>> >> >>>> work with a new broker version. >>> >> >>>> >>> >> >>>> It's more work, but I think dealing with the protocol change is the >>> >> right >>> >> >>>> thing to do since it eventually gets us to the behavior I think is >>> >> better -- >>> >> >>>> the broker should reject requests with invalid values. I think Joe >>> >> and I >>> >> >>>> were basically in agreement. In my mind the major piece missing >>> from >>> >> his >>> >> >>>> description is how long we're going to maintain his "case 0" >>> >> behavior. It's >>> >> >>>> impractical to maintain old versions forever, but it sounds like >>> >> there >>> >> >>>> hasn't been a decision on how long to maintain them. Maybe that's >>> >> another >>> >> >>>> item to add to KIPs -- protocol versions and behavior need to be >>> >> listed as >>> >> >>>> deprecated and the earliest version in which they'll be removed >>> >> should be >>> >> >>>> specified so users can understand which versions are guaranteed to >>> be >>> >> >>>> compatible, even if they're using (well-written) non-java clients. >>> >> >>>> >>> >> >>>> -Ewen >>> >> >>>> >>> >> >>>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers < >>> >> dana.pow...@gmail.com> wrote: >>> >> >>>>> >>> >> >>>>>> clients don't break on unknown errors >>> >> >>>>> >>> >> >>>>> maybe true for the official java clients, but I dont think the >>> >> assumption >>> >> >>>>> holds true for community-maintained clients and users of those >>> >> clients. >>> >> >>>>> kafka-python generally follows the fail-fast philosophy and raises >>> >> an >>> >> >>>>> exception on any unrecognized error code in any server response. >>> >> in this >>> >> >>>>> case, kafka-python allows users to set their own required-acks >>> >> policy when >>> >> >>>>> creating a producer instance. It is possible that users of >>> >> kafka-python >>> >> >>>>> have deployed producer code that uses ack>1 -- perhaps in >>> production >>> >> >>>>> environments -- and for those users the new error code will crash >>> >> their >>> >> >>>>> producer code. I would not be surprised if the same were true of >>> >> other >>> >> >>>>> community clients. >>> >> >>>>> >>> >> >>>>> *one reason for the fail-fast approach is that there isn't great >>> >> >>>>> documentation on what errors to expect for each request / response >>> >> -- so >>> >> >>>>> we >>> >> >>>>> use failures to alert that some error case is not handled >>> >> properly. and >>> >> >>>>> because of that, introducing new error cases without bumping the >>> api >>> >> >>>>> version is likely to cause those errors to get raised/thrown all >>> >> the way >>> >> >>>>> back up to the user. of course we (client maintainers) can fix >>> the >>> >> issues >>> >> >>>>> in the client libraries and suggest users upgrade, but it's not >>> the >>> >> ideal >>> >> >>>>> situation. >>> >> >>>>> >>> >> >>>>> >>> >> >>>>> long-winded way of saying: I agree w/ Joe. >>> >> >>>>> >>> >> >>>>> -Dana >>> >> >>>>> >>> >> >>>>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira < >>> >> gshap...@cloudera.com> >>> >> >>>>> wrote: >>> >> >>>>> >>> >> >>>>>> Is the protocol bump caused by the behavior change or the new >>> error >>> >> >>>>>> code? >>> >> >>>>>> >>> >> >>>>>> 1) IMO, error_codes are data, and clients can expect to receive >>> >> errors >>> >> >>>>>> that they don't understand (i.e. unknown errors). AFAIK, clients >>> >> don't >>> >> >>>>>> break on unknown errors, they are simple more challenging to >>> >> debug. If >>> >> >>>>>> we document the new behavior, then its definitely debuggable and >>> >> >>>>>> fixable. >>> >> >>>>>> >>> >> >>>>>> 2) The behavior change is basically a deprecation - i.e. acks > 1 >>> >> were >>> >> >>>>>> never documented, and are not supported by Kafka clients starting >>> >> with >>> >> >>>>>> version 0.8.2. I'm not sure this requires a protocol bump either, >>> >> >>>>>> although its a better case than new error codes. >>> >> >>>>>> >>> >> >>>>>> Thanks, >>> >> >>>>>> Gwen >>> >> >>>>>> >>> >> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein < >>> joe.st...@stealth.ly> >>> >> >>>>>> wrote: >>> >> >>>>>>> Looping in the mailing list that the client developers live on >>> >> because >>> >> >>>>>> they >>> >> >>>>>>> are all not on dev (though they should be if they want to be >>> >> helping >>> >> >>>>>>> to >>> >> >>>>>>> build the best client libraries they can). >>> >> >>>>>>> >>> >> >>>>>>> I whole hardily believe that we need to not break existing >>> >> >>>>>>> functionality >>> >> >>>>>> of >>> >> >>>>>>> the client protocol, ever. >>> >> >>>>>>> >>> >> >>>>>>> There are many reasons for this and we have other threads on the >>> >> >>>>>>> mailing >>> >> >>>>>>> list where we are discussing that topic (no pun intended) that I >>> >> don't >>> >> >>>>>> want >>> >> >>>>>>> to re-hash here. >>> >> >>>>>>> >>> >> >>>>>>> If we change wire protocol functionality OR the binary format >>> >> (either) >>> >> >>>>>>> we >>> >> >>>>>>> must bump version AND treat version as a feature flag with >>> >> backward >>> >> >>>>>>> compatibility support until it is deprecated for some time for >>> >> folks >>> >> >>>>>>> to >>> >> >>>>>> deal >>> >> >>>>>>> with it. >>> >> >>>>>>> >>> >> >>>>>>> match version = { >>> >> >>>>>>> case 0: keepDoingWhatWeWereDoing() >>> >> >>>>>>> case 1: doNewStuff() >>> >> >>>>>>> case 2: doEvenMoreNewStuff() >>> >> >>>>>>> } >>> >> >>>>>>> >>> >> >>>>>>> has to be a practice we adopt imho ... I know feature flags can >>> be >>> >> >>>>>> construed >>> >> >>>>>>> as "messy code" but I am eager to hear another (better? >>> >> different?) >>> >> >>>>>> solution >>> >> >>>>>>> to this. >>> >> >>>>>>> >>> >> >>>>>>> If we don't do a feature flag like this specifically with this >>> >> change >>> >> >>>>>> then >>> >> >>>>>>> what happens is that someone upgrades their brokers with a >>> rolling >>> >> >>>>>> restart >>> >> >>>>>>> in 0.8.3 and every single one of their producer requests start >>> to >>> >> fail >>> >> >>>>>> and >>> >> >>>>>>> they have a major production outage. eeeek!!!! >>> >> >>>>>>> >>> >> >>>>>>> I do 100% agree that > 1 makes no sense and we *REALLY* need >>> >> people to >>> >> >>>>>> start >>> >> >>>>>>> using 0,1,-1 but we need to-do that in a way that is going to >>> >> work for >>> >> >>>>>>> everyone. >>> >> >>>>>>> >>> >> >>>>>>> Old producers and consumers must keep working with new brokers >>> >> and if >>> >> >>>>>>> we >>> >> >>>>>> are >>> >> >>>>>>> not going to support that then I am unclear what the use of >>> >> "version" >>> >> >>>>>>> is >>> >> >>>>>>> based on our original intentions of having it because of the >>> >> >>>>>>> 0.7=>-0.8. >>> >> >>>>>> We >>> >> >>>>>>> said no more breaking changes when we did that. >>> >> >>>>>>> >>> >> >>>>>>> - Joe Stein >>> >> >>>>>>> >>> >> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava < >>> >> >>>>>> e...@confluent.io> >>> >> >>>>>>> wrote: >>> >> >>>>>>>> >>> >> >>>>>>>> Right, so this looks like it could create an issue similar to >>> >> what's >>> >> >>>>>>>> currently being discussed in >>> >> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where users >>> >> now get >>> >> >>>>>>>> errors >>> >> >>>>>>>> under conditions when they previously wouldn't. Old clients >>> won't >>> >> >>>>>>>> even >>> >> >>>>>>>> know >>> >> >>>>>>>> about the error code, so besides failing they won't even be >>> able >>> >> to >>> >> >>>>>>>> log >>> >> >>>>>>>> any >>> >> >>>>>>>> meaningful error messages. >>> >> >>>>>>>> >>> >> >>>>>>>> I think there are two options for compatibility: >>> >> >>>>>>>> >>> >> >>>>>>>> 1. An alternative change is to remove the ack > 1 code, but >>> >> silently >>> >> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This isn't the >>> >> same as >>> >> >>>>>>>> other >>> >> >>>>>>>> changes to behavior since the interaction between the client >>> and >>> >> >>>>>>>> server >>> >> >>>>>>>> remains the same, no error codes change, etc. The client might >>> >> just >>> >> >>>>>>>> see >>> >> >>>>>>>> some increased latency since the message might need to be >>> >> replicated >>> >> >>>>>>>> to >>> >> >>>>>>>> more brokers than they requested. >>> >> >>>>>>>> 2. Split this into two patches, one that bumps the protocol >>> >> version >>> >> >>>>>>>> on >>> >> >>>>>>>> that >>> >> >>>>>>>> message to include the new error code but maintains both old >>> (now >>> >> >>>>>>>> deprecated) and new behavior, then a second that would be >>> >> applied in >>> >> >>>>>>>> a >>> >> >>>>>>>> later release that removes the old protocol + code for handling >>> >> acks >>> >> >>>>>> 1. >>> >> >>>>>>>> >>> >> >>>>>>>> 2 is probably the right thing to do. If we specify the release >>> >> when >>> >> >>>>>> we'll >>> >> >>>>>>>> remove the deprecated protocol at the time of deprecation it >>> >> makes >>> >> >>>>>> things >>> >> >>>>>>>> a >>> >> >>>>>>>> lot easier for people writing non-java clients and could give >>> >> users >>> >> >>>>>> better >>> >> >>>>>>>> predictability (e.g. if clients are at most 1 major release >>> >> behind >>> >> >>>>>>>> brokers, >>> >> >>>>>>>> they'll remain compatible but possibly use deprecated >>> features). >>> >> >>>>>>>> >>> >> >>>>>>>> >>> >> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira < >>> >> gshap...@cloudera.com> >>> >> >>>>>>>> wrote: >>> >> >>>>>>>> >>> >> >>>>>>>>> Hi Kafka Devs, >>> >> >>>>>>>>> >>> >> >>>>>>>>> We are working on KAFKA-1697 - remove code related to ack>1 on >>> >> the >>> >> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give everyone a >>> >> heads up >>> >> >>>>>>>>> on >>> >> >>>>>>>>> what these changes mean. >>> >> >>>>>>>>> >>> >> >>>>>>>>> Once this patch is included, any produce requests that include >>> >> >>>>>>>>> request.required.acks > 1 will result in an exception. This >>> >> will be >>> >> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I assume) >>> and >>> >> >>>>>>>>> UnknownException in existing versions (sorry, but I can't add >>> >> error >>> >> >>>>>>>>> codes retroactively). >>> >> >>>>>>>>> >>> >> >>>>>>>>> This behavior is already enforced by 0.8.2 producers (sync and >>> >> >>>>>>>>> new), >>> >> >>>>>>>>> but we expect impact on users with older producers that relied >>> >> on >>> >> >>>>>>>>> acks >>> >> >>>>>>>>>> 1 and external clients (i.e python, go, etc). >>> >> >>>>>>>>> >>> >> >>>>>>>>> Users who relied on acks > 1 are expected to switch to using >>> >> acks = >>> >> >>>>>>>>> -1 >>> >> >>>>>>>>> and a min.isr parameter than matches their user case. >>> >> >>>>>>>>> >>> >> >>>>>>>>> This change was discussed in the past in the context of >>> >> KAFKA-1555 >>> >> >>>>>>>>> (min.isr), but let us know if you have any questions or >>> concerns >>> >> >>>>>>>>> regarding this change. >>> >> >>>>>>>>> >>> >> >>>>>>>>> Gwen >>> >> >>>>>>>> >>> >> >>>>>>>> >>> >> >>>>>>>> >>> >> >>>>>>>> -- >>> >> >>>>>>>> Thanks, >>> >> >>>>>>>> Ewen >>> >> >>>>>>> >>> >> >>>>>>> >>> >> >>>>>>> -- >>> >> >>>>>>> You received this message because you are subscribed to the >>> Google >>> >> >>>>>>> Groups >>> >> >>>>>>> "kafka-clients" group. >>> >> >>>>>>> To unsubscribe from this group and stop receiving emails from >>> it, >>> >> send >>> >> >>>>>>> an >>> >> >>>>>>> email to kafka-clients+unsubscr...@googlegroups.com. >>> >> >>>>>>> To post to this group, send email to >>> >> kafka-clie...@googlegroups.com. >>> >> >>>>>>> Visit this group at >>> http://groups.google.com/group/kafka-clients. >>> >> >>>>>>> To view this discussion on the web visit >>> >> >>>>>> >>> >> >>>>>> >>> >> >>> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com >>> >> >>>>>> . >>> >> >>>>>>> >>> >> >>>>>>> For more options, visit https://groups.google.com/d/optout. >>> >> >>>>>> >>> >> >>>>>> -- >>> >> >>>>>> You received this message because you are subscribed to the >>> Google >>> >> >>>>>> Groups >>> >> >>>>>> "kafka-clients" group. >>> >> >>>>>> To unsubscribe from this group and stop receiving emails from it, >>> >> send >>> >> >>>>>> an >>> >> >>>>>> email to kafka-clients+unsubscr...@googlegroups.com. >>> >> >>>>>> To post to this group, send email to >>> >> kafka-clie...@googlegroups.com. >>> >> >>>>>> Visit this group at http://groups.google.com/group/kafka-clients >>> . >>> >> >>>>>> To view this discussion on the web visit >>> >> >>>>>> >>> >> >>>>>> >>> >> >>> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com >>> >> >>>>>> . >>> >> >>>>>> For more options, visit https://groups.google.com/d/optout. >>> >> >>>> >>> >> >>>> >>> >> >>>> >>> >> >>>> >>> >> >>>> -- >>> >> >>>> Thanks, >>> >> >>>> Ewen >>> >> >>> > >>> > >>> >>> >>> -- >>> Thanks, >>> Ewen >>> >> >> >> >> -- >> Thanks, >> Neha