+1 on Joe's suggestions, glad to see it happening! On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
> The errors are part of the KIP process now, so I think the clients are > safe :) > > > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals > > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin <steve.mo...@gmail.com> > wrote: > > Agree errors should be part of the protocol > > > >> On Jan 15, 2015, at 17:59, Gwen Shapira <gshap...@cloudera.com> wrote: > >> > >> Hi, > >> > >> I got convinced by Joe and Dana that errors are indeed part of the > >> protocol and can't be randomly added. > >> > >> So, it looks like we need to bump version of ProduceRequest in the > >> following way: > >> Version 0 -> accept acks >1. I think we should keep the existing > >> behavior too (i.e. not replace it with -1) to avoid surprising > >> clients, but I'm willing to hear other opinions. > >> Version 1 -> do not accept acks >1 and return an error. > >> Are we ok with the error I added in KAFKA-1697? We can use something > >> less specific like InvalidRequestParameter. This error can be reused > >> in the future and reduce the need to add errors, but will also be less > >> clear to the client and its users. Maybe even add the error message > >> string to the protocol in addition to the error code? (since we are > >> bumping versions....) > >> > >> I think maintaining the old version throughout 0.8.X makes sense. IMO > >> dropping it for 0.9 is feasible, but I'll let client owners help make > >> that call. > >> > >> Am I missing anything? Should I start a KIP? It seems like a KIP-type > >> discussion :) > >> > >> Gwen > >> > >> > >> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava > >> <e...@confluent.io> wrote: > >>> Gwen, > >>> > >>> I think the only option that wouldn't require a protocol version > change is > >>> the one where acks > 1 is converted to acks = -1 since it's the only > one > >>> that doesn't potentially break older clients. The protocol guide says > that > >>> the expected upgrade path is servers first, then clients, so old > clients, > >>> including non-java clients, that may be using acks > 1 should be able > to > >>> work with a new broker version. > >>> > >>> It's more work, but I think dealing with the protocol change is the > right > >>> thing to do since it eventually gets us to the behavior I think is > better -- > >>> the broker should reject requests with invalid values. I think Joe and > I > >>> were basically in agreement. In my mind the major piece missing from > his > >>> description is how long we're going to maintain his "case 0" behavior. > It's > >>> impractical to maintain old versions forever, but it sounds like there > >>> hasn't been a decision on how long to maintain them. Maybe that's > another > >>> item to add to KIPs -- protocol versions and behavior need to be > listed as > >>> deprecated and the earliest version in which they'll be removed should > be > >>> specified so users can understand which versions are guaranteed to be > >>> compatible, even if they're using (well-written) non-java clients. > >>> > >>> -Ewen > >>> > >>> > >>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <dana.pow...@gmail.com> > wrote: > >>>> > >>>>> clients don't break on unknown errors > >>>> > >>>> maybe true for the official java clients, but I dont think the > assumption > >>>> holds true for community-maintained clients and users of those > clients. > >>>> kafka-python generally follows the fail-fast philosophy and raises an > >>>> exception on any unrecognized error code in any server response. in > this > >>>> case, kafka-python allows users to set their own required-acks policy > when > >>>> creating a producer instance. It is possible that users of > kafka-python > >>>> have deployed producer code that uses ack>1 -- perhaps in production > >>>> environments -- and for those users the new error code will crash > their > >>>> producer code. I would not be surprised if the same were true of > other > >>>> community clients. > >>>> > >>>> *one reason for the fail-fast approach is that there isn't great > >>>> documentation on what errors to expect for each request / response -- > so > >>>> we > >>>> use failures to alert that some error case is not handled properly. > and > >>>> because of that, introducing new error cases without bumping the api > >>>> version is likely to cause those errors to get raised/thrown all the > way > >>>> back up to the user. of course we (client maintainers) can fix the > issues > >>>> in the client libraries and suggest users upgrade, but it's not the > ideal > >>>> situation. > >>>> > >>>> > >>>> long-winded way of saying: I agree w/ Joe. > >>>> > >>>> -Dana > >>>> > >>>> > >>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <gshap...@cloudera.com > > > >>>> wrote: > >>>> > >>>>> Is the protocol bump caused by the behavior change or the new error > >>>>> code? > >>>>> > >>>>> 1) IMO, error_codes are data, and clients can expect to receive > errors > >>>>> that they don't understand (i.e. unknown errors). AFAIK, clients > don't > >>>>> break on unknown errors, they are simple more challenging to debug. > If > >>>>> we document the new behavior, then its definitely debuggable and > >>>>> fixable. > >>>>> > >>>>> 2) The behavior change is basically a deprecation - i.e. acks > 1 > were > >>>>> never documented, and are not supported by Kafka clients starting > with > >>>>> version 0.8.2. I'm not sure this requires a protocol bump either, > >>>>> although its a better case than new error codes. > >>>>> > >>>>> Thanks, > >>>>> Gwen > >>>>> > >>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <joe.st...@stealth.ly> > >>>>> wrote: > >>>>>> Looping in the mailing list that the client developers live on > because > >>>>> they > >>>>>> are all not on dev (though they should be if they want to be helping > >>>>>> to > >>>>>> build the best client libraries they can). > >>>>>> > >>>>>> I whole hardily believe that we need to not break existing > >>>>>> functionality > >>>>> of > >>>>>> the client protocol, ever. > >>>>>> > >>>>>> There are many reasons for this and we have other threads on the > >>>>>> mailing > >>>>>> list where we are discussing that topic (no pun intended) that I > don't > >>>>> want > >>>>>> to re-hash here. > >>>>>> > >>>>>> If we change wire protocol functionality OR the binary format > (either) > >>>>>> we > >>>>>> must bump version AND treat version as a feature flag with backward > >>>>>> compatibility support until it is deprecated for some time for folks > >>>>>> to > >>>>> deal > >>>>>> with it. > >>>>>> > >>>>>> match version = { > >>>>>> case 0: keepDoingWhatWeWereDoing() > >>>>>> case 1: doNewStuff() > >>>>>> case 2: doEvenMoreNewStuff() > >>>>>> } > >>>>>> > >>>>>> has to be a practice we adopt imho ... I know feature flags can be > >>>>> construed > >>>>>> as "messy code" but I am eager to hear another (better? different?) > >>>>> solution > >>>>>> to this. > >>>>>> > >>>>>> If we don't do a feature flag like this specifically with this > change > >>>>> then > >>>>>> what happens is that someone upgrades their brokers with a rolling > >>>>> restart > >>>>>> in 0.8.3 and every single one of their producer requests start to > fail > >>>>> and > >>>>>> they have a major production outage. eeeek!!!! > >>>>>> > >>>>>> I do 100% agree that > 1 makes no sense and we *REALLY* need people > to > >>>>> start > >>>>>> using 0,1,-1 but we need to-do that in a way that is going to work > for > >>>>>> everyone. > >>>>>> > >>>>>> Old producers and consumers must keep working with new brokers and > if > >>>>>> we > >>>>> are > >>>>>> not going to support that then I am unclear what the use of > "version" > >>>>>> is > >>>>>> based on our original intentions of having it because of the > >>>>>> 0.7=>-0.8. > >>>>> We > >>>>>> said no more breaking changes when we did that. > >>>>>> > >>>>>> - Joe Stein > >>>>>> > >>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava < > >>>>> e...@confluent.io> > >>>>>> wrote: > >>>>>>> > >>>>>>> Right, so this looks like it could create an issue similar to > what's > >>>>>>> currently being discussed in > >>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where users now > get > >>>>>>> errors > >>>>>>> under conditions when they previously wouldn't. Old clients won't > >>>>>>> even > >>>>>>> know > >>>>>>> about the error code, so besides failing they won't even be able to > >>>>>>> log > >>>>>>> any > >>>>>>> meaningful error messages. > >>>>>>> > >>>>>>> I think there are two options for compatibility: > >>>>>>> > >>>>>>> 1. An alternative change is to remove the ack > 1 code, but > silently > >>>>>>> "upgrade" requests with acks > 1 to acks = -1. This isn't the same > as > >>>>>>> other > >>>>>>> changes to behavior since the interaction between the client and > >>>>>>> server > >>>>>>> remains the same, no error codes change, etc. The client might just > >>>>>>> see > >>>>>>> some increased latency since the message might need to be > replicated > >>>>>>> to > >>>>>>> more brokers than they requested. > >>>>>>> 2. Split this into two patches, one that bumps the protocol version > >>>>>>> on > >>>>>>> that > >>>>>>> message to include the new error code but maintains both old (now > >>>>>>> deprecated) and new behavior, then a second that would be applied > in > >>>>>>> a > >>>>>>> later release that removes the old protocol + code for handling > acks > >>>>> 1. > >>>>>>> > >>>>>>> 2 is probably the right thing to do. If we specify the release when > >>>>> we'll > >>>>>>> remove the deprecated protocol at the time of deprecation it makes > >>>>> things > >>>>>>> a > >>>>>>> lot easier for people writing non-java clients and could give users > >>>>> better > >>>>>>> predictability (e.g. if clients are at most 1 major release behind > >>>>>>> brokers, > >>>>>>> they'll remain compatible but possibly use deprecated features). > >>>>>>> > >>>>>>> > >>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira < > gshap...@cloudera.com> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> Hi Kafka Devs, > >>>>>>>> > >>>>>>>> We are working on KAFKA-1697 - remove code related to ack>1 on the > >>>>>>>> broker. Per Neha's suggestion, I'd like to give everyone a heads > up > >>>>>>>> on > >>>>>>>> what these changes mean. > >>>>>>>> > >>>>>>>> Once this patch is included, any produce requests that include > >>>>>>>> request.required.acks > 1 will result in an exception. This will > be > >>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and > >>>>>>>> UnknownException in existing versions (sorry, but I can't add > error > >>>>>>>> codes retroactively). > >>>>>>>> > >>>>>>>> This behavior is already enforced by 0.8.2 producers (sync and > >>>>>>>> new), > >>>>>>>> but we expect impact on users with older producers that relied on > >>>>>>>> acks > >>>>>>>>> 1 and external clients (i.e python, go, etc). > >>>>>>>> > >>>>>>>> Users who relied on acks > 1 are expected to switch to using acks > = > >>>>>>>> -1 > >>>>>>>> and a min.isr parameter than matches their user case. > >>>>>>>> > >>>>>>>> This change was discussed in the past in the context of KAFKA-1555 > >>>>>>>> (min.isr), but let us know if you have any questions or concerns > >>>>>>>> regarding this change. > >>>>>>>> > >>>>>>>> Gwen > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> Thanks, > >>>>>>> Ewen > >>>>>> > >>>>>> > >>>>>> -- > >>>>>> You received this message because you are subscribed to the Google > >>>>>> Groups > >>>>>> "kafka-clients" group. > >>>>>> To unsubscribe from this group and stop receiving emails from it, > send > >>>>>> an > >>>>>> email to kafka-clients+unsubscr...@googlegroups.com. > >>>>>> To post to this group, send email to kafka-clie...@googlegroups.com > . > >>>>>> Visit this group at http://groups.google.com/group/kafka-clients. > >>>>>> To view this discussion on the web visit > >>>>> > >>>>> > https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com > >>>>> . > >>>>>> > >>>>>> For more options, visit https://groups.google.com/d/optout. > >>>>> > >>>>> -- > >>>>> You received this message because you are subscribed to the Google > >>>>> Groups > >>>>> "kafka-clients" group. > >>>>> To unsubscribe from this group and stop receiving emails from it, > send > >>>>> an > >>>>> email to kafka-clients+unsubscr...@googlegroups.com. > >>>>> To post to this group, send email to kafka-clie...@googlegroups.com. > >>>>> Visit this group at http://groups.google.com/group/kafka-clients. > >>>>> To view this discussion on the web visit > >>>>> > >>>>> > https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com > >>>>> . > >>>>> For more options, visit https://groups.google.com/d/optout. > >>> > >>> > >>> > >>> > >>> -- > >>> Thanks, > >>> Ewen > -- -- Guozhang