I updated the KIP: Using acks > 1 in version 0 will log a WARN message
in the broker about client using deprecated behavior (suggested by Joe
in the JIRA, and I think it makes sense).

Gwen

On Fri, Jan 16, 2015 at 10:40 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
> How about we continue the discussion on this thread, so we won't lose
> the context of this discussion, and put it up for VOTE when this has
> been finalized?
>
> On Fri, Jan 16, 2015 at 10:22 AM, Neha Narkhede <n...@confluent.io> wrote:
>> Gwen,
>>
>> KIP write-up looks good. According to the rest of the KIP process proposal,
>> would you like to start a DISCUSS/VOTE thread for it?
>>
>> Thanks,
>> Neha
>>
>> On Fri, Jan 16, 2015 at 9:37 AM, Ewen Cheslack-Postava <e...@confluent.io>
>> wrote:
>>
>>> Gwen -- KIP write up looks good. Deprecation schedule probably needs to be
>>> more specific, but I think that discussion probably needs to happen after a
>>> solution is agreed upon.
>>>
>>> Jay -- I think "older clients will get a bad error message instead of a
>>> good one" isn't what would be happening with this change. Previously they
>>> wouldn't have received an error and they would have been able to produce
>>> messages. After the change they'll just receive this new error message
>>> which their clients can't possibly handle gracefully since it didn't exist
>>> when the client was written. Whether the acks > 1 setting was actually
>>> accomplishing what they thought doesn't matter. Someone could have
>>> reasonably read the docs on 0.8.1.1, thought acks = 2 is an ok setting for
>>> their applications, set it as a default across a bunch of apps, then follow
>>> the recommended upgrade path of updating brokers to 0.8.2 and all their
>>> apps will start failing on produce requests.
>>>
>>>
>>> On Thu, Jan 15, 2015 at 8:27 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>
>>> > This is a good case to discuss.
>>> >
>>> > Let's figure the general case of how we want to handle errors and get
>>> that
>>> > documented in the protocol. The problem right now is that we give no
>>> > guidance on this. I actually thought Gwen's suggestion made sense on the
>>> > guidance we should have given which is that we will enumerate a set of
>>> > errors and their meaning for each API but it is possible that other
>>> errors
>>> > will occur and they should be handled (maybe poorly) in the same way
>>> > UNKNOWN_ERROR is handled which is our normal escape hatch for things like
>>> > OOMException.
>>> >
>>> > I really do think we shouldn't be dogmatic here: In considering a change
>>> > to errors we should consider the potential ill-effect vs the complexity
>>> of
>>> > yet another protocol version.
>>> >
>>> > In this case I actually am not sure we need to bump the protocol because
>>> > the whole point of the change was to make a setting we think doesn't make
>>> > sense break, right? Well this will break it. It seems like the only
>>> > downside is that older clients will get a bad error message instead of a
>>> > good one. But it isn't like we will have rendered a client unusable, it
>>> is
>>> > just that they will need to change their config.
>>> >
>>> > -Jay
>>> >
>>> > On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira <gshap...@cloudera.com>
>>> > wrote:
>>> >
>>> >> I created a KIP for this suggestion:
>>> >>
>>> >>
>>> >>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks
>>> >>
>>> >> Basically documenting what was already discussed here.  Comments will
>>> >> be awesome!
>>> >>
>>> >> Gwen
>>> >>
>>> >> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira <gshap...@cloudera.com>
>>> >> wrote:
>>> >> > The errors are part of the KIP process now, so I think the clients are
>>> >> safe :)
>>> >> >
>>> >> >
>>> >>
>>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>>> >> >
>>> >> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin <steve.mo...@gmail.com>
>>> >> wrote:
>>> >> >> Agree errors should be part of the protocol
>>> >> >>
>>> >> >>> On Jan 15, 2015, at 17:59, Gwen Shapira <gshap...@cloudera.com>
>>> >> wrote:
>>> >> >>>
>>> >> >>> Hi,
>>> >> >>>
>>> >> >>> I got convinced by Joe and Dana that errors are indeed part of the
>>> >> >>> protocol and can't be randomly added.
>>> >> >>>
>>> >> >>> So, it looks like we need to bump version of ProduceRequest in the
>>> >> >>> following way:
>>> >> >>> Version 0 -> accept acks >1. I think we should keep the existing
>>> >> >>> behavior too (i.e. not replace it with -1) to avoid surprising
>>> >> >>> clients, but I'm willing to hear other opinions.
>>> >> >>> Version 1 -> do not accept acks >1 and return an error.
>>> >> >>> Are we ok with the error I added in KAFKA-1697? We can use something
>>> >> >>> less specific like InvalidRequestParameter. This error can be reused
>>> >> >>> in the future and reduce the need to add errors, but will also be
>>> less
>>> >> >>> clear to the client and its users. Maybe even add the error message
>>> >> >>> string to the protocol in addition to the error code? (since we are
>>> >> >>> bumping versions....)
>>> >> >>>
>>> >> >>> I think maintaining the old version throughout 0.8.X makes sense.
>>> IMO
>>> >> >>> dropping it for 0.9 is feasible, but I'll let client owners help
>>> make
>>> >> >>> that call.
>>> >> >>>
>>> >> >>> Am I missing anything? Should I start a KIP? It seems like a
>>> KIP-type
>>> >> >>> discussion :)
>>> >> >>>
>>> >> >>> Gwen
>>> >> >>>
>>> >> >>>
>>> >> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
>>> >> >>> <e...@confluent.io> wrote:
>>> >> >>>> Gwen,
>>> >> >>>>
>>> >> >>>> I think the only option that wouldn't require a protocol version
>>> >> change is
>>> >> >>>> the one where acks > 1 is converted to acks = -1 since it's the
>>> only
>>> >> one
>>> >> >>>> that doesn't potentially break older clients. The protocol guide
>>> >> says that
>>> >> >>>> the expected upgrade path is servers first, then clients, so old
>>> >> clients,
>>> >> >>>> including non-java clients, that may be using acks > 1 should be
>>> >> able to
>>> >> >>>> work with a new broker version.
>>> >> >>>>
>>> >> >>>> It's more work, but I think dealing with the protocol change is the
>>> >> right
>>> >> >>>> thing to do since it eventually gets us to the behavior I think is
>>> >> better --
>>> >> >>>> the broker should reject requests with invalid values. I think Joe
>>> >> and I
>>> >> >>>> were basically in agreement. In my mind the major piece missing
>>> from
>>> >> his
>>> >> >>>> description is how long we're going to maintain his "case 0"
>>> >> behavior. It's
>>> >> >>>> impractical to maintain old versions forever, but it sounds like
>>> >> there
>>> >> >>>> hasn't been a decision on how long to maintain them. Maybe that's
>>> >> another
>>> >> >>>> item to add to KIPs -- protocol versions and behavior need to be
>>> >> listed as
>>> >> >>>> deprecated and the earliest version in which they'll be removed
>>> >> should be
>>> >> >>>> specified so users can understand which versions are guaranteed to
>>> be
>>> >> >>>> compatible, even if they're using (well-written) non-java clients.
>>> >> >>>>
>>> >> >>>> -Ewen
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <
>>> >> dana.pow...@gmail.com> wrote:
>>> >> >>>>>
>>> >> >>>>>> clients don't break on unknown errors
>>> >> >>>>>
>>> >> >>>>> maybe true for the official java clients, but I dont think the
>>> >> assumption
>>> >> >>>>> holds true for community-maintained clients and users of those
>>> >> clients.
>>> >> >>>>> kafka-python generally follows the fail-fast philosophy and raises
>>> >> an
>>> >> >>>>> exception on any unrecognized error code in any server response.
>>> >> in this
>>> >> >>>>> case, kafka-python allows users to set their own required-acks
>>> >> policy when
>>> >> >>>>> creating a producer instance.  It is possible that users of
>>> >> kafka-python
>>> >> >>>>> have deployed producer code that uses ack>1 -- perhaps in
>>> production
>>> >> >>>>> environments -- and for those users the new error code will crash
>>> >> their
>>> >> >>>>> producer code.  I would not be surprised if the same were true of
>>> >> other
>>> >> >>>>> community clients.
>>> >> >>>>>
>>> >> >>>>> *one reason for the fail-fast approach is that there isn't great
>>> >> >>>>> documentation on what errors to expect for each request / response
>>> >> -- so
>>> >> >>>>> we
>>> >> >>>>> use failures to alert that some error case is not handled
>>> >> properly.  and
>>> >> >>>>> because of that, introducing new error cases without bumping the
>>> api
>>> >> >>>>> version is likely to cause those errors to get raised/thrown all
>>> >> the way
>>> >> >>>>> back up to the user.  of course we (client maintainers) can fix
>>> the
>>> >> issues
>>> >> >>>>> in the client libraries and suggest users upgrade, but it's not
>>> the
>>> >> ideal
>>> >> >>>>> situation.
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> long-winded way of saying: I agree w/ Joe.
>>> >> >>>>>
>>> >> >>>>> -Dana
>>> >> >>>>>
>>> >> >>>>>
>>> >> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <
>>> >> gshap...@cloudera.com>
>>> >> >>>>> wrote:
>>> >> >>>>>
>>> >> >>>>>> Is the protocol bump caused by the behavior change or the new
>>> error
>>> >> >>>>>> code?
>>> >> >>>>>>
>>> >> >>>>>> 1) IMO, error_codes are data, and clients can expect to receive
>>> >> errors
>>> >> >>>>>> that they don't understand (i.e. unknown errors). AFAIK, clients
>>> >> don't
>>> >> >>>>>> break on unknown errors, they are simple more challenging to
>>> >> debug. If
>>> >> >>>>>> we document the new behavior, then its definitely debuggable and
>>> >> >>>>>> fixable.
>>> >> >>>>>>
>>> >> >>>>>> 2) The behavior change is basically a deprecation - i.e. acks > 1
>>> >> were
>>> >> >>>>>> never documented, and are not supported by Kafka clients starting
>>> >> with
>>> >> >>>>>> version 0.8.2. I'm not sure this requires a protocol bump either,
>>> >> >>>>>> although its a better case than new error codes.
>>> >> >>>>>>
>>> >> >>>>>> Thanks,
>>> >> >>>>>> Gwen
>>> >> >>>>>>
>>> >> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <
>>> joe.st...@stealth.ly>
>>> >> >>>>>> wrote:
>>> >> >>>>>>> Looping in the mailing list that the client developers live on
>>> >> because
>>> >> >>>>>> they
>>> >> >>>>>>> are all not on dev (though they should be if they want to be
>>> >> helping
>>> >> >>>>>>> to
>>> >> >>>>>>> build the best client libraries they can).
>>> >> >>>>>>>
>>> >> >>>>>>> I whole hardily believe that we need to not break existing
>>> >> >>>>>>> functionality
>>> >> >>>>>> of
>>> >> >>>>>>> the client protocol, ever.
>>> >> >>>>>>>
>>> >> >>>>>>> There are many reasons for this and we have other threads on the
>>> >> >>>>>>> mailing
>>> >> >>>>>>> list where we are discussing that topic (no pun intended) that I
>>> >> don't
>>> >> >>>>>> want
>>> >> >>>>>>> to re-hash here.
>>> >> >>>>>>>
>>> >> >>>>>>> If we change wire protocol functionality OR the binary format
>>> >> (either)
>>> >> >>>>>>> we
>>> >> >>>>>>> must bump version AND treat version as a feature flag with
>>> >> backward
>>> >> >>>>>>> compatibility support until it is deprecated for some time for
>>> >> folks
>>> >> >>>>>>> to
>>> >> >>>>>> deal
>>> >> >>>>>>> with it.
>>> >> >>>>>>>
>>> >> >>>>>>> match version = {
>>> >> >>>>>>> case 0: keepDoingWhatWeWereDoing()
>>> >> >>>>>>> case 1: doNewStuff()
>>> >> >>>>>>> case 2: doEvenMoreNewStuff()
>>> >> >>>>>>> }
>>> >> >>>>>>>
>>> >> >>>>>>> has to be a practice we adopt imho ... I know feature flags can
>>> be
>>> >> >>>>>> construed
>>> >> >>>>>>> as "messy code" but I am eager to hear another (better?
>>> >> different?)
>>> >> >>>>>> solution
>>> >> >>>>>>> to this.
>>> >> >>>>>>>
>>> >> >>>>>>> If we don't do a feature flag like this specifically with this
>>> >> change
>>> >> >>>>>> then
>>> >> >>>>>>> what happens is that someone upgrades their brokers with a
>>> rolling
>>> >> >>>>>> restart
>>> >> >>>>>>> in 0.8.3 and every single one of their producer requests start
>>> to
>>> >> fail
>>> >> >>>>>> and
>>> >> >>>>>>> they have a major production outage. eeeek!!!!
>>> >> >>>>>>>
>>> >> >>>>>>> I do 100% agree that > 1 makes no sense and we *REALLY* need
>>> >> people to
>>> >> >>>>>> start
>>> >> >>>>>>> using 0,1,-1 but we need to-do that in a way that is going to
>>> >> work for
>>> >> >>>>>>> everyone.
>>> >> >>>>>>>
>>> >> >>>>>>> Old producers and consumers must keep working with new brokers
>>> >> and if
>>> >> >>>>>>> we
>>> >> >>>>>> are
>>> >> >>>>>>> not going to support that then I am unclear what the use of
>>> >> "version"
>>> >> >>>>>>> is
>>> >> >>>>>>> based on our original intentions of having it because of the
>>> >> >>>>>>> 0.7=>-0.8.
>>> >> >>>>>> We
>>> >> >>>>>>> said no more breaking changes when we did that.
>>> >> >>>>>>>
>>> >> >>>>>>> - Joe Stein
>>> >> >>>>>>>
>>> >> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava <
>>> >> >>>>>> e...@confluent.io>
>>> >> >>>>>>> wrote:
>>> >> >>>>>>>>
>>> >> >>>>>>>> Right, so this looks like it could create an issue similar to
>>> >> what's
>>> >> >>>>>>>> currently being discussed in
>>> >> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where users
>>> >> now get
>>> >> >>>>>>>> errors
>>> >> >>>>>>>> under conditions when they previously wouldn't. Old clients
>>> won't
>>> >> >>>>>>>> even
>>> >> >>>>>>>> know
>>> >> >>>>>>>> about the error code, so besides failing they won't even be
>>> able
>>> >> to
>>> >> >>>>>>>> log
>>> >> >>>>>>>> any
>>> >> >>>>>>>> meaningful error messages.
>>> >> >>>>>>>>
>>> >> >>>>>>>> I think there are two options for compatibility:
>>> >> >>>>>>>>
>>> >> >>>>>>>> 1. An alternative change is to remove the ack > 1 code, but
>>> >> silently
>>> >> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This isn't the
>>> >> same as
>>> >> >>>>>>>> other
>>> >> >>>>>>>> changes to behavior since the interaction between the client
>>> and
>>> >> >>>>>>>> server
>>> >> >>>>>>>> remains the same, no error codes change, etc. The client might
>>> >> just
>>> >> >>>>>>>> see
>>> >> >>>>>>>> some increased latency since the message might need to be
>>> >> replicated
>>> >> >>>>>>>> to
>>> >> >>>>>>>> more brokers than they requested.
>>> >> >>>>>>>> 2. Split this into two patches, one that bumps the protocol
>>> >> version
>>> >> >>>>>>>> on
>>> >> >>>>>>>> that
>>> >> >>>>>>>> message to include the new error code but maintains both old
>>> (now
>>> >> >>>>>>>> deprecated) and new behavior, then a second that would be
>>> >> applied in
>>> >> >>>>>>>> a
>>> >> >>>>>>>> later release that removes the old protocol + code for handling
>>> >> acks
>>> >> >>>>>> 1.
>>> >> >>>>>>>>
>>> >> >>>>>>>> 2 is probably the right thing to do. If we specify the release
>>> >> when
>>> >> >>>>>> we'll
>>> >> >>>>>>>> remove the deprecated protocol at the time of deprecation it
>>> >> makes
>>> >> >>>>>> things
>>> >> >>>>>>>> a
>>> >> >>>>>>>> lot easier for people writing non-java clients and could give
>>> >> users
>>> >> >>>>>> better
>>> >> >>>>>>>> predictability (e.g. if clients are at most 1 major release
>>> >> behind
>>> >> >>>>>>>> brokers,
>>> >> >>>>>>>> they'll remain compatible but possibly use deprecated
>>> features).
>>> >> >>>>>>>>
>>> >> >>>>>>>>
>>> >> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <
>>> >> gshap...@cloudera.com>
>>> >> >>>>>>>> wrote:
>>> >> >>>>>>>>
>>> >> >>>>>>>>> Hi Kafka Devs,
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> We are working on KAFKA-1697 - remove code related to ack>1 on
>>> >> the
>>> >> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give everyone a
>>> >> heads up
>>> >> >>>>>>>>> on
>>> >> >>>>>>>>> what these changes mean.
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> Once this patch is included, any produce requests that include
>>> >> >>>>>>>>> request.required.acks > 1 will result in an exception. This
>>> >> will be
>>> >> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I assume)
>>> and
>>> >> >>>>>>>>> UnknownException in existing versions (sorry, but I can't add
>>> >> error
>>> >> >>>>>>>>> codes retroactively).
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> This behavior is already enforced by 0.8.2 producers (sync and
>>> >> >>>>>>>>> new),
>>> >> >>>>>>>>> but we expect impact on users with older producers that relied
>>> >> on
>>> >> >>>>>>>>> acks
>>> >> >>>>>>>>>> 1 and external clients (i.e python, go, etc).
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> Users who relied on acks > 1 are expected to switch to using
>>> >> acks =
>>> >> >>>>>>>>> -1
>>> >> >>>>>>>>> and a min.isr parameter than matches their user case.
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> This change was discussed in the past in the context of
>>> >> KAFKA-1555
>>> >> >>>>>>>>> (min.isr), but let us know if you have any questions or
>>> concerns
>>> >> >>>>>>>>> regarding this change.
>>> >> >>>>>>>>>
>>> >> >>>>>>>>> Gwen
>>> >> >>>>>>>>
>>> >> >>>>>>>>
>>> >> >>>>>>>>
>>> >> >>>>>>>> --
>>> >> >>>>>>>> Thanks,
>>> >> >>>>>>>> Ewen
>>> >> >>>>>>>
>>> >> >>>>>>>
>>> >> >>>>>>> --
>>> >> >>>>>>> You received this message because you are subscribed to the
>>> Google
>>> >> >>>>>>> Groups
>>> >> >>>>>>> "kafka-clients" group.
>>> >> >>>>>>> To unsubscribe from this group and stop receiving emails from
>>> it,
>>> >> send
>>> >> >>>>>>> an
>>> >> >>>>>>> email to kafka-clients+unsubscr...@googlegroups.com.
>>> >> >>>>>>> To post to this group, send email to
>>> >> kafka-clie...@googlegroups.com.
>>> >> >>>>>>> Visit this group at
>>> http://groups.google.com/group/kafka-clients.
>>> >> >>>>>>> To view this discussion on the web visit
>>> >> >>>>>>
>>> >> >>>>>>
>>> >>
>>> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
>>> >> >>>>>> .
>>> >> >>>>>>>
>>> >> >>>>>>> For more options, visit https://groups.google.com/d/optout.
>>> >> >>>>>>
>>> >> >>>>>> --
>>> >> >>>>>> You received this message because you are subscribed to the
>>> Google
>>> >> >>>>>> Groups
>>> >> >>>>>> "kafka-clients" group.
>>> >> >>>>>> To unsubscribe from this group and stop receiving emails from it,
>>> >> send
>>> >> >>>>>> an
>>> >> >>>>>> email to kafka-clients+unsubscr...@googlegroups.com.
>>> >> >>>>>> To post to this group, send email to
>>> >> kafka-clie...@googlegroups.com.
>>> >> >>>>>> Visit this group at http://groups.google.com/group/kafka-clients
>>> .
>>> >> >>>>>> To view this discussion on the web visit
>>> >> >>>>>>
>>> >> >>>>>>
>>> >>
>>> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
>>> >> >>>>>> .
>>> >> >>>>>> For more options, visit https://groups.google.com/d/optout.
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>>
>>> >> >>>> --
>>> >> >>>> Thanks,
>>> >> >>>> Ewen
>>> >>
>>> >
>>> >
>>>
>>>
>>> --
>>> Thanks,
>>> Ewen
>>>
>>
>>
>>
>> --
>> Thanks,
>> Neha

Reply via email to