Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Gwen Shapira Mon, 19 Jan 2015 10:02:17 -0800

Sounds good to me.
I'll open a new JIRA for 0.8.2 with just an extra log warning, to
avoid making KAFKA-1697 any more confusing.


On Mon, Jan 19, 2015 at 9:46 AM, Joe Stein <[email protected]> wrote:
> << For 2, how about we make a change to log a warning for ack > 1 in 0.8.2
> and then drop the ack > 1 support in trunk (w/o bumping up the protocol
> version)?
>
> +1
>
>
> On Mon, Jan 19, 2015 at 12:35 PM, Jun Rao <[email protected]> wrote:
>>
>> For 2, how about we make a change to log a warning for ack > 1 in 0.8.2
>> and then drop the ack > 1 support in trunk (w/o bumping up the protocol
>> version)? Thanks,
>>
>> Jun
>>
>> On Sun, Jan 18, 2015 at 8:24 PM, Gwen Shapira <[email protected]>
>> wrote:
>>>
>>> Overall, agree on point #1, less sure on point #2.
>>>
>>> 1. Some protocols never ever add new errors, while others add errors
>>> without bumping versions. HTTP is a good example of the second type.
>>> HTTP-451 was added fairly recently, there are some errors specific to
>>> NGINX, etc. No one cares. I think we should properly document in the
>>> wire-protocol doc that new errors can be added, and I think we should
>>> strongly suggest (and implement ourselves) that unknown error codes
>>> should be shown to users (or at least logged), so they can be googled
>>> and understood through our documentation.
>>> In addition, hierarchy of error codes, so clients will know if an
>>> error is retry-able just by looking at the code could be nice. Same
>>> for adding an error string to the protocol. These are future
>>> enhancements that should be discussed separately.
>>>
>>> 2. I think we want to allow admins to upgrade their Kafka brokers
>>> without having to chase down clients in their organization and without
>>> getting blamed if clients break. I think it makes sense to have one
>>> version that will support existing behavior, but log warnings, so
>>> admins will know about misbehaving clients and can track them down
>>> before an upgrade that breaks them (or before the broken config causes
>>> them to lose data!). Hopefully this is indeed a very rare behavior and
>>> we are taking extra precaution for nothing, but I have customers where
>>> one traumatic upgrade means they will never upgrade a Kafka again, so
>>> I'm being conservative.
>>>
>>> Gwen
>>>
>>>
>>> On Sun, Jan 18, 2015 at 3:50 PM, Jun Rao <[email protected]> wrote:
>>> > Overall, I agree with Jay on both points.
>>> >
>>> > 1. I think it's reasonable to add new error codes w/o bumping up the
>>> > protocol version. In most cases, by adding new error codes, we are just
>>> > refining the categorization of those unknown errors. So, a client
>>> > shouldn't
>>> > behave worse than before as long as unknown errors have been properly
>>> > handled.
>>> >
>>> > 2. I think it's reasonable to just document that 0.8.2 will be the last
>>> > release that will support ack > 1 and remove the support completely in
>>> > trunk
>>> > w/o bumping up the protocol. This is because (a) we never included ack
>>> > > 1
>>> > explicitly in the documentation and so the usage should be limited; (2)
>>> > ack
>>> >> 1 doesn't provide the guarantee that people really want and so it
>>> > shouldn't really be used.
>>> >
>>> > Thanks,
>>> >
>>> > Jun
>>> >
>>> >
>>> > On Sun, Jan 18, 2015 at 11:03 AM, Jay Kreps <[email protected]>
>>> > wrote:
>>> >>
>>> >> Hey guys,
>>> >>
>>> >> I really think we are discussing two things here:
>>> >>
>>> >> How should we generally handle changes to the set of errors? Should
>>> >> introducing new errors be considered a protocol change or should we
>>> >> reserve
>>> >> the right to introduce new error codes?
>>> >> Given that this particular change is possibly incompatible, how should
>>> >> we
>>> >> handle it?
>>> >>
>>> >> I think it would be good for people who are responding here to be
>>> >> specific
>>> >> about which they are addressing.
>>> >>
>>> >> Here is what I think:
>>> >>
>>> >> 1. Errors should be extensible within a protocol version.
>>> >>
>>> >> We should change the protocol documentation to list the errors that
>>> >> can be
>>> >> given back from each api, their meaning, and how to handle them, BUT
>>> >> we
>>> >> should explicitly state that the set of errors are open ended. That is
>>> >> we
>>> >> should reserve the right to introduce new errors and explicitly state
>>> >> that
>>> >> clients need a blanket "unknown error" handling mechanism. The error
>>> >> can
>>> >> link to the protocol definition (something like "Unknown error 42, see
>>> >> protocol definition at http://link";). We could make this work really
>>> >> well by
>>> >> instructing all the clients to report the error in a very googlable
>>> >> way as
>>> >> Oracle does with their error format (e.g. "ORA-32") so that if you
>>> >> ever get
>>> >> the raw error google will take you to the definition.
>>> >>
>>> >> I agree that a more rigid definition seems like "right thing", but
>>> >> having
>>> >> just implemented two clients and spent a bunch of time on the server
>>> >> side, I
>>> >> think, it will work out poorly in practice. Here is why:
>>> >>
>>> >> I think we will make a lot of mistakes in nailing down the set of
>>> >> error
>>> >> codes up front and we will end up going through 3-4 churns of the
>>> >> protocol
>>> >> definition just realizing the set of errors that can be thrown. I
>>> >> think this
>>> >> churn will actually make life worse for clients that now have to
>>> >> figure out
>>> >> 7 identical versions of the protocol and will be a mess in terms of
>>> >> testing
>>> >> on the server side. I actually know this to be true because while
>>> >> implementing the clients I tried to guess the errors that could be
>>> >> thrown,
>>> >> then checked my guess by close code inspection. It turned out that I
>>> >> always
>>> >> missed things in my belief about errors, but more importantly even
>>> >> after
>>> >> close code inspection I found tons of other errors in my stress
>>> >> testing.
>>> >> In practice error handling always involves calling out one or two
>>> >> meaningful failures that have special recovery and then a blanket case
>>> >> that
>>> >> just handles everything else. It's true that some clients may not have
>>> >> done
>>> >> this well, but I think it is for the best if they fix that.
>>> >> Reserving the right to add errors doesn't mean we will do it without
>>> >> care.
>>> >> We will think through each change and decide whether giving a little
>>> >> more
>>> >> precision in the error is worth the overhead and churn of a protocol
>>> >> version
>>> >> bump.
>>> >>
>>> >> 2. In this case in particular we should not introduce a new protocol
>>> >> version
>>> >>
>>> >> In this particular case we are saying that acks > 1 doesn't make sense
>>> >> and
>>> >> we want to give an error to people specifying this so that they change
>>> >> their
>>> >> configuration. This is a configuration that few people use and we want
>>> >> to
>>> >> just make it an error. The bad behavior will just be that the error
>>> >> will not
>>> >> be as good as it could be. I think that is a better tradeoff than
>>> >> introducing a separate protocol version (this may be true of the java
>>> >> clients too).
>>> >>
>>> >> We will have lots of cases like this in the future and we aren't going
>>> >> to
>>> >> want to churn the protocol for each of them. For example we previously
>>> >> had
>>> >> to get more precise about which characters were legal and which
>>> >> illegal in
>>> >> topic names.
>>> >>
>>> >> -Jay
>>> >>
>>> >> On Fri, Jan 16, 2015 at 11:55 AM, Gwen Shapira <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> I updated the KIP: Using acks > 1 in version 0 will log a WARN
>>> >>> message
>>> >>> in the broker about client using deprecated behavior (suggested by
>>> >>> Joe
>>> >>> in the JIRA, and I think it makes sense).
>>> >>>
>>> >>> Gwen
>>> >>>
>>> >>> On Fri, Jan 16, 2015 at 10:40 AM, Gwen Shapira
>>> >>> <[email protected]>
>>> >>> wrote:
>>> >>> > How about we continue the discussion on this thread, so we won't
>>> >>> > lose
>>> >>> > the context of this discussion, and put it up for VOTE when this
>>> >>> > has
>>> >>> > been finalized?
>>> >>> >
>>> >>> > On Fri, Jan 16, 2015 at 10:22 AM, Neha Narkhede <[email protected]>
>>> >>> > wrote:
>>> >>> >> Gwen,
>>> >>> >>
>>> >>> >> KIP write-up looks good. According to the rest of the KIP process
>>> >>> >> proposal,
>>> >>> >> would you like to start a DISCUSS/VOTE thread for it?
>>> >>> >>
>>> >>> >> Thanks,
>>> >>> >> Neha
>>> >>> >>
>>> >>> >> On Fri, Jan 16, 2015 at 9:37 AM, Ewen Cheslack-Postava
>>> >>> >> <[email protected]>
>>> >>> >> wrote:
>>> >>> >>
>>> >>> >>> Gwen -- KIP write up looks good. Deprecation schedule probably
>>> >>> >>> needs
>>> >>> >>> to be
>>> >>> >>> more specific, but I think that discussion probably needs to
>>> >>> >>> happen
>>> >>> >>> after a
>>> >>> >>> solution is agreed upon.
>>> >>> >>>
>>> >>> >>> Jay -- I think "older clients will get a bad error message
>>> >>> >>> instead of
>>> >>> >>> a
>>> >>> >>> good one" isn't what would be happening with this change.
>>> >>> >>> Previously
>>> >>> >>> they
>>> >>> >>> wouldn't have received an error and they would have been able to
>>> >>> >>> produce
>>> >>> >>> messages. After the change they'll just receive this new error
>>> >>> >>> message
>>> >>> >>> which their clients can't possibly handle gracefully since it
>>> >>> >>> didn't
>>> >>> >>> exist
>>> >>> >>> when the client was written. Whether the acks > 1 setting was
>>> >>> >>> actually
>>> >>> >>> accomplishing what they thought doesn't matter. Someone could
>>> >>> >>> have
>>> >>> >>> reasonably read the docs on 0.8.1.1, thought acks = 2 is an ok
>>> >>> >>> setting for
>>> >>> >>> their applications, set it as a default across a bunch of apps,
>>> >>> >>> then
>>> >>> >>> follow
>>> >>> >>> the recommended upgrade path of updating brokers to 0.8.2 and all
>>> >>> >>> their
>>> >>> >>> apps will start failing on produce requests.
>>> >>> >>>
>>> >>> >>>
>>> >>> >>> On Thu, Jan 15, 2015 at 8:27 PM, Jay Kreps <[email protected]>
>>> >>> >>> wrote:
>>> >>> >>>
>>> >>> >>> > This is a good case to discuss.
>>> >>> >>> >
>>> >>> >>> > Let's figure the general case of how we want to handle errors
>>> >>> >>> > and
>>> >>> >>> > get
>>> >>> >>> that
>>> >>> >>> > documented in the protocol. The problem right now is that we
>>> >>> >>> > give
>>> >>> >>> > no
>>> >>> >>> > guidance on this. I actually thought Gwen's suggestion made
>>> >>> >>> > sense
>>> >>> >>> > on the
>>> >>> >>> > guidance we should have given which is that we will enumerate a
>>> >>> >>> > set
>>> >>> >>> > of
>>> >>> >>> > errors and their meaning for each API but it is possible that
>>> >>> >>> > other
>>> >>> >>> errors
>>> >>> >>> > will occur and they should be handled (maybe poorly) in the
>>> >>> >>> > same
>>> >>> >>> > way
>>> >>> >>> > UNKNOWN_ERROR is handled which is our normal escape hatch for
>>> >>> >>> > things like
>>> >>> >>> > OOMException.
>>> >>> >>> >
>>> >>> >>> > I really do think we shouldn't be dogmatic here: In considering
>>> >>> >>> > a
>>> >>> >>> > change
>>> >>> >>> > to errors we should consider the potential ill-effect vs the
>>> >>> >>> > complexity
>>> >>> >>> of
>>> >>> >>> > yet another protocol version.
>>> >>> >>> >
>>> >>> >>> > In this case I actually am not sure we need to bump the
>>> >>> >>> > protocol
>>> >>> >>> > because
>>> >>> >>> > the whole point of the change was to make a setting we think
>>> >>> >>> > doesn't make
>>> >>> >>> > sense break, right? Well this will break it. It seems like the
>>> >>> >>> > only
>>> >>> >>> > downside is that older clients will get a bad error message
>>> >>> >>> > instead
>>> >>> >>> > of a
>>> >>> >>> > good one. But it isn't like we will have rendered a client
>>> >>> >>> > unusable, it
>>> >>> >>> is
>>> >>> >>> > just that they will need to change their config.
>>> >>> >>> >
>>> >>> >>> > -Jay
>>> >>> >>> >
>>> >>> >>> > On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira
>>> >>> >>> > <[email protected]>
>>> >>> >>> > wrote:
>>> >>> >>> >
>>> >>> >>> >> I created a KIP for this suggestion:
>>> >>> >>> >>
>>> >>> >>> >>
>>> >>> >>> >>
>>> >>> >>>
>>> >>> >>>
>>> >>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks
>>> >>> >>> >>
>>> >>> >>> >> Basically documenting what was already discussed here.
>>> >>> >>> >> Comments
>>> >>> >>> >> will
>>> >>> >>> >> be awesome!
>>> >>> >>> >>
>>> >>> >>> >> Gwen
>>> >>> >>> >>
>>> >>> >>> >> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira
>>> >>> >>> >> <[email protected]>
>>> >>> >>> >> wrote:
>>> >>> >>> >> > The errors are part of the KIP process now, so I think the
>>> >>> >>> >> > clients are
>>> >>> >>> >> safe :)
>>> >>> >>> >> >
>>> >>> >>> >> >
>>> >>> >>> >>
>>> >>> >>>
>>> >>> >>>
>>> >>> >>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
>>> >>> >>> >> >
>>> >>> >>> >> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin
>>> >>> >>> >> > <[email protected]>
>>> >>> >>> >> wrote:
>>> >>> >>> >> >> Agree errors should be part of the protocol
>>> >>> >>> >> >>
>>> >>> >>> >> >>> On Jan 15, 2015, at 17:59, Gwen Shapira
>>> >>> >>> >> >>> <[email protected]>
>>> >>> >>> >> wrote:
>>> >>> >>> >> >>>
>>> >>> >>> >> >>> Hi,
>>> >>> >>> >> >>>
>>> >>> >>> >> >>> I got convinced by Joe and Dana that errors are indeed
>>> >>> >>> >> >>> part of
>>> >>> >>> >> >>> the
>>> >>> >>> >> >>> protocol and can't be randomly added.
>>> >>> >>> >> >>>
>>> >>> >>> >> >>> So, it looks like we need to bump version of
>>> >>> >>> >> >>> ProduceRequest in
>>> >>> >>> >> >>> the
>>> >>> >>> >> >>> following way:
>>> >>> >>> >> >>> Version 0 -> accept acks >1. I think we should keep the
>>> >>> >>> >> >>> existing
>>> >>> >>> >> >>> behavior too (i.e. not replace it with -1) to avoid
>>> >>> >>> >> >>> surprising
>>> >>> >>> >> >>> clients, but I'm willing to hear other opinions.
>>> >>> >>> >> >>> Version 1 -> do not accept acks >1 and return an error.
>>> >>> >>> >> >>> Are we ok with the error I added in KAFKA-1697? We can use
>>> >>> >>> >> >>> something
>>> >>> >>> >> >>> less specific like InvalidRequestParameter. This error can
>>> >>> >>> >> >>> be
>>> >>> >>> >> >>> reused
>>> >>> >>> >> >>> in the future and reduce the need to add errors, but will
>>> >>> >>> >> >>> also
>>> >>> >>> >> >>> be
>>> >>> >>> less
>>> >>> >>> >> >>> clear to the client and its users. Maybe even add the
>>> >>> >>> >> >>> error
>>> >>> >>> >> >>> message
>>> >>> >>> >> >>> string to the protocol in addition to the error code?
>>> >>> >>> >> >>> (since
>>> >>> >>> >> >>> we are
>>> >>> >>> >> >>> bumping versions....)
>>> >>> >>> >> >>>
>>> >>> >>> >> >>> I think maintaining the old version throughout 0.8.X makes
>>> >>> >>> >> >>> sense.
>>> >>> >>> IMO
>>> >>> >>> >> >>> dropping it for 0.9 is feasible, but I'll let client
>>> >>> >>> >> >>> owners
>>> >>> >>> >> >>> help
>>> >>> >>> make
>>> >>> >>> >> >>> that call.
>>> >>> >>> >> >>>
>>> >>> >>> >> >>> Am I missing anything? Should I start a KIP? It seems like
>>> >>> >>> >> >>> a
>>> >>> >>> KIP-type
>>> >>> >>> >> >>> discussion :)
>>> >>> >>> >> >>>
>>> >>> >>> >> >>> Gwen
>>> >>> >>> >> >>>
>>> >>> >>> >> >>>
>>> >>> >>> >> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava
>>> >>> >>> >> >>> <[email protected]> wrote:
>>> >>> >>> >> >>>> Gwen,
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>> I think the only option that wouldn't require a protocol
>>> >>> >>> >> >>>> version
>>> >>> >>> >> change is
>>> >>> >>> >> >>>> the one where acks > 1 is converted to acks = -1 since
>>> >>> >>> >> >>>> it's
>>> >>> >>> >> >>>> the
>>> >>> >>> only
>>> >>> >>> >> one
>>> >>> >>> >> >>>> that doesn't potentially break older clients. The
>>> >>> >>> >> >>>> protocol
>>> >>> >>> >> >>>> guide
>>> >>> >>> >> says that
>>> >>> >>> >> >>>> the expected upgrade path is servers first, then clients,
>>> >>> >>> >> >>>> so
>>> >>> >>> >> >>>> old
>>> >>> >>> >> clients,
>>> >>> >>> >> >>>> including non-java clients, that may be using acks > 1
>>> >>> >>> >> >>>> should
>>> >>> >>> >> >>>> be
>>> >>> >>> >> able to
>>> >>> >>> >> >>>> work with a new broker version.
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>> It's more work, but I think dealing with the protocol
>>> >>> >>> >> >>>> change
>>> >>> >>> >> >>>> is the
>>> >>> >>> >> right
>>> >>> >>> >> >>>> thing to do since it eventually gets us to the behavior I
>>> >>> >>> >> >>>> think is
>>> >>> >>> >> better --
>>> >>> >>> >> >>>> the broker should reject requests with invalid values. I
>>> >>> >>> >> >>>> think Joe
>>> >>> >>> >> and I
>>> >>> >>> >> >>>> were basically in agreement. In my mind the major piece
>>> >>> >>> >> >>>> missing
>>> >>> >>> from
>>> >>> >>> >> his
>>> >>> >>> >> >>>> description is how long we're going to maintain his "case
>>> >>> >>> >> >>>> 0"
>>> >>> >>> >> behavior. It's
>>> >>> >>> >> >>>> impractical to maintain old versions forever, but it
>>> >>> >>> >> >>>> sounds
>>> >>> >>> >> >>>> like
>>> >>> >>> >> there
>>> >>> >>> >> >>>> hasn't been a decision on how long to maintain them.
>>> >>> >>> >> >>>> Maybe
>>> >>> >>> >> >>>> that's
>>> >>> >>> >> another
>>> >>> >>> >> >>>> item to add to KIPs -- protocol versions and behavior
>>> >>> >>> >> >>>> need to
>>> >>> >>> >> >>>> be
>>> >>> >>> >> listed as
>>> >>> >>> >> >>>> deprecated and the earliest version in which they'll be
>>> >>> >>> >> >>>> removed
>>> >>> >>> >> should be
>>> >>> >>> >> >>>> specified so users can understand which versions are
>>> >>> >>> >> >>>> guaranteed to
>>> >>> >>> be
>>> >>> >>> >> >>>> compatible, even if they're using (well-written) non-java
>>> >>> >>> >> >>>> clients.
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>> -Ewen
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers <
>>> >>> >>> >> [email protected]> wrote:
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>>> clients don't break on unknown errors
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>> maybe true for the official java clients, but I dont
>>> >>> >>> >> >>>>> think
>>> >>> >>> >> >>>>> the
>>> >>> >>> >> assumption
>>> >>> >>> >> >>>>> holds true for community-maintained clients and users of
>>> >>> >>> >> >>>>> those
>>> >>> >>> >> clients.
>>> >>> >>> >> >>>>> kafka-python generally follows the fail-fast philosophy
>>> >>> >>> >> >>>>> and
>>> >>> >>> >> >>>>> raises
>>> >>> >>> >> an
>>> >>> >>> >> >>>>> exception on any unrecognized error code in any server
>>> >>> >>> >> >>>>> response.
>>> >>> >>> >> in this
>>> >>> >>> >> >>>>> case, kafka-python allows users to set their own
>>> >>> >>> >> >>>>> required-acks
>>> >>> >>> >> policy when
>>> >>> >>> >> >>>>> creating a producer instance.  It is possible that users
>>> >>> >>> >> >>>>> of
>>> >>> >>> >> kafka-python
>>> >>> >>> >> >>>>> have deployed producer code that uses ack>1 -- perhaps
>>> >>> >>> >> >>>>> in
>>> >>> >>> production
>>> >>> >>> >> >>>>> environments -- and for those users the new error code
>>> >>> >>> >> >>>>> will
>>> >>> >>> >> >>>>> crash
>>> >>> >>> >> their
>>> >>> >>> >> >>>>> producer code.  I would not be surprised if the same
>>> >>> >>> >> >>>>> were
>>> >>> >>> >> >>>>> true of
>>> >>> >>> >> other
>>> >>> >>> >> >>>>> community clients.
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>> *one reason for the fail-fast approach is that there
>>> >>> >>> >> >>>>> isn't
>>> >>> >>> >> >>>>> great
>>> >>> >>> >> >>>>> documentation on what errors to expect for each request
>>> >>> >>> >> >>>>> /
>>> >>> >>> >> >>>>> response
>>> >>> >>> >> -- so
>>> >>> >>> >> >>>>> we
>>> >>> >>> >> >>>>> use failures to alert that some error case is not
>>> >>> >>> >> >>>>> handled
>>> >>> >>> >> properly.  and
>>> >>> >>> >> >>>>> because of that, introducing new error cases without
>>> >>> >>> >> >>>>> bumping
>>> >>> >>> >> >>>>> the
>>> >>> >>> api
>>> >>> >>> >> >>>>> version is likely to cause those errors to get
>>> >>> >>> >> >>>>> raised/thrown
>>> >>> >>> >> >>>>> all
>>> >>> >>> >> the way
>>> >>> >>> >> >>>>> back up to the user.  of course we (client maintainers)
>>> >>> >>> >> >>>>> can
>>> >>> >>> >> >>>>> fix
>>> >>> >>> the
>>> >>> >>> >> issues
>>> >>> >>> >> >>>>> in the client libraries and suggest users upgrade, but
>>> >>> >>> >> >>>>> it's
>>> >>> >>> >> >>>>> not
>>> >>> >>> the
>>> >>> >>> >> ideal
>>> >>> >>> >> >>>>> situation.
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>> long-winded way of saying: I agree w/ Joe.
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>> -Dana
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira <
>>> >>> >>> >> [email protected]>
>>> >>> >>> >> >>>>> wrote:
>>> >>> >>> >> >>>>>
>>> >>> >>> >> >>>>>> Is the protocol bump caused by the behavior change or
>>> >>> >>> >> >>>>>> the
>>> >>> >>> >> >>>>>> new
>>> >>> >>> error
>>> >>> >>> >> >>>>>> code?
>>> >>> >>> >> >>>>>>
>>> >>> >>> >> >>>>>> 1) IMO, error_codes are data, and clients can expect to
>>> >>> >>> >> >>>>>> receive
>>> >>> >>> >> errors
>>> >>> >>> >> >>>>>> that they don't understand (i.e. unknown errors).
>>> >>> >>> >> >>>>>> AFAIK,
>>> >>> >>> >> >>>>>> clients
>>> >>> >>> >> don't
>>> >>> >>> >> >>>>>> break on unknown errors, they are simple more
>>> >>> >>> >> >>>>>> challenging
>>> >>> >>> >> >>>>>> to
>>> >>> >>> >> debug. If
>>> >>> >>> >> >>>>>> we document the new behavior, then its definitely
>>> >>> >>> >> >>>>>> debuggable and
>>> >>> >>> >> >>>>>> fixable.
>>> >>> >>> >> >>>>>>
>>> >>> >>> >> >>>>>> 2) The behavior change is basically a deprecation -
>>> >>> >>> >> >>>>>> i.e.
>>> >>> >>> >> >>>>>> acks > 1
>>> >>> >>> >> were
>>> >>> >>> >> >>>>>> never documented, and are not supported by Kafka
>>> >>> >>> >> >>>>>> clients
>>> >>> >>> >> >>>>>> starting
>>> >>> >>> >> with
>>> >>> >>> >> >>>>>> version 0.8.2. I'm not sure this requires a protocol
>>> >>> >>> >> >>>>>> bump
>>> >>> >>> >> >>>>>> either,
>>> >>> >>> >> >>>>>> although its a better case than new error codes.
>>> >>> >>> >> >>>>>>
>>> >>> >>> >> >>>>>> Thanks,
>>> >>> >>> >> >>>>>> Gwen
>>> >>> >>> >> >>>>>>
>>> >>> >>> >> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <
>>> >>> >>> [email protected]>
>>> >>> >>> >> >>>>>> wrote:
>>> >>> >>> >> >>>>>>> Looping in the mailing list that the client developers
>>> >>> >>> >> >>>>>>> live on
>>> >>> >>> >> because
>>> >>> >>> >> >>>>>> they
>>> >>> >>> >> >>>>>>> are all not on dev (though they should be if they want
>>> >>> >>> >> >>>>>>> to
>>> >>> >>> >> >>>>>>> be
>>> >>> >>> >> helping
>>> >>> >>> >> >>>>>>> to
>>> >>> >>> >> >>>>>>> build the best client libraries they can).
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> I whole hardily believe that we need to not break
>>> >>> >>> >> >>>>>>> existing
>>> >>> >>> >> >>>>>>> functionality
>>> >>> >>> >> >>>>>> of
>>> >>> >>> >> >>>>>>> the client protocol, ever.
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> There are many reasons for this and we have other
>>> >>> >>> >> >>>>>>> threads
>>> >>> >>> >> >>>>>>> on the
>>> >>> >>> >> >>>>>>> mailing
>>> >>> >>> >> >>>>>>> list where we are discussing that topic (no pun
>>> >>> >>> >> >>>>>>> intended)
>>> >>> >>> >> >>>>>>> that I
>>> >>> >>> >> don't
>>> >>> >>> >> >>>>>> want
>>> >>> >>> >> >>>>>>> to re-hash here.
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> If we change wire protocol functionality OR the binary
>>> >>> >>> >> >>>>>>> format
>>> >>> >>> >> (either)
>>> >>> >>> >> >>>>>>> we
>>> >>> >>> >> >>>>>>> must bump version AND treat version as a feature flag
>>> >>> >>> >> >>>>>>> with
>>> >>> >>> >> backward
>>> >>> >>> >> >>>>>>> compatibility support until it is deprecated for some
>>> >>> >>> >> >>>>>>> time
>>> >>> >>> >> >>>>>>> for
>>> >>> >>> >> folks
>>> >>> >>> >> >>>>>>> to
>>> >>> >>> >> >>>>>> deal
>>> >>> >>> >> >>>>>>> with it.
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> match version = {
>>> >>> >>> >> >>>>>>> case 0: keepDoingWhatWeWereDoing()
>>> >>> >>> >> >>>>>>> case 1: doNewStuff()
>>> >>> >>> >> >>>>>>> case 2: doEvenMoreNewStuff()
>>> >>> >>> >> >>>>>>> }
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> has to be a practice we adopt imho ... I know feature
>>> >>> >>> >> >>>>>>> flags can
>>> >>> >>> be
>>> >>> >>> >> >>>>>> construed
>>> >>> >>> >> >>>>>>> as "messy code" but I am eager to hear another
>>> >>> >>> >> >>>>>>> (better?
>>> >>> >>> >> different?)
>>> >>> >>> >> >>>>>> solution
>>> >>> >>> >> >>>>>>> to this.
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> If we don't do a feature flag like this specifically
>>> >>> >>> >> >>>>>>> with
>>> >>> >>> >> >>>>>>> this
>>> >>> >>> >> change
>>> >>> >>> >> >>>>>> then
>>> >>> >>> >> >>>>>>> what happens is that someone upgrades their brokers
>>> >>> >>> >> >>>>>>> with a
>>> >>> >>> rolling
>>> >>> >>> >> >>>>>> restart
>>> >>> >>> >> >>>>>>> in 0.8.3 and every single one of their producer
>>> >>> >>> >> >>>>>>> requests
>>> >>> >>> >> >>>>>>> start
>>> >>> >>> to
>>> >>> >>> >> fail
>>> >>> >>> >> >>>>>> and
>>> >>> >>> >> >>>>>>> they have a major production outage. eeeek!!!!
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> I do 100% agree that > 1 makes no sense and we
>>> >>> >>> >> >>>>>>> *REALLY*
>>> >>> >>> >> >>>>>>> need
>>> >>> >>> >> people to
>>> >>> >>> >> >>>>>> start
>>> >>> >>> >> >>>>>>> using 0,1,-1 but we need to-do that in a way that is
>>> >>> >>> >> >>>>>>> going
>>> >>> >>> >> >>>>>>> to
>>> >>> >>> >> work for
>>> >>> >>> >> >>>>>>> everyone.
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> Old producers and consumers must keep working with new
>>> >>> >>> >> >>>>>>> brokers
>>> >>> >>> >> and if
>>> >>> >>> >> >>>>>>> we
>>> >>> >>> >> >>>>>> are
>>> >>> >>> >> >>>>>>> not going to support that then I am unclear what the
>>> >>> >>> >> >>>>>>> use
>>> >>> >>> >> >>>>>>> of
>>> >>> >>> >> "version"
>>> >>> >>> >> >>>>>>> is
>>> >>> >>> >> >>>>>>> based on our original intentions of having it because
>>> >>> >>> >> >>>>>>> of
>>> >>> >>> >> >>>>>>> the
>>> >>> >>> >> >>>>>>> 0.7=>-0.8.
>>> >>> >>> >> >>>>>> We
>>> >>> >>> >> >>>>>>> said no more breaking changes when we did that.
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> - Joe Stein
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen
>>> >>> >>> >> >>>>>>> Cheslack-Postava <
>>> >>> >>> >> >>>>>> [email protected]>
>>> >>> >>> >> >>>>>>> wrote:
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>> Right, so this looks like it could create an issue
>>> >>> >>> >> >>>>>>>> similar to
>>> >>> >>> >> what's
>>> >>> >>> >> >>>>>>>> currently being discussed in
>>> >>> >>> >> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649
>>> >>> >>> >> >>>>>>>> where
>>> >>> >>> >> >>>>>>>> users
>>> >>> >>> >> now get
>>> >>> >>> >> >>>>>>>> errors
>>> >>> >>> >> >>>>>>>> under conditions when they previously wouldn't. Old
>>> >>> >>> >> >>>>>>>> clients
>>> >>> >>> won't
>>> >>> >>> >> >>>>>>>> even
>>> >>> >>> >> >>>>>>>> know
>>> >>> >>> >> >>>>>>>> about the error code, so besides failing they won't
>>> >>> >>> >> >>>>>>>> even
>>> >>> >>> >> >>>>>>>> be
>>> >>> >>> able
>>> >>> >>> >> to
>>> >>> >>> >> >>>>>>>> log
>>> >>> >>> >> >>>>>>>> any
>>> >>> >>> >> >>>>>>>> meaningful error messages.
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>> I think there are two options for compatibility:
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>> 1. An alternative change is to remove the ack > 1
>>> >>> >>> >> >>>>>>>> code,
>>> >>> >>> >> >>>>>>>> but
>>> >>> >>> >> silently
>>> >>> >>> >> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This
>>> >>> >>> >> >>>>>>>> isn't
>>> >>> >>> >> >>>>>>>> the
>>> >>> >>> >> same as
>>> >>> >>> >> >>>>>>>> other
>>> >>> >>> >> >>>>>>>> changes to behavior since the interaction between the
>>> >>> >>> >> >>>>>>>> client
>>> >>> >>> and
>>> >>> >>> >> >>>>>>>> server
>>> >>> >>> >> >>>>>>>> remains the same, no error codes change, etc. The
>>> >>> >>> >> >>>>>>>> client
>>> >>> >>> >> >>>>>>>> might
>>> >>> >>> >> just
>>> >>> >>> >> >>>>>>>> see
>>> >>> >>> >> >>>>>>>> some increased latency since the message might need
>>> >>> >>> >> >>>>>>>> to be
>>> >>> >>> >> replicated
>>> >>> >>> >> >>>>>>>> to
>>> >>> >>> >> >>>>>>>> more brokers than they requested.
>>> >>> >>> >> >>>>>>>> 2. Split this into two patches, one that bumps the
>>> >>> >>> >> >>>>>>>> protocol
>>> >>> >>> >> version
>>> >>> >>> >> >>>>>>>> on
>>> >>> >>> >> >>>>>>>> that
>>> >>> >>> >> >>>>>>>> message to include the new error code but maintains
>>> >>> >>> >> >>>>>>>> both
>>> >>> >>> >> >>>>>>>> old
>>> >>> >>> (now
>>> >>> >>> >> >>>>>>>> deprecated) and new behavior, then a second that
>>> >>> >>> >> >>>>>>>> would be
>>> >>> >>> >> applied in
>>> >>> >>> >> >>>>>>>> a
>>> >>> >>> >> >>>>>>>> later release that removes the old protocol + code
>>> >>> >>> >> >>>>>>>> for
>>> >>> >>> >> >>>>>>>> handling
>>> >>> >>> >> acks
>>> >>> >>> >> >>>>>> 1.
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>> 2 is probably the right thing to do. If we specify
>>> >>> >>> >> >>>>>>>> the
>>> >>> >>> >> >>>>>>>> release
>>> >>> >>> >> when
>>> >>> >>> >> >>>>>> we'll
>>> >>> >>> >> >>>>>>>> remove the deprecated protocol at the time of
>>> >>> >>> >> >>>>>>>> deprecation
>>> >>> >>> >> >>>>>>>> it
>>> >>> >>> >> makes
>>> >>> >>> >> >>>>>> things
>>> >>> >>> >> >>>>>>>> a
>>> >>> >>> >> >>>>>>>> lot easier for people writing non-java clients and
>>> >>> >>> >> >>>>>>>> could
>>> >>> >>> >> >>>>>>>> give
>>> >>> >>> >> users
>>> >>> >>> >> >>>>>> better
>>> >>> >>> >> >>>>>>>> predictability (e.g. if clients are at most 1 major
>>> >>> >>> >> >>>>>>>> release
>>> >>> >>> >> behind
>>> >>> >>> >> >>>>>>>> brokers,
>>> >>> >>> >> >>>>>>>> they'll remain compatible but possibly use deprecated
>>> >>> >>> features).
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira <
>>> >>> >>> >> [email protected]>
>>> >>> >>> >> >>>>>>>> wrote:
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>>> Hi Kafka Devs,
>>> >>> >>> >> >>>>>>>>>
>>> >>> >>> >> >>>>>>>>> We are working on KAFKA-1697 - remove code related
>>> >>> >>> >> >>>>>>>>> to
>>> >>> >>> >> >>>>>>>>> ack>1 on
>>> >>> >>> >> the
>>> >>> >>> >> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give
>>> >>> >>> >> >>>>>>>>> everyone
>>> >>> >>> >> >>>>>>>>> a
>>> >>> >>> >> heads up
>>> >>> >>> >> >>>>>>>>> on
>>> >>> >>> >> >>>>>>>>> what these changes mean.
>>> >>> >>> >> >>>>>>>>>
>>> >>> >>> >> >>>>>>>>> Once this patch is included, any produce requests
>>> >>> >>> >> >>>>>>>>> that
>>> >>> >>> >> >>>>>>>>> include
>>> >>> >>> >> >>>>>>>>> request.required.acks > 1 will result in an
>>> >>> >>> >> >>>>>>>>> exception.
>>> >>> >>> >> >>>>>>>>> This
>>> >>> >>> >> will be
>>> >>> >>> >> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I
>>> >>> >>> >> >>>>>>>>> assume)
>>> >>> >>> and
>>> >>> >>> >> >>>>>>>>> UnknownException in existing versions (sorry, but I
>>> >>> >>> >> >>>>>>>>> can't add
>>> >>> >>> >> error
>>> >>> >>> >> >>>>>>>>> codes retroactively).
>>> >>> >>> >> >>>>>>>>>
>>> >>> >>> >> >>>>>>>>> This behavior is already enforced by 0.8.2 producers
>>> >>> >>> >> >>>>>>>>> (sync and
>>> >>> >>> >> >>>>>>>>> new),
>>> >>> >>> >> >>>>>>>>> but we expect impact on users with older producers
>>> >>> >>> >> >>>>>>>>> that
>>> >>> >>> >> >>>>>>>>> relied
>>> >>> >>> >> on
>>> >>> >>> >> >>>>>>>>> acks
>>> >>> >>> >> >>>>>>>>>> 1 and external clients (i.e python, go, etc).
>>> >>> >>> >> >>>>>>>>>
>>> >>> >>> >> >>>>>>>>> Users who relied on acks > 1 are expected to switch
>>> >>> >>> >> >>>>>>>>> to
>>> >>> >>> >> >>>>>>>>> using
>>> >>> >>> >> acks =
>>> >>> >>> >> >>>>>>>>> -1
>>> >>> >>> >> >>>>>>>>> and a min.isr parameter than matches their user
>>> >>> >>> >> >>>>>>>>> case.
>>> >>> >>> >> >>>>>>>>>
>>> >>> >>> >> >>>>>>>>> This change was discussed in the past in the context
>>> >>> >>> >> >>>>>>>>> of
>>> >>> >>> >> KAFKA-1555
>>> >>> >>> >> >>>>>>>>> (min.isr), but let us know if you have any questions
>>> >>> >>> >> >>>>>>>>> or
>>> >>> >>> concerns
>>> >>> >>> >> >>>>>>>>> regarding this change.
>>> >>> >>> >> >>>>>>>>>
>>> >>> >>> >> >>>>>>>>> Gwen
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>>
>>> >>> >>> >> >>>>>>>> --
>>> >>> >>> >> >>>>>>>> Thanks,
>>> >>> >>> >> >>>>>>>> Ewen
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> --
>>> >>> >>> >> >>>>>>> You received this message because you are subscribed
>>> >>> >>> >> >>>>>>> to
>>> >>> >>> >> >>>>>>> the
>>> >>> >>> Google
>>> >>> >>> >> >>>>>>> Groups
>>> >>> >>> >> >>>>>>> "kafka-clients" group.
>>> >>> >>> >> >>>>>>> To unsubscribe from this group and stop receiving
>>> >>> >>> >> >>>>>>> emails
>>> >>> >>> >> >>>>>>> from
>>> >>> >>> it,
>>> >>> >>> >> send
>>> >>> >>> >> >>>>>>> an
>>> >>> >>> >> >>>>>>> email to [email protected].
>>> >>> >>> >> >>>>>>> To post to this group, send email to
>>> >>> >>> >> [email protected].
>>> >>> >>> >> >>>>>>> Visit this group at
>>> >>> >>> http://groups.google.com/group/kafka-clients.
>>> >>> >>> >> >>>>>>> To view this discussion on the web visit
>>> >>> >>> >> >>>>>>
>>> >>> >>> >> >>>>>>
>>> >>> >>> >>
>>> >>> >>>
>>> >>> >>>
>>> >>> >>> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com
>>> >>> >>> >> >>>>>> .
>>> >>> >>> >> >>>>>>>
>>> >>> >>> >> >>>>>>> For more options, visit
>>> >>> >>> >> >>>>>>> https://groups.google.com/d/optout.
>>> >>> >>> >> >>>>>>
>>> >>> >>> >> >>>>>> --
>>> >>> >>> >> >>>>>> You received this message because you are subscribed to
>>> >>> >>> >> >>>>>> the
>>> >>> >>> Google
>>> >>> >>> >> >>>>>> Groups
>>> >>> >>> >> >>>>>> "kafka-clients" group.
>>> >>> >>> >> >>>>>> To unsubscribe from this group and stop receiving
>>> >>> >>> >> >>>>>> emails
>>> >>> >>> >> >>>>>> from it,
>>> >>> >>> >> send
>>> >>> >>> >> >>>>>> an
>>> >>> >>> >> >>>>>> email to [email protected].
>>> >>> >>> >> >>>>>> To post to this group, send email to
>>> >>> >>> >> [email protected].
>>> >>> >>> >> >>>>>> Visit this group at
>>> >>> >>> >> >>>>>> http://groups.google.com/group/kafka-clients
>>> >>> >>> .
>>> >>> >>> >> >>>>>> To view this discussion on the web visit
>>> >>> >>> >> >>>>>>
>>> >>> >>> >> >>>>>>
>>> >>> >>> >>
>>> >>> >>>
>>> >>> >>>
>>> >>> >>> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com
>>> >>> >>> >> >>>>>> .
>>> >>> >>> >> >>>>>> For more options, visit
>>> >>> >>> >> >>>>>> https://groups.google.com/d/optout.
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>>
>>> >>> >>> >> >>>> --
>>> >>> >>> >> >>>> Thanks,
>>> >>> >>> >> >>>> Ewen
>>> >>> >>> >>
>>> >>> >>> >
>>> >>> >>> >
>>> >>> >>>
>>> >>> >>>
>>> >>> >>> --
>>> >>> >>> Thanks,
>>> >>> >>> Ewen
>>> >>> >>>
>>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >> --
>>> >>> >> Thanks,
>>> >>> >> Neha
>>> >>
>>> >>
>>> >> --
>>> >> You received this message because you are subscribed to the Google
>>> >> Groups
>>> >> "kafka-clients" group.
>>> >> To unsubscribe from this group and stop receiving emails from it, send
>>> >> an
>>> >> email to [email protected].
>>> >> To post to this group, send email to [email protected].
>>> >> Visit this group at http://groups.google.com/group/kafka-clients.
>>> >> To view this discussion on the web visit
>>> >>
>>> >> https://groups.google.com/d/msgid/kafka-clients/CAOeJiJh17CYq%3D-qgPu9rnArsPW%3D7RL9AAW_h%3DrrXx0%2BKhhKgNQ%40mail.gmail.com.
>>> >>
>>> >> For more options, visit https://groups.google.com/d/optout.
>>> >
>>> >
>>
>>
>

Re: [kafka-clients] Re: Heads up: KAFKA-1697 - remove code related to ack>1 on the broker

Reply via email to