Hi Matthias,
Thanks for the valid points you mentioned. I updated the KIP and the PR
with:
1) mentioning that the new overloaded `send` throws `IllegalStateException`
if the user tries to ignore `send()` errors outside of a transaction.
2) the default implementation in `Producer` interface throws an
`UnsupportedOperationException`

Hi Chris,
Thanks for the feedback. I tried to clarify the points you listed:
-------> we've narrowed the scope from any error that might take place with
producing a record to Kafka, to only the ones that are thrown directly from
Producer::send;

>From the very beginning and even since KIP-1038, the main purpose was to
have "more flexibility," or, in other words, "giving the user the
authority" to handle some specific exceptions thrown from the `Producer`.
Due to the specific cases we had in mind, KIP-1038 was discarded and we
decided to not define a `CustomExceptionHandler` for `Producer` and instead
treat the `send` failures in a different way. The main issue is that `send`
makes a transition to error state, which is undoable. In fact, one single
poison pill record makes the whole batch fail. The former suggestions that
you agreed with have been all about un-doing this transition in `flush` or
`commit`. The new suggestion is to un-do (or better, NOT do) in `send` due
to the reasons listed in the discussions above.
Moreover, I would say that having such a large scope as you mentioned is
impossible. In the best case, we may have control over the `Producer`. What
shall we do with the broker? The `any error that might take place with
producing a record to Kafka` is too much, I think.

-------> is this all we want to handle, and will it prevent us from
handling more in the future in an intuitive way?

I think yes. This is all we want. Other sorts of errors such as having
problem with partition addition, producer fenced exception, etc seem to be
more serious issues. The intention was to handle problems created by
(maybe) a single poison pill record. BTW, I do not see any obstacles to
future changes.

Bests,
Alieh

On Sat, Jun 29, 2024 at 3:03 AM Chris Egerton <chr...@aiven.io.invalid>
wrote:

> Ah, sorry--spoke too soon. The PR doesn't show that errors thrown from
> Producer::send are handled, but instead, ApiException instances that are
> caught inside KafkaProducer::doSend and are handled by returning an
> already-failed future are. I think the same question still applies (is this
> all we want to handle, and will it prevent us from handling more in the
> future in an intuitive way), though.
>
> On Fri, Jun 28, 2024 at 8:57 PM Chris Egerton <chr...@aiven.io> wrote:
>
> > Hi Alieh,
> >
> > This KIP has evolved a lot since I last looked at it, but the changes
> seem
> > well thought-out both in semantics and API. One clarifying question I
> have
> > is that it looks based on the draft PR that we've narrowed the scope from
> > any error that might take place with producing a record to Kafka, to only
> > the ones that are thrown directly from Producer::send; is that the
> intended
> > behavior here? And if so, do you have thoughts on how we might design a
> > follow-up KIP that would catch all errors (including ones reported
> > asynchronously instead of synchronously)? I'd like it if we could leave
> the
> > door open for that without painting ourselves into too much of a corner
> > with the API design for this KIP.
> >
> > Cheers,
> >
> > Chris
> >
> > On Fri, Jun 28, 2024 at 6:31 PM Matthias J. Sax <mj...@apache.org>
> wrote:
> >
> >> Thanks Alieh,
> >>
> >> it seems this KIP can just pick between a couple of tradeoffs. Adding an
> >> overloaded `send()` as the KIP propose makes sense to me and seems to
> >> provides the cleanest solution compare to there options we discussed.
> >>
> >> Given the explicit name of the passed-in option that highlights that the
> >> option is for TX only make is pretty clear and avoids the issue of
> >> `flush()` ambiguity.
> >>
> >>
> >> Nit: We should make clear on the KIP though, that the new `send()`
> >> overload would throw an `IllegalStateException` if TX are not used
> >> (similar to other TX methods like initTx(), etc)
> >>
> >>
> >> About the `Producer` interface, I am not sure how this was done in the
> >> past (eg, KIP-266 added `Consumer.poll(Duration)` w/o a default
> >> implementation), if we need a default implementation for backward
> >> compatibility or not? If we do want to add one, I think it would be
> >> appropriate to throw an `UnsupportedOperationException` by default,
> >> instead of just keeping the default impl empty?
> >>
> >>
> >> My points are rather minor, and should not block this KIP though.
> >> Overall LGTM.
> >>
> >>
> >>
> >> -Matthias
> >>
> >> On 6/27/24 1:28 PM, Alieh Saeedi wrote:
> >> > Hi Justine,
> >> >
> >> > Thanks for the suggestion.
> >> > Making applications to validate every single record is not the best
> way,
> >> > from an efficiency point of view.
> >> > Moreover, between changing the behavior of the Producer in `send` and
> >> > `commitTnx`, the former seems more reasonable and clean.
> >> >
> >> > Bests,
> >> > Alieh
> >> >
> >> > On Thu, Jun 27, 2024 at 8:14 PM Justine Olshan
> >> <jols...@confluent.io.invalid>
> >> > wrote:
> >> >
> >> >> Hey Alieh,
> >> >>
> >> >> I see there are two options now. So folks will be discussing the
> >> approaches
> >> >> and deciding the best way forward before we vote?
> >> >> I do think there could be a problem with the approach on commit if we
> >> get
> >> >> stuck on an earlier error and have more records (potentially on new
> >> >> partitions) to commit as the current PR is implemented.
> >> >>
> >> >> I guess this takes us back to the question of whether the error
> should
> >> be
> >> >> cleared on send.
> >> >>
> >> >> (And I guess at the back of my mind, I'm wondering if there is a way
> >> we can
> >> >> validate the "posion pill" records application side before we even
> try
> >> to
> >> >> send them)
> >> >>
> >> >> Justine
> >> >>
> >> >> On Wed, Jun 26, 2024 at 4:38 PM Alieh Saeedi
> >> <asae...@confluent.io.invalid
> >> >>>
> >> >> wrote:
> >> >>
> >> >>> Hi Justine,
> >> >>>
> >> >>> I did not update the KIP with `TxnSendOption` since I thought it'd
> be
> >> >>> better discussed here beforehand.
> >> >>> right now, there are 2 PRs:
> >> >>> - the PR that implements the current version of the KIP:
> >> >>> https://github.com/apache/kafka/pull/16332
> >> >>> - the POC PR that clarifies the `TxnSendOption`:
> >> >>> https://github.com/apache/kafka/pull/16465
> >> >>>
> >> >>> Bests,
> >> >>> Alieh
> >> >>>
> >> >>> On Thu, Jun 27, 2024 at 12:42 AM Justine Olshan
> >> >>> <jols...@confluent.io.invalid> wrote:
> >> >>>
> >> >>>> Hey Alieh,
> >> >>>>
> >> >>>> I think I am a little confused. Are the 3 points above addressed by
> >> the
> >> >>> KIP
> >> >>>> or did something change? The PR seems to not include this change
> and
> >> >>> still
> >> >>>> has the CommitOption as well.
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Justine
> >> >>>>
> >> >>>> On Wed, Jun 26, 2024 at 2:15 PM Alieh Saeedi
> >> >>> <asae...@confluent.io.invalid
> >> >>>>>
> >> >>>> wrote:
> >> >>>>
> >> >>>>> Hi all,
> >> >>>>>
> >> >>>>>
> >> >>>>> Looking at the PR <https://github.com/apache/kafka/pull/16332>
> >> >>>>> corresponding to the KIP, there are some points worthy of mention:
> >> >>>>>
> >> >>>>>
> >> >>>>> 1) clearing the error ends up dirty/messy code in
> >> >> `TransactionManager`.
> >> >>>>>
> >> >>>>> 2) By clearing the error, we are actually doing an illegal
> >> transition
> >> >>>> from
> >> >>>>> `ABORTABLE_ERROR` to `IN_TRANSACTION` which is conceptually not
> >> >>>> acceptable.
> >> >>>>> This can be the root cause of some issues, with perhaps further
> >> >> future
> >> >>>>> changes by others.
> >> >>>>>
> >> >>>>> 3) If the poison pill record `r1` causes a transition to the error
> >> >>> state
> >> >>>>> and then the next record `r2` requires adding a partition to the
> >> >>>>> transaction, the action fails due to being in the error state. In
> >> >> this
> >> >>>>> case, clearing errors during `commitTnx(CLEAR_SEND_ERROR)` is too
> >> >> late.
> >> >>>>> However, this case can NOT be the main concern as soon as KIP-890
> is
> >> >>>> fully
> >> >>>>> implemented.
> >> >>>>>
> >> >>>>>
> >> >>>>> My suggestion is to solve the problem where it arises. If the
> >> >>> transition
> >> >>>> to
> >> >>>>> the error state does not happen during `send()`, we do not need to
> >> >>> clear
> >> >>>>> the error later. Therefore, instead of `CommitOption`, we can
> define
> >> >> a
> >> >>>>> `TxnSendOption` and prevent the `send()` method from going to the
> >> >> error
> >> >>>>> state in case 1) we're in a transaction and 2) the user asked for
> >> >>>>> IGONRE_SEND_ERRORS. For more clarity, you can take a look at the
> POC
> >> >> PR
> >> >>>>> <https://github.com/apache/kafka/pull/16465>.
> >> >>>>>
> >> >>>>> Cheers,
> >> >>>>> Alieh
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >> >
> >>
> >
>

Reply via email to