Hello, Could we restart the discussion here again?
My last message was sent on the 3rd of June but I haven't received replies since then. I'd like to get this KIP to a finished state, regardless of whether that is merged-in or discarded. It has been almost one year since the publication of the KIP. Thanks, Stanislav On Mon, Jun 3, 2019 at 11:19 AM Stanislav Kozlovski <stanis...@confluent.io> wrote: > Do people agree with the approach I outlined in my last reply? > > On Mon, May 6, 2019 at 2:12 PM Stanislav Kozlovski <stanis...@confluent.io> > wrote: > >> Hey there Kamal, >> >> I'm sincerely sorry for missing your earlier message. As I open this >> thread up, I see I have an unsent draft message about resuming discussion >> from some time ago. >> >> In retrospect, I think I may have been too pedantic with the exception >> naming and hierarchy. >> I now believe a single exception type of `RecordDeserializationException` >> is enough. Let's go with that. >> >> On Mon, May 6, 2019 at 6:40 AM Kamal Chandraprakash < >> kamal.chandraprak...@gmail.com> wrote: >> >>> Matthias, >>> >>> We already have CorruptRecordException which doesn't extend the >>> SerializationException. So, we need an alternate >>> name suggestion for the corrupted record error if we decide to extend the >>> FaultyRecordException class. >>> >>> Stanislav, >>> >>> Our users are also facing this error. Could we bump up this discussion? >>> >>> I think we can have a single exception type >>> FaultyRecordException/RecordDeserialization exception to capture both >>> the errors. We can add an additional enum field to differentiate the >>> errors >>> if required. >>> >>> Thanks, >>> Kamal Chandraprakash >>> >>> On Wed, Apr 24, 2019 at 1:49 PM Kamal Chandraprakash < >>> kamal.chandraprak...@gmail.com> wrote: >>> >>> > Stanislav, >>> > >>> > Any updates on this KIP? We have internal users who want to skip the >>> > corrupted message while consuming the records. >>> > >>> > >>> > On Fri, Oct 19, 2018 at 11:34 PM Matthias J. Sax < >>> matth...@confluent.io> >>> > wrote: >>> > >>> >> I am not 100% familiar with the details of the consumer code, however >>> I >>> >> tend to disagree with: >>> >> >>> >> > There's no difference between the two cases -- if (and only if) the >>> >> message is corrupt, it can't be deserialized. If (and only if) it >>> can't be >>> >> deserialized, it is corrupt. >>> >> >>> >> Assume that a user configures a JSON deserializer but a faulty >>> upstream >>> >> producer writes an Avro message. For this case, the message is not >>> >> corrupted, but still can't be deserialized. And I can imaging that >>> users >>> >> want to handle both cases differently. >>> >> >>> >> Thus, I think it makes sense to have two different exceptions >>> >> `RecordDeserializationException` and `CorruptedRecordException` that >>> can >>> >> both extend `FaultyRecordException` (don't like this name too much >>> >> honestly, but don't have a better idea for it anyway). >>> >> >>> >> Side remark. If we introduce class `RecordDeserializationException` >>> and >>> >> `CorruptedRecordException`, we can also add an interface that both >>> >> implement to return partition/offset information and let both extend >>> >> `SerializationException` directly without an intermediate class in the >>> >> exception hierarchy. >>> >> >>> >> >>> >> -Matthias >>> >> >>> >> On 8/8/18 2:57 AM, Stanislav Kozlovski wrote: >>> >> >> If you are inheriting from SerializationException, your derived >>> class >>> >> > should also be a kind of serialization exception. Not something >>> more >>> >> > general. >>> >> > Yeah, the reason for inheriting it would be for >>> backwards-compatibility. >>> >> > >>> >> >> Hmm. Can you think of any new scenarios that would make Kafka >>> force >>> >> the >>> >> > user need to skip a specific record? Perhaps one scenario is if >>> records >>> >> > are lost but we don't know how many. >>> >> > Not on the spot, but I do wonder how likely a new scenario is to >>> >> surface in >>> >> > the future and how we'd handle the exceptions' class hierarchy then. >>> >> > >>> >> >> Which offset were we planning to use in the >>> >> > exception? >>> >> > The offset of the record which caused the exception. In the case of >>> >> > batches, we use the last offset of the batch. In both cases, users >>> >> should >>> >> > have to seek +1 from the given offset. You can review the PR to >>> ensure >>> >> its >>> >> > accurate >>> >> > >>> >> > >>> >> > If both of you prefer `RecordDeserializationException`, we can go >>> with >>> >> > that. Please do confirm that is okay >>> >> > >>> >> > On Tue, Aug 7, 2018 at 11:35 PM Jason Gustafson <ja...@confluent.io >>> > >>> >> wrote: >>> >> > >>> >> >> One difference between the two cases is that we can't generally >>> trust >>> >> the >>> >> >> offset of a corrupt message. Which offset were we planning to use >>> in >>> >> the >>> >> >> exception? Maybe it should be either the fetch offset or one plus >>> the >>> >> last >>> >> >> consumed offset? I think I'm with Colin in preferring >>> >> >> RecordDeserializationException for both cases if possible. For one >>> >> thing, >>> >> >> that makes the behavior consistent whether or not `check.crs` is >>> >> enabled. >>> >> >> >>> >> >> -Jason >>> >> >> >>> >> >> On Tue, Aug 7, 2018 at 11:17 AM, Colin McCabe <cmcc...@apache.org> >>> >> wrote: >>> >> >> >>> >> >>> Hi Stanislav, >>> >> >>> >>> >> >>> On Sat, Aug 4, 2018, at 10:44, Stanislav Kozlovski wrote: >>> >> >>>> Hey Colin, >>> >> >>>> >>> >> >>>> It may be a bit vague but keep in mind we also raise the >>> exception in >>> >> >> the >>> >> >>>> case where the record is corrupted. >>> >> >>>> We discussed with Jason offline that message corruption most >>> often >>> >> >>> prevents >>> >> >>>> deserialization itself and that may be enough of an argument to >>> raise >>> >> >>>> `RecordDeserializationException` in the case of a corrupt >>> record. I >>> >> >>>> personally find that misleading. >>> >> >>> >>> >> >>> Hmm. I think that by definition, corrupt records are records that >>> >> can't >>> >> >>> be deserialized. There's no difference between the two cases -- >>> if >>> >> (and >>> >> >>> only if) the message is corrupt, it can't be deserialized. If >>> (and >>> >> only >>> >> >>> if) it can't be deserialized, it is corrupt. >>> >> >>> >>> >> >>>> >>> >> >>>> In the end, I think it might be worth it to have a bit of a >>> >> >>>> wider-encompassing `FaultyRecordException` (or even >>> >> >>>> `UnconsumableRecordException`) which would allow users to skip >>> over >>> >> the >>> >> >>>> record. >>> >> >>> >>> >> >>> If you are inheriting from SerializationException, your derived >>> class >>> >> >>> should also be a kind of serialization exception. Not something >>> more >>> >> >>> general. >>> >> >>> >>> >> >>>> We could then potentially have more specific exceptions (e.g >>> >> >>>> deserialization) inherit that but I'm not sure if we should. >>> >> >>>> A case for a more general exception which provides access to the >>> >> >>>> partition/offset is future backwards-compatibility. If there is >>> ever >>> >> a >>> >> >>> new >>> >> >>>> scenario that would make the user need to skip a specific record >>> - >>> >> >> there >>> >> >>>> would already be such an exception and this will provide some >>> >> >>>> backward-compatibility with older clients. >>> >> >>> >>> >> >>> Hmm. Can you think of any new scenarios that would make Kafka >>> force >>> >> the >>> >> >>> user need to skip a specific record? Perhaps one scenario is if >>> >> records >>> >> >>> are lost but we don't know how many. >>> >> >>> >>> >> >>> If we really want to have something more general, perhaps we need >>> >> >>> something like LostDataException. But in that case, we can't >>> inherit >>> >> >> from >>> >> >>> SerializationException, since lost data is more general than >>> >> >> serialization >>> >> >>> issues. >>> >> >>> >>> >> >>> best, >>> >> >>> Colin >>> >> >>> >>> >> >>> >>> >> >>>> >>> >> >>>> Best, >>> >> >>>> Stanislav >>> >> >>>> >>> >> >>>> On Sat, Aug 4, 2018 at 12:23 AM Colin McCabe <cmcc...@apache.org >>> > >>> >> >> wrote: >>> >> >>>> >>> >> >>>>> Hi Stanislav, >>> >> >>>>> >>> >> >>>>> Thanks for the KIP. >>> >> >>>>> >>> >> >>>>> As as user, "FaultyRecordException" seems a little vague. >>> What's >>> >> >>> faulty >>> >> >>>>> about it? Is it too big? Does it not match the schema of the >>> other >>> >> >>>>> records? Was it not found? >>> >> >>>>> >>> >> >>>>> Of course, we know that the specific problem is with >>> [de]serializing >>> >> >>> the >>> >> >>>>> record, not with any of those those things. So we should >>> choose a >>> >> >> name >>> >> >>>>> that reflects that serialization is the problem. How about >>> >> >>>>> RecordSerializationException? >>> >> >>>>> >>> >> >>>>> best, >>> >> >>>>> Colin >>> >> >>>>> >>> >> >>>>> >>> >> >>>>> On Thu, Aug 2, 2018, at 15:11, Stanislav Kozlovski wrote: >>> >> >>>>>> Hi Jason and Ted, >>> >> >>>>>> >>> >> >>>>>> @Jason >>> >> >>>>>> I did not oppose the idea as I'm not too familiar with Java >>> >> >>> conventions. >>> >> >>>>> I >>> >> >>>>>> agree it is a non-ideal way for the user to catch the >>> exception so >>> >> >> I >>> >> >>>>>> changed it back. >>> >> >>>>>> >>> >> >>>>>> I've updated the KIP and PR with the latest changes. Now, >>> there is >>> >> >>> only >>> >> >>>>> one >>> >> >>>>>> public exception `FaultyRecordException` which is thrown to the >>> >> >> user >>> >> >>> in >>> >> >>>>>> both scenarios (corrupt record and deserialization error). >>> >> >>>>>> Please take a look again. >>> >> >>>>>> >>> >> >>>>>> Best, >>> >> >>>>>> Stanislav >>> >> >>>>>> >>> >> >>>>>> On Thu, Aug 2, 2018 at 5:25 PM Jason Gustafson < >>> ja...@confluent.io >>> >> >>> >>> >> >>>>> wrote: >>> >> >>>>>> >>> >> >>>>>>> Hey Stanislav, >>> >> >>>>>>> >>> >> >>>>>>> I should have noticed it earlier from your example, but I just >>> >> >>> realized >>> >> >>>>>>> that interfaces don't mix well with exceptions. There is no >>> way >>> >> >> to >>> >> >>>>> catch >>> >> >>>>>>> the interface type, which means you have to depend on >>> instanceof >>> >> >>>>> checks, >>> >> >>>>>>> which is not very conventional. At the moment, we raise >>> >> >>>>>>> SerializationException if there is a problem parsing the >>> error, >>> >> >>> and we >>> >> >>>>>>> raise KafkaException if the record fails its CRC check. Since >>> >> >>>>>>> SerializationException extends KafkaExeption, it seems like we >>> >> >> can >>> >> >>>>> handle >>> >> >>>>>>> both cases in a compatible way by handling both cases with a >>> >> >> single >>> >> >>>>> type >>> >> >>>>>>> that extends SerializationException. Maybe something like >>> >> >>>>>>> RecordDeserializationException? >>> >> >>>>>>> >>> >> >>>>>>> Thanks, >>> >> >>>>>>> Jason >>> >> >>>>>>> >>> >> >>>>>>> On Thu, Aug 2, 2018 at 5:45 AM, Ted Yu <yuzhih...@gmail.com> >>> >> >>> wrote: >>> >> >>>>>>> >>> >> >>>>>>>> +1 >>> >> >>>>>>>> -------- Original message --------From: Stanislav Kozlovski < >>> >> >>>>>>>> stanis...@confluent.io> Date: 8/2/18 2:41 AM (GMT-08:00) >>> To: >>> >> >>>>>>>> dev@kafka.apache.org Subject: [VOTE] KIP-334 Include >>> >> >> partitions >>> >> >>> in >>> >> >>>>>>>> exceptions raised during consumer record >>> >> >>> deserialization/validation >>> >> >>>>>>>> Hey everybody, >>> >> >>>>>>>> >>> >> >>>>>>>> I'd like to start a vote thread for KIP-334 Include >>> partitions >>> >> >> in >>> >> >>>>>>>> exceptions raised during consumer record >>> >> >>> deserialization/validation >>> >> >>>>>>>> < >>> >> >>>>>>> >>> >> >>>>> https://cwiki.apache.org/confluence/pages/viewpage.action? >>> >> >>> pageId=87297793 >>> >> >>>>>>>>> >>> >> >>>>>>>> >>> >> >>>>>>>> -- >>> >> >>>>>>>> Best, >>> >> >>>>>>>> Stanislav >>> >> >>>>>>>> >>> >> >>>>>>> >>> >> >>>>>> >>> >> >>>>>> >>> >> >>>>>> -- >>> >> >>>>>> Best, >>> >> >>>>>> Stanislav >>> >> >>>>> >>> >> >>>> >>> >> >>>> >>> >> >>>> -- >>> >> >>>> Best, >>> >> >>>> Stanislav >>> >> >>> >>> >> >> >>> >> > >>> >> > >>> >> >>> >> >>> >> >> >> -- >> Best, >> Stanislav >> > > > -- > Best, > Stanislav > -- Best, Stanislav