Hi,

I have to agree with what Huang and Jingsong said. We should think more
about it from the user's(developers who use the API) perspective. The first
method T deserialize(byte[]) is convenient for users to deserialize a
single event. Many formats are using it, i.e. Avro, csv, etc. There should
also be Json cases for single event deserialization, if I am not mistaken.

void deserialize(byte[], Collector<T>) method is a default interface
method. There will be a big code refactoring if we use it to replace T
deserialize(byte[]). The benefit is very limited after considering all
concerns.

For few cases that do not need T deserialize(byte[]), the
maintenance effort is IMHO acceptable. It is, after all, only one method.

Best regards,
Jing

On Tue, Feb 28, 2023 at 9:47 AM Benchao Li <libenc...@apache.org> wrote:

> I share the same concerns with Jingsong and Hang, however, I'll raise a
> point why keeping both is also not a good idea.
>
> In FLINK-18590[1], we are introducing a feature that we'll deserialize JSON
> array into multiple records. This feature can only be used in `void
> deserialize(byte[] message, Collector<T> out)`. And many more cdc formats
> are doing the similar thing. If we keep both methods, many formats/features
> will not be available for `T deserialize(byte[] message)`.
>
> And for format maintenance, we usually need to keep these two methods,
> which is also a burden for format maintainers.
>
> [1] https://issues.apache.org/jira/browse/FLINK-18590
>
> Jingsong Li <jingsongl...@gmail.com> 于2023年2月28日周二 16:03写道:
>
> > - `T deserialize(byte[] message)` is widely used and it is a public
> > api. It is very friendly for single record deserializers.
> > - `void deserialize(byte[] message, Collector<T> out)` supports
> > multiple records.
> >
> > I think we can just keep them as they are.
> >
> > Best,
> > Jingsong
> >
> >
> > On Tue, Feb 28, 2023 at 3:08 PM Hang Ruan <ruanhang1...@gmail.com>
> wrote:
> > >
> > > Hi, Shammon,
> > >
> > > I think the method `void deserialize(byte[] message, Collector<T> out)`
> > > with a default implementation encapsulate how to deal with null for
> > > developers. If we remove the `T deserialize(byte[] message)`, the
> > > developers have to remember to handle null. Maybe we will get duplicate
> > > code among them.
> > > And I find there are only 5 implementations override the method `void
> > > deserialize(byte[] message, Collector<T> out)`. Other implementations
> > reuse
> > > the same code to handle null.
> > > I don't know the benefits of removing this method. Looking forward to
> > other
> > > people's opinions.
> > >
> > > Best,
> > > Hang
> > >
> > >
> > >
> > > Shammon FY <zjur...@gmail.com> 于2023年2月28日周二 14:14写道:
> > >
> > > > Hi devs
> > > >
> > > > Currently there are two deserialization methods in
> > `DeserializationSchema`
> > > > 1. `T deserialize(byte[] message)`, only deserialize one record from
> > > > binary, if there is no record it should return null.
> > > > 2. `void deserialize(byte[] message, Collector<T> out)`, supports
> > > > deserializing none, one or multiple records gracefully, it can
> > completely
> > > > replace method `T deserialize(byte[] message)`.
> > > >
> > > > The deserialization logic in the above two methods is basically
> > coincident,
> > > > we recommend users use the second method to deserialize data. To
> > improve
> > > > code maintainability, I'd like to mark the first function as
> > `@Deprecated`,
> > > > and remove it when it is no longer used in the future.
> > > >
> > > > I have created an issue[1] to track it, looking forward to your
> > feedback,
> > > > thanks
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-31251
> > > >
> > > >
> > > > Best,
> > > > Shammon
> > > >
> >
>
>
> --
>
> Best,
> Benchao Li
>

Reply via email to