I see the majority of the community prefers the not immediate breaking of the API. Even though I have different preference I am more than fine to commit to the others choice. Just please make sure it is prominently documented that one or the other methods MUST be implemented in any new serializer.

Best,

Dawid

On 27/10/2022 08:33, Yun Gao wrote:
Hi,
I have discussed offline with Hangxiang and Yuan, and the current
proposal also looks good to me.
Thanks Hangxiang for driving this.
Best,
Yun Gao
------------------------------------------------------------------
From:Yuan Mei <yuanmei.w...@gmail.com>
Send Time:2022 Oct. 26 (Wed.) 12:20
To:dev <dev@flink.apache.org>
Subject:Re: [DISCUSS] FLIP-263: Improve resolving schema compatibility
Hey Huangxiang,
The section of `Rejected Alternatives` may also need an update.
Current plan sounds like a reasonable one. I am fine with it.
Thanks for driving this.
Best
Yuan
On Tue, Oct 25, 2022 at 5:11 PM Hangxiang Yu <master...@gmail.com> wrote:
(Resend the mail to fix the format issue)
Hi, everyone.

Thanks for your suggestions!

Let me summarize the remaining questions in the thread and share my ideas
based on your suggestions:


1. Should we put the new opposite interface in TypeSerializer or
TypeSerializerSnapshot ?

Just as I replied to Dawid, I'd like to put it in
TypeSerializerSnapshot so that we could still follow the contract between
two classes and make later code migration easier based on current tools.

Thanks Dawid for the initial suggestion and Godfrey for the additional
supplement.



2. Do we just break changes and make user codes incompatible or make sure
compatibility using a more suitable migration plan ?

I agree with Yuan that we should make sure that user jobs still work
without modifying any codes before removing the deprecated method.

Thanks Yuan for the migration plan. Let me try to add something to the
suggestion of Yuan:


a. In Step 1, I prefer to make the new interface like:


default TypeSerializerSchemaCompatibility<T>
resolveSchemaCompatibility(
// Use 'oldSnapshot' not 'oldSerializer'
TypeSerializerSnapshot<T> oldSnapshot) {
return INCOMPATIBLE;
}
}

I think using 'oldSnapshot' as the parameter will make the
logic clear --- TypeSerializerSnapshot will take all responsibility for
compatibility checks.
BTW, It's also easy to migrate original check logic to this
interface.

b. In Step 1, In addition to introducing default implementations
for both interfaces, we also need to implement the new interface in all
inner TypeSerializerSnapshots.
Users may implement their own serializers based on inner
serializers, we should make sure that the new interface of inner
TypeSerializerSnapshots is usable.

Then I think it could work for both old custom serializers or new
custom serializers.
No matter which interface the user implements, it could always work.
Of course, we will deprecate the old interface and encourage users to
use the new one.


3. Do we need to squash this with
https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w 
<https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w > ?
We will not break the compatibility based on 2, so it's not necessary
to squash them together.


Do you have any other suggestions ? Look forward to your reply!

On Tue, Oct 25, 2022 at 1:31 PM Hangxiang Yu <master...@gmail.com> wrote:

Hi, everyone.

Thanks for your suggestions!

Let me summarize the remaining questions in the thread and share my ideas
based on your suggestions:

1. Should we put the new opposite interface in TypeSerializer or
TypeSerializerSnapshot ?

Just as I replied to Dawid, I'd like to put it in TypeSerializerSnapshot
so that we could still follow the contract between two classes and make
later code migration easier based on current tools.

Thanks Dawid for the initial suggestion and Godfrey for the additional
supplement.



1. Do we just break changes and make user codes incompatible or make
sure compatibility using a more suitable migration plan ?

I agree with Yuan that we should make sure that user jobs still work
without modifying any codes before removing the deprecated method.

Thanks Yuan for the migration plan. Let me try to add something to the
suggestion of Yuan:

1. In Step 1, I prefer to make the new interface like:

default TypeSerializerSchemaCompatibility<T> resolveSchemaCompatibility(

// Use 'oldSnapshot' not 'oldSerializer'
TypeSerializerSnapshot<T> oldSnapshot) {

return INCOMPATIBLE;

}

I think using 'oldSnapshot' as the parameter will make the logic clear
---
TypeSerializerSnapshot will take all responsibility for compatibility
checks.

BTW, It's also easy to migrate original check logic to this interface.

1. In Step 1, In addition to introducing default implementations for
both interfaces, we also need to implement the new interface in
all inner
TypeSerializerSnapshots.

Users may implement their own serializers based on inner serializers, we
should make sure that the new interface of inner TypeSerializerSnapshots
is
usable.


Then I think it could work for both old custom serializers or new custom
serializers.

No matter which interface the user implements, it could always work.

Of course, we will deprecate the old interface and encourage users to use
the new one.



1. Do we need to squash this with
https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w 
<https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w > ?

We will not break the compatibility based on 2, so it's not necessary to
squash them together.


Do you have any other suggestions ? Look forward to your reply!

On Mon, Oct 24, 2022 at 8:30 PM Yuan Mei <yuanmei.w...@gmail.com> wrote:

Hey Hangxiang,

Thanks for driving this issue. I've read through all the discussions and
suggestions in this thread, and here is my take:

1. I agree that the compatibility check should be done in the opposite
direction.
The current interface *causes some real issues* for users using
their own `TypeSerializer`, and we should help resolve this issue.

2. The new opposite interface would better still stay within the class
`TypeSerializerSnapshot`.
I am suggesting this mostly considering the designed contract
between
`TypeSerializerSnapshot` and `TypeSerializer`. I do not think it is
necessary to break the current contract because of this change.

3. It seems to me that "breaking changes" are unavoidable in each of the
strategies proposed. However, *I'd be very concerned and worried *to
make a breaking change *in the very first step*.
A more user-friendly way is to make sure of compatibility and mark
the method to drop as `@Deprecated`. After several versions of adoption,
carefully and publicly discussing with users to drop the method. By
compatibilities, I mean the following things:
1). upgrading from a lower Flink version to the new version does not
need the user to do extra work for the change.
2). In the new version, both the old and the new method should be
supported, and of course, we would encourage users to use the new
method.
For example, dropping "TypeSerializerConfigSnapshot" started to take
into action after almost 10 releases the time it was marked deprecated.

Due to the above reason, I am hesitant to squash this FLIP with
https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w 
<https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w >

4. What do I suggest doing?

*STEP 1: In `TypeSerializerSnapshot`, we do something like this, and
keep everything else exactly the same*

@Deprecated
default TypeSerializerSchemaCompatibility<T>
resolveSchemaCompatibility(
TypeSerializer<T> newSerializer) {
// use new method
return newSerializer.snapshotConfiguration().

resolveSerializerSchemaCompatibility(this.restoreSerializer());

}

default TypeSerializerSchemaCompatibility<T>
resolveSerializerSchemaCompatibility(
TypeSerializer<T> oldSerializer) {
return INCOMPATIBLE;
}

*STEP 2: (This is a breaking change), after several releases*
1. Remove the deprecated method
2. Substitute every usage with the new method
3. Make the new method not have a default implementation

STEP1 is compatible, and STEP2 is a breaking change.

Best
Yuan



On Fri, Oct 21, 2022 at 10:01 AM godfrey he <godfre...@gmail.com>
wrote:
Hi Hangxiang, Dawid,

I also prefer to add method into TypeSerializerSnapshot, which looks
more natural. TypeSerializerSnapshot has `Version` concept, which also
can be used for compatibility.
`
TypeSerializerSnapshot {

TypeSerializerSchemaCompatibility<T>
resolveSchemaCompatibility(
TypeSerializerSnapshot<T> oldSnapshot);

}
`

Best,
Godfrey

Dawid Wysakowicz <dwysakow...@apache.org> 于2022年10月20日周四 16:59写道:
That's the final status we will arrive at.

IIUC, we cannot just remove the original method but just mark it as
deprecated so two methods have to exist together in the short term.
Users may also need to migrate their own external serializers which
is
a long run.
I'd like to make jobs where users could always work without modifying
any codes before removing the deprecated method so I provide a default
implementation in the proposal.

I understand it works for old serializers, but it makes writing new
serializers more complicated. It's not as simple as extending an
interface
and implementing all the methods it tells you to. You have to dig into
documentation and check that you must implement one of the methods
(either
the old or the new one). Otherwise you end up in an ifinite loop.

If we break the compatibility, yes we cause some annoyance, but I
feel
it's a safer option. BTW, there is a proposal around serializer that
also
causes some incompatibilities so we could squash this together:
https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w 
<https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w >

I don't want to force that, but would be really happy to hear what
others think.

Best,

Dawid

On 19/10/2022 04:05, Hangxiang Yu wrote:

Hi, Dawid.


Thanks for your reply.

Should we introduce the new method to the TypeSerializerSnapshot
instead?
You provided a valuable option, let me try to list the benefit and
cost.
benefit:

TypeSerializerSnapshot still owns the responsibility of resolving
schema compatibility so TypeSerializer could just pay attention to its
serialization and deserialization as before.
It's very convenient to implement it based on current implementation
by all information in TypeSerializerSnapshot and tools which is also
helpful for users to migrate their external serializer.
cost:

The interface is a bit strange because we have to construct and use
the snapshot of the new serializer to check the compatibility.
I appreciate this because users could migrate more conveniently.

For its cost, We could see TypeSerializerSnapshot as a point-in-time
view of TypeSerializer used for forward compatibility checks and
persisting
serializer within checkpoints. It seems reasonable to generate a
point-in-time view of the serializer when we need to resolve
compatibility.

I'd actually be fine with breaking some external serializers and not
provide a default implementation of the new method.
That's the final status we will arrive at.

IIUC, we cannot just remove the original method but just mark it as
deprecated so two methods have to exist together in the short term.
Users may also need to migrate their own external serializers which
is
a long run.
I'd like to make jobs where users could always work without modifying
any codes before removing the deprecated method so I provide a default
implementation in the proposal.

On Mon, Oct 17, 2022 at 10:10 PM Dawid Wysakowicz <
dwysakow...@apache.org> wrote:
Hi Han,

I think in principle your proposal makes sense and the compatibility
check indeed should be done in the opposite direction.

However, I have two suggestions:

1. Should we introduce the new method to the TypeSerializerSnapshot
instead? E.g.

TypeSerializerSnapshot {

TypeSerializerSchemaCompatibility<T>
resolveSchemaCompatibility(
TypeSerializerSnapshot<T> oldSnapshot);

}

I know it has the downside that we'd need to create a snapshot for
the
new serializer during a restore, but the there is a lot of tooling
and
subclasses in the *Snapshot stack that it won't be possible to
migrate
to TypeSerializer stack. I have classes such as
CompositeTypeSerializerSnapshot and alike in mind.

2. I'd actually be fine with breaking some external serializers and
not
provide a default implementation of the new method. If we don't do
that
we will have two methods with default implementation which call each
other. It makes it also a bit harder to figure out which methods are
mandatory to be implemented going forward.

What do you think?

Best,

Dawid

On 17/10/2022 04:43, Hangxiang Yu wrote:
Hi, Han.

Both the old method and the new method can get previous and new
inner
information.

The new serializer will decide it just like the old serializer did
before.
The method just specify the schema compatibility result so that
other
behaviours is same as before.

On Fri, Oct 14, 2022 at 11:40 AM Han Yin <alexyin...@gmail.com>
wrote:
Hi Hangxiang,

Thanks for the proposal. It seems more reasonable to let the new
serializer claim the compatibility in the cases you mentioned.

I have but one question here. What happens in the case of
“compatibleAfterMigration” after we completely reverse the
direction (in
step 3)? To be specific, migration from an old schema calls for
the
previous serializer to read bytes into state objects. How should
a
new
serializer decide whether the migration is possible?

Best,
Han

On 2022/10/12 12:41:07 Hangxiang Yu wrote:
Dear Flink developers,

I would like to start a discussion thread on FLIP-263[1]
proposing to
improve the usability of resolving schema compatibility.

Currently, the place for compatibility checks is
TypeSerializerSnapshot#resolveSchemaCompatibility
which belongs to the old serializer, There are no ways for users
to
specify the
compatibility with the old serializer in the new customized
serializer.
The FLIP hopes to reverse the direction of resolving schema
compatibility
to improve the usability of resolving schema compatibility.

[1]

https://cwiki.apache.org/confluence/display/FLINK/FLIP-263%3A+Improve+resolving+schema+compatibility
 
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-263%3A+Improve+resolving+schema+compatibility
 >



--
Best,
Hangxiang.
--
Best,
Hangxiang.


--
Best,
Hangxiang.

Attachment: OpenPGP_0x31D2DD10BFC15A2D.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature

Reply via email to