Best,
Dawid On 27/10/2022 08:33, Yun Gao wrote:
Hi, I have discussed offline with Hangxiang and Yuan, and the current proposal also looks good to me. Thanks Hangxiang for driving this. Best, Yun Gao ------------------------------------------------------------------ From:Yuan Mei <yuanmei.w...@gmail.com> Send Time:2022 Oct. 26 (Wed.) 12:20 To:dev <dev@flink.apache.org> Subject:Re: [DISCUSS] FLIP-263: Improve resolving schema compatibility Hey Huangxiang, The section of `Rejected Alternatives` may also need an update. Current plan sounds like a reasonable one. I am fine with it. Thanks for driving this. Best Yuan On Tue, Oct 25, 2022 at 5:11 PM Hangxiang Yu <master...@gmail.com> wrote:(Resend the mail to fix the format issue) Hi, everyone. Thanks for your suggestions! Let me summarize the remaining questions in the thread and share my ideas based on your suggestions: 1. Should we put the new opposite interface in TypeSerializer or TypeSerializerSnapshot ? Just as I replied to Dawid, I'd like to put it in TypeSerializerSnapshot so that we could still follow the contract between two classes and make later code migration easier based on current tools. Thanks Dawid for the initial suggestion and Godfrey for the additional supplement. 2. Do we just break changes and make user codes incompatible or make sure compatibility using a more suitable migration plan ? I agree with Yuan that we should make sure that user jobs still work without modifying any codes before removing the deprecated method. Thanks Yuan for the migration plan. Let me try to add something to the suggestion of Yuan: a. In Step 1, I prefer to make the new interface like: default TypeSerializerSchemaCompatibility<T> resolveSchemaCompatibility( // Use 'oldSnapshot' not 'oldSerializer' TypeSerializerSnapshot<T> oldSnapshot) { return INCOMPATIBLE; } } I think using 'oldSnapshot' as the parameter will make the logic clear --- TypeSerializerSnapshot will take all responsibility for compatibility checks. BTW, It's also easy to migrate original check logic to this interface. b. In Step 1, In addition to introducing default implementations for both interfaces, we also need to implement the new interface in all inner TypeSerializerSnapshots. Users may implement their own serializers based on inner serializers, we should make sure that the new interface of inner TypeSerializerSnapshots is usable. Then I think it could work for both old custom serializers or new custom serializers. No matter which interface the user implements, it could always work. Of course, we will deprecate the old interface and encourage users to use the new one. 3. Do we need to squash this with https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w <https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w > ? We will not break the compatibility based on 2, so it's not necessary to squash them together. Do you have any other suggestions ? Look forward to your reply! On Tue, Oct 25, 2022 at 1:31 PM Hangxiang Yu <master...@gmail.com> wrote:Hi, everyone. Thanks for your suggestions! Let me summarize the remaining questions in the thread and share my ideas based on your suggestions: 1. Should we put the new opposite interface in TypeSerializer or TypeSerializerSnapshot ? Just as I replied to Dawid, I'd like to put it in TypeSerializerSnapshot so that we could still follow the contract between two classes and make later code migration easier based on current tools. Thanks Dawid for the initial suggestion and Godfrey for the additional supplement. 1. Do we just break changes and make user codes incompatible or make sure compatibility using a more suitable migration plan ? I agree with Yuan that we should make sure that user jobs still work without modifying any codes before removing the deprecated method. Thanks Yuan for the migration plan. Let me try to add something to the suggestion of Yuan: 1. In Step 1, I prefer to make the new interface like: default TypeSerializerSchemaCompatibility<T> resolveSchemaCompatibility( // Use 'oldSnapshot' not 'oldSerializer' TypeSerializerSnapshot<T> oldSnapshot) { return INCOMPATIBLE; } I think using 'oldSnapshot' as the parameter will make the logic clear---TypeSerializerSnapshot will take all responsibility for compatibility checks. BTW, It's also easy to migrate original check logic to this interface. 1. In Step 1, In addition to introducing default implementations for both interfaces, we also need to implement the new interface inall innerTypeSerializerSnapshots. Users may implement their own serializers based on inner serializers, we should make sure that the new interface of inner TypeSerializerSnapshotsisusable. Then I think it could work for both old custom serializers or new custom serializers. No matter which interface the user implements, it could always work. Of course, we will deprecate the old interface and encourage users to use the new one. 1. Do we need to squash this with https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w <https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w > ? We will not break the compatibility based on 2, so it's not necessary to squash them together. Do you have any other suggestions ? Look forward to your reply! On Mon, Oct 24, 2022 at 8:30 PM Yuan Mei <yuanmei.w...@gmail.com> wrote:Hey Hangxiang, Thanks for driving this issue. I've read through all the discussions and suggestions in this thread, and here is my take: 1. I agree that the compatibility check should be done in the opposite direction. The current interface *causes some real issues* for users using their own `TypeSerializer`, and we should help resolve this issue. 2. The new opposite interface would better still stay within the class `TypeSerializerSnapshot`. I am suggesting this mostly considering the designed contractbetween`TypeSerializerSnapshot` and `TypeSerializer`. I do not think it is necessary to break the current contract because of this change. 3. It seems to me that "breaking changes" are unavoidable in each of the strategies proposed. However, *I'd be very concerned and worried *to make a breaking change *in the very first step*. A more user-friendly way is to make sure of compatibility and mark the method to drop as `@Deprecated`. After several versions of adoption, carefully and publicly discussing with users to drop the method. By compatibilities, I mean the following things: 1). upgrading from a lower Flink version to the new version does not need the user to do extra work for the change. 2). In the new version, both the old and the new method should be supported, and of course, we would encourage users to use the newmethod.For example, dropping "TypeSerializerConfigSnapshot" started to take into action after almost 10 releases the time it was marked deprecated. Due to the above reason, I am hesitant to squash this FLIP with https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w <https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w > 4. What do I suggest doing? *STEP 1: In `TypeSerializerSnapshot`, we do something like this, and keep everything else exactly the same* @Deprecated default TypeSerializerSchemaCompatibility<T> resolveSchemaCompatibility( TypeSerializer<T> newSerializer) { // use new method return newSerializer.snapshotConfiguration(). resolveSerializerSchemaCompatibility(this.restoreSerializer()); } default TypeSerializerSchemaCompatibility<T> resolveSerializerSchemaCompatibility( TypeSerializer<T> oldSerializer) { return INCOMPATIBLE; } *STEP 2: (This is a breaking change), after several releases* 1. Remove the deprecated method 2. Substitute every usage with the new method 3. Make the new method not have a default implementation STEP1 is compatible, and STEP2 is a breaking change. Best Yuan On Fri, Oct 21, 2022 at 10:01 AM godfrey he <godfre...@gmail.com>wrote:Hi Hangxiang, Dawid, I also prefer to add method into TypeSerializerSnapshot, which looks more natural. TypeSerializerSnapshot has `Version` concept, which also can be used for compatibility. ` TypeSerializerSnapshot { TypeSerializerSchemaCompatibility<T>resolveSchemaCompatibility(TypeSerializerSnapshot<T> oldSnapshot); } ` Best, Godfrey Dawid Wysakowicz <dwysakow...@apache.org> 于2022年10月20日周四 16:59写道:That's the final status we will arrive at. IIUC, we cannot just remove the original method but just mark it asdeprecated so two methods have to exist together in the short term.Users may also need to migrate their own external serializers whichisa long run.I'd like to make jobs where users could always work without modifyingany codes before removing the deprecated method so I provide a default implementation in the proposal.I understand it works for old serializers, but it makes writing newserializers more complicated. It's not as simple as extending aninterfaceand implementing all the methods it tells you to. You have to dig into documentation and check that you must implement one of the methods(eitherthe old or the new one). Otherwise you end up in an ifinite loop.If we break the compatibility, yes we cause some annoyance, but Ifeelit's a safer option. BTW, there is a proposal around serializer thatalsocauses some incompatibilities so we could squash this together: https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w <https://lists.apache.org/thread/v1q28zg5jhxcqrpq67pyv291nznd3n0w >I don't want to force that, but would be really happy to hear whatothers think.Best, Dawid On 19/10/2022 04:05, Hangxiang Yu wrote: Hi, Dawid. Thanks for your reply. Should we introduce the new method to the TypeSerializerSnapshotinstead?You provided a valuable option, let me try to list the benefit andcost.benefit: TypeSerializerSnapshot still owns the responsibility of resolvingschema compatibility so TypeSerializer could just pay attention to its serialization and deserialization as before.It's very convenient to implement it based on current implementationby all information in TypeSerializerSnapshot and tools which is also helpful for users to migrate their external serializer.cost: The interface is a bit strange because we have to construct and usethe snapshot of the new serializer to check the compatibility.I appreciate this because users could migrate more conveniently. For its cost, We could see TypeSerializerSnapshot as a point-in-timeview of TypeSerializer used for forward compatibility checks andpersistingserializer within checkpoints. It seems reasonable to generate a point-in-time view of the serializer when we need to resolvecompatibility.I'd actually be fine with breaking some external serializers and notprovide a default implementation of the new method.That's the final status we will arrive at. IIUC, we cannot just remove the original method but just mark it asdeprecated so two methods have to exist together in the short term.Users may also need to migrate their own external serializers whichisa long run.I'd like to make jobs where users could always work without modifyingany codes before removing the deprecated method so I provide a default implementation in the proposal.On Mon, Oct 17, 2022 at 10:10 PM Dawid Wysakowicz <dwysakow...@apache.org> wrote:Hi Han, I think in principle your proposal makes sense and the compatibility check indeed should be done in the opposite direction. However, I have two suggestions: 1. Should we introduce the new method to the TypeSerializerSnapshot instead? E.g. TypeSerializerSnapshot { TypeSerializerSchemaCompatibility<T>resolveSchemaCompatibility(TypeSerializerSnapshot<T> oldSnapshot); } I know it has the downside that we'd need to create a snapshot forthenew serializer during a restore, but the there is a lot of toolingandsubclasses in the *Snapshot stack that it won't be possible tomigrateto TypeSerializer stack. I have classes such as CompositeTypeSerializerSnapshot and alike in mind. 2. I'd actually be fine with breaking some external serializers andnotprovide a default implementation of the new method. If we don't dothatwe will have two methods with default implementation which call each other. It makes it also a bit harder to figure out which methods are mandatory to be implemented going forward. What do you think? Best, Dawid On 17/10/2022 04:43, Hangxiang Yu wrote:Hi, Han. Both the old method and the new method can get previous and newinnerinformation. The new serializer will decide it just like the old serializer didbefore.The method just specify the schema compatibility result so thatotherbehaviours is same as before. On Fri, Oct 14, 2022 at 11:40 AM Han Yin <alexyin...@gmail.com>wrote:Hi Hangxiang, Thanks for the proposal. It seems more reasonable to let the new serializer claim the compatibility in the cases you mentioned. I have but one question here. What happens in the case of “compatibleAfterMigration” after we completely reverse thedirection (instep 3)? To be specific, migration from an old schema calls fortheprevious serializer to read bytes into state objects. How shouldanewserializer decide whether the migration is possible? Best, Han On 2022/10/12 12:41:07 Hangxiang Yu wrote:Dear Flink developers, I would like to start a discussion thread on FLIP-263[1]proposing toimprove the usability of resolving schema compatibility. Currently, the place for compatibility checks is TypeSerializerSnapshot#resolveSchemaCompatibility which belongs to the old serializer, There are no ways for userstospecify thecompatibility with the old serializer in the new customizedserializer.The FLIP hopes to reverse the direction of resolving schemacompatibilityto improve the usability of resolving schema compatibility. [1]https://cwiki.apache.org/confluence/display/FLINK/FLIP-263%3A+Improve+resolving+schema+compatibility <https://cwiki.apache.org/confluence/display/FLINK/FLIP-263%3A+Improve+resolving+schema+compatibility >-- Best, Hangxiang.-- Best, Hangxiang.-- Best, Hangxiang.
OpenPGP_0x31D2DD10BFC15A2D.asc
Description: OpenPGP public key
OpenPGP_signature
Description: OpenPGP digital signature