Hi Dong Lin, I see where you are coming from. I agree it would be good to have a FLIP to design the public API for flink-avro, so we will limit this vote thread to the externalization.
Regards, Hong From: Dong Lin <lindon...@gmail.com> Date: Tuesday, 31 January 2023 at 14:26 To: "dev@flink.apache.org" <dev@flink.apache.org> Cc: "Teoh, Hong" <lian...@amazon.co.uk>, Chesnay Schepler <ches...@apache.org> Subject: RE: [EXTERNAL][VOTE] Externalize AWS formats to flink-connector-aws CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. Hi all, It seems that we are on the same page that having flink-connector-aws use non-public API from apache/flink is not a blocking issue here. And I agree that it would be nice to make necessary APIs public so that 1) projects outside Flink can also develop similar connectors and 2) flink-connector-aws can likely have less issues dealing with changes in apache/flink. I just want to note that, if we want to additionally mark a few classes in flink-avro as @PublicEvolving, since this is a major public API change, it is probably necessary to create a FLIP to specify these API changes. Also, we might need to open a new voting thread since the votes made earlier in this thread do not cover those public API change. Thanks, Dong On Tue, Jan 31, 2023 at 4:02 PM Danny Cranmer <dannycran...@apache.org<mailto:dannycran...@apache.org>> wrote: Hello all, @Dong Lin while I agree with your reasoning here, our ultimate aim is to have stable public APIs. If we need to use @Internal apis for the Apache connectors/formats/etc, then it is likely that users building custom integrations do too. Therefore I am aligned with @Chesnay that we should not depend on @Internal classes. But the same issue applies in the main Flink codebase and externalizing code does not necessarily need to be a blocker, since it is an existing problem. We can consider this a best practice. > It seems that there are roughly 32 public classes in flink-avro that are not marked with @PublicEvolving. Some of them are explicitly marked as @Internal. Are you suggesting to mark all these classes as @PublicEvolving? I interpreted @Teoh, Hong <lian...@amazon.co.uk<mailto:lian...@amazon.co.uk>>'s message as we have identified *three* classes that are currently used by the formats that are insufficiently scoped that we should promote to @PublicEvolving. Thanks for the deep dive @Teoh, Hong <lian...@amazon.co.uk<mailto:lian...@amazon.co.uk>>. > As such, I propose that we: - Mark the inherently public classes in flink-avro as @PublicEvolving - Proceed with externalizing flink-avro-glue-schema-registry +1 to this proposal Thanks, Danny On Tue, Jan 31, 2023 at 12:32 AM Dong Lin <lindon...@gmail.com<mailto:lindon...@gmail.com>> wrote: > Hi Hong, > > Thanks for the detailed reply. > > It seems that there are roughly 32 public classes in flink-avro that are > not marked with @PublicEvolving. Some of them are explicitly marked > as @Internal. Are you suggesting to mark all these classes as > @PublicEvolving? > > I agree it would be nice to mark more classes as @PublicEvolving so that > projects outside apache/flink* can take advantage of these APIs. On the > other hand, it is probably useful to make sure that we only expose the > minimal set of APIs for our target use cases and that all public APIs are > well-documented. > > Typically public APIs are exposed via interface rather than concrete class. > So it is not clear to me whether we can just mark all those classes > as @PublicEvolving and still achieve the high-level goal described above. > Maybe we should follow the typical FLIP process by listing all APIs that > need to be exposed as public in a FLIP so that other developers will help > double check those APIs and vote for it, if we decide to add more public > APIs in Flink. > > On the other hand, I still think it is OK for flink-connector-aws (or any > projects managed by Flink community) to use Internal APIs, for the reasons > described in my reply to Chesnay's email. We can still try to make more > APIs public. I just don't think it has to block the effort to > externalize AWS formats. > > What do you think? > > Regards, > Dong > > > > On Tue, Jan 31, 2023 at 2:18 AM Teoh, Hong > <lian...@amazon.co.uk<mailto:lian...@amazon.co.uk>> wrote: > > > Hi @Chesnay @Dong Lin, > > > > Thanks for flagging up the concerns. > > > > I agree that we should make clear which APIs are public in flink-avro so > > that we can make the connector code more independent. IMO that moves in > the > > right direction for Flink in general! > > > > > > 1. Public classes/interfaces in flink-avro > > - I took a look at the classes in flink-avro that are used in > > flink-avro-glue-schema-registry, and it turns out those classes are > already > > exposed as part of the public interface in > flink-avro-glue-schema-registry > > and flink-avro-confluent-registry. (They are already inherently > > @PublicEvolving (RegistryAvroDeserializationSchema, SchemaCoder, > > RegistryAvroSerializationSchema), so I recommend that we should just mark > > them as so for clarity. > > - The only class that isn't inherently exposed as @PublicEvolving is > > MutableByteArrayInputStream, but I took a look, and I think we can remove > > this dependency from flink-avro-glue-schema-registry > > - Also, while looking, I found that there are a couple of classes that > are > > recommended for users in our doc [1] but are not marked as > @PublicEvolving > > . We can probably also fix this by adding the @PublicEvolving annotation > > > > 2. Our public API depends on Apache Avro classes > > - I agree, ideally we don't have this dependency, but since we already > > inherently expose it to users, we should maintain this for backwards > > compatibility. > > - Also, some of the @PublicEvolving classes in flink-avro already have > > dependencies on Avro classes [2] > > > > As such, I propose that we: > > - Mark the inherently public classes in flink-avro as @PublicEvolving > > - Proceed with externalizing flink-avro-glue-schema-registry > > > > > > Let me know what you think! > > > > Regards, > > Hong > > > > [1] AvroInputFormat > > > https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/formats/avro/ > > [2] AvroFormatOptions > > > https://github.com/apache/flink/blob/f75cf38e28143bb36acbc2ee6a5ea9c8852916dd/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/AvroFormatOptions.java > > > > > > On 30/01/2023, 12:53, "Chesnay Schepler" > > <ches...@apache.org<mailto:ches...@apache.org>> wrote: > > > > CAUTION: This email originated from outside of the organization. Do > > not click links or open attachments unless you can confirm the sender and > > know the content is safe. > > > > > > > > On 30/01/2023 13:24, Dong Lin wrote: > > > it should be OK for > > > apache/flink-connector-aws to use non-public API, similar to how > > classes > > > inside Flink can use APIs marked with @Internal. > > > > > > We can mark related classes with @Internal as appropriate, without > > > requiring any new FLIP. > > > > It's not though, because these APIs don't offer any compatibility > > guarantees. > > > > We want to arrive at a state where connectors are fully independent > of > > Flink releases. Using non-public APIs runs counter to that. > > > > > > >