Hi Danny,

I'm also leaning slightly towards the single AWS connector repo direction.

Bumps in the underlying AWS SDK would bump all of the connectors in any
case. And if a change occurs that is isolated to a single connector, then
those that do not use that connector can just skip the release.

Cheers,
Thomas


On Mon, Oct 24, 2022 at 3:01 PM Teoh, Hong <lian...@amazon.co.uk.invalid>
wrote:

> I like the single repo with single version idea.
>
> Pros:
> - Better discoverability for connectors for AWS services means a better
> experience for Flink users
> - Natural placement of AWS-related utils (Credentials, SDK Retry strategy)
>
> Caveats:
> - As you mentioned, it is not desirable if we have to evolve the major
> version of the connector just for a change in a single connector (e.g.
> DynamoDB). However, I think it is reasonable to only evolve the major
> version of the AWS connector repo when there are Flink Source/Sink API
> upgrades or AWS SDK major upgrades (probably quite rare). Any new features
> for individual connectors can be collapsed into minor releases.
> - An additional callout here is that we should be careful adopting any AWS
> connectors that don't use the AWS SDK directly (e.g. how the Kinesis
> connector used KPL for a long time). In my opinion, any new connectors like
> that would be better placed in their own repositories, otherwise we will
> have a complex mesh of dependencies to manage.
>
> Regards,
> Hong
>
>
>
>
> On 21/10/2022, 16:59, "Danny Cranmer" <dannycran...@apache.org> wrote:
>
>     Thanks Chesnay for the suggestion, I will investigate this option.
>
>     Related to the single repo idea, I have considered it in the past. Are
> you
>     proposing we also use a single version between all connectors? If we
> have a
>     single version then it makes sense to combine them in a single repo, if
>     they are separate versions, then splitting them makes sense. This was
>     discussed last year more generally [1] and the consensus was "we
> ultimately
>     propose to have a single repository per connector".
>
>     Combining all AWS connectors into a single repo with a single version
> is
>     in line with how the AWS SDK works, therefore AWS users are familiar
> with
>     this approach. However it is frustrating that we would have to release
> all
>     connectors to fix a bug or add a feature in one of them. Example: a
> user is
>     using Kinesis Data Streams only (the most popular and mature
> connector),
>     and we evolve the version from 1.x to 2.y (or 1.x to 1.y) for a
> DynamoDB
>     change.
>
>     I am torn and will think some more, but it would be great to hear other
>     people's opinions.
>
>     [1] https://lists.apache.org/thread/bywh947r2f5hfocxq598zhyh06zhksrm
>
>     Thanks,
>     Danny
>
>     On Fri, Oct 21, 2022 at 3:11 PM Jing Ge <j...@ververica.com> wrote:
>
>     > I agree with Jark. It would be easier for further development and
>     > maintenance, if all aws related connectors and the base module are
> in the
>     > same repo. It might make sense to upgrade the
> flink-connector-dynamodb to
>     > flink-connector-aws and move the other modules including the
>     > flink-connector-aws-base into it. The aws sdk could be managed in
>     > flink-connector-aws-base. Any future common connector features could
> also
>     > be developed in the base module.
>     >
>     > Best regards,
>     > Jing
>     >
>     > On Fri, Oct 21, 2022 at 1:26 PM Jark Wu <imj...@gmail.com> wrote:
>     >
>     >> How about creating a new repository flink-connector-aws and merging
>     >> dynamodb, kinesis firehose into it?
>     >> This can reduce the maintenance for complex dependencies and make
> the
>     >> release easy.
>     >> I think the maintainers of aws-related connectors are the same
> people.
>     >>
>     >> Best,
>     >> Jark
>     >>
>     >> > On 21 Oct 2022, at 17:41, Chesnay Schepler <ches...@apache.org> wrote:
>     >> >
>     >> > I would not go with 2); I think it'd just be messy.
>     >> >
>     >> > Here's another option:
>     >> >
>     >> > Create another repository (aws-connector-base) (following the
>     >> externalization model), add it as a sub-module to the downstream
>     >> repositories, and make it part of the release process of said
> connector.
>     >> >
>     >> > I.e., we never create a release for aws-connector-base, but
> release it
>     >> as part of the connector.
>     >> > The main benefit here is that we'd always be able to make
> changes to
>     >> the aws-base code without delaying connector releases.
>     >> > I would assume that any added overhead due to _technically_
> releasing
>     >> the aws code multiple times to be negligible.
>     >> >
>     >> >
>     >> > On 20/10/2022 22:38, Danny Cranmer wrote:
>     >> >> Hello all,
>     >> >>
>     >> >> Currently we have 2 AWS Flink connectors in the main Flink
> codebase
>     >> >> (Kinesis Data Streams and Kinesis Data Firehose) and one new
>     >> externalized
>     >> >> connector in progress (DynamoDB). Currently all three of these
> use
>     >> common
>     >> >> AWS utilities from the flink-connector-aws-base module. Common
> code
>     >> >> includes client builders, property keys, validation, utils etc.
>     >> >>
>     >> >> Once we externalize the connectors, leaving
> flink-connector-aws-base
>     >> in the
>     >> >> main Flink repository will restrict our ability to evolve the
>     >> connectors
>     >> >> quickly. For example, as part of the DynamoDB connector build we
> are
>     >> >> considering adding a general retry strategy config that can be
>     >> leveraged by
>     >> >> all connectors. We would need to block on Flink 1.17 for this.
>     >> >>
>     >> >> In the past we have tried to keep the AWS SDK version consistent
> across
>     >> >> connectors, with the externalization this is more likely to
> diverge.
>     >> >>
>     >> >> Option 1: I propose we create a new repository,
> flink-connector-aws,
>     >> which
>     >> >> we can move the flink-connector-aws-base module to and create a
> new
>     >> >> flink-connector-aws-parent to manage SDK versions. Each of the
>     >> externalized
>     >> >> AWS connectors will depend on this new module and parent.
> Downside is
>     >> an
>     >> >> additional module to release per Flink version, however I will
>     >> volunteer to
>     >> >> manage this.
>     >> >>
>     >> >> Option 2: We can move the flink-connector-aws-base module and
> create
>     >> >> flink-connector-parent within the flink-connector-shared-utils
> repo [2]
>     >> >>
>     >> >> Option 3: We do nothing.
>     >> >>
>     >> >> For option 1+2 we will follow the general externalized connector
>     >> versioning
>     >> >> strategy and rules.
>     >> >>
>     >> >> I am inclined towards option 1, and appreciate feedback from the
>     >> community.
>     >> >>
>     >> >> [1]
>     >> >>
>     >>
> https://github.com/apache/flink/tree/master/flink-connectors/flink-connector-aws-base
>     >> >> [2] https://github.com/apache/flink-connector-shared-utils
>     >> >>
>     >> >> Thanks,
>     >> >> Danny
>     >> >>
>     >> >
>     >>
>     >>
>
>
