Thanks Samrat, this makes sense to me and addresses connector coupling concerns.
On Sat, Sep 16, 2023 at 4:30 PM Samrat Deb <decordea...@gmail.com> wrote:

> Hi,
>
> I've made updates to the FLIP [1] by incorporating relevant changes to
> avoid using the Flink connector JDBC. This decision was based on the
> following reasons:
>
> - AWS Redshift utilizes its own specialized JDBC driver [2]. Given that
>   this driver may evolve over time, using the Flink connector JDBC might
>   face compatibility issues.
> - The dependency issues mentioned earlier in this thread.
> - A Proof of Concept (POC) has been implemented for a DynamicSink for
>   Redshift [3]. This shows that utilizing flink-connector-jdbc for
>   Redshift doesn't provide significant benefits.
>
> The JDBC mode offers more flexibility by allowing direct use of the JDBC
> driver, enabling the Flink Redshift connector to evolve independently.
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
> [2] https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html
> [3] https://github.com/Samrat002/flink-connector-aws/tree/redshift-connector
>
> Bests,
> Samrat
>
> On Sat, Sep 16, 2023 at 10:46 AM Samrat Deb <decordea...@gmail.com> wrote:
>
>> Hello Martijn,
>>
>> I apologize for the delay in responding.
>>
>> Regarding your question about integrating Redshift directly into the
>> JDBC connector: we are planning to offer two modes, JDBC and UNLOAD.
>> Through our internal benchmarking, we have observed good performance in
>> the UNLOAD flow. Additionally, both flows are needed for different user
>> use cases.
>>
>> If we were to explicitly add the JDBC mode to flink-connector-jdbc, we
>> would have two options:
>>
>> 1. Include flink-connector-redshift in flink-connector-jdbc: this would
>>    involve incorporating the Redshift connector into the JDBC connector.
>>    Since Redshift is an AWS proprietary product, some authentication
>>    utilities can be reused from flink-connector-aws-base, and any
>>    additional utilities the Redshift connector requires could be added
>>    there. In my opinion, this approach is favorable as it keeps
>>    everything related to AWS in flink-connector-aws.
>>
>> 2. Implement JDBC mode for the Redshift sink in flink-connector-jdbc and
>>    UNLOAD in flink-connector-aws: this alternative is not advisable, as
>>    it could lead to maintenance challenges and complexities.
>>
>> Furthermore, it's important to highlight that Redshift has its own
>> customized JDBC driver [1], specifically optimized for compatibility
>> with Redshift. While I cannot confirm this definitively, there is a
>> possibility that the Redshift JDBC driver [1] differs in compatibility
>> from the JDBC driver used in flink-connector-jdbc. This suggests that if
>> flink-connector-redshift were to rely on the JDBC connector, it could
>> face compatibility issues in the future.
>>
>> Given these considerations, it seems prudent to keep the
>> Redshift-related functionality within flink-connector-aws and keep the
>> Redshift connector independent of the JDBC connector. This approach
>> helps ensure that the Redshift connector remains flexible and adaptable
>> to any potential changes in JDBC compatibility.
>>
>> I will update the FLIP [2] to remove dependencies on flink-connector-jdbc.
>>
>> [1] https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html
>> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
>>
>> Bests,
>> Samrat
>>
>> On Mon, Sep 11, 2023 at 1:21 PM Martijn Visser <martijnvis...@apache.org> wrote:
>>
>>> Hi Samrat,
>>>
>>> I'm still having doubts about the dependency on the JDBC connector.
>>> When a user specifies 'read mode', it will use the JDBC connector under
>>> the hood. Why not integrate Redshift directly in the JDBC connector
>>> itself? It removes the need for a dependency on the JDBC driver,
>>> especially keeping in mind that this driver uses the old
>>> SourceFunction/SinkFunction interfaces because it hasn't been migrated
>>> yet.
>>>
>>> Best regards,
>>>
>>> Martijn
>>>
>>> On Mon, Sep 11, 2023 at 8:54 AM Samrat Deb <decordea...@gmail.com> wrote:
>>>
>>>> Hi Leonard,
>>>>
>>>>> Do we have to rely on the latest version of JDBC Connector here?
>>>>
>>>> No, there's no need for us to depend on the latest version of the JDBC
>>>> Connector. Redshift has its dedicated JDBC driver [1], which includes
>>>> custom modifications tailored to Redshift's specific implementation
>>>> needs. This driver is the most suitable choice for our purposes.
>>>>
>>>>> Could you collect the APIs that Redshift generally needs to use?
>>>>
>>>> I am actively working on it and making progress towards the POC.
>>>>
>>>> Bests,
>>>> Samrat
>>>>
>>>> [1] https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html
>>>>
>>>> On Mon, Sep 11, 2023 at 12:02 PM Samrat Deb <decordea...@gmail.com> wrote:
>>>>
>>>>> Hello Danny,
>>>>>
>>>>> Thank you for your valuable feedback and insightful suggestions.
>>>>>
>>>>> I will revise the FLIP to incorporate all of your queries and review
>>>>> suggestions. Additionally, I plan to provide a Proof of Concept (POC)
>>>>> for the connector by the end of this week. This POC will address the
>>>>> points you've raised and ensure that the FLIP aligns with your
>>>>> recommendations.
>>>>>
>>>>> Thank you once again for your input.
>>>>> Bests,
>>>>> Samrat
>>>>>
>>>>> On Thu, Sep 7, 2023 at 10:21 PM Danny Cranmer <dannycran...@apache.org> wrote:
>>>>>
>>>>>> Hello Leonard,
>>>>>>
>>>>>>> Do we have to rely on the latest version of JDBC Connector here? I
>>>>>>> understand that as long as the version of flink minor is the same
>>>>>>> as the JDBC Connector, Could you collect the APIs that Redshift
>>>>>>> generally needs to use?
>>>>>>
>>>>>> I agree we do not necessarily need to rely on the latest patch
>>>>>> version, only the same minor. The main issue for me is that the
>>>>>> dependency introduces a blocker following a new Flink version. For
>>>>>> example, when Flink 1.18.0 is released we cannot release the AWS
>>>>>> connectors until the JDBC release is complete. But I think this is a
>>>>>> good tradeoff.
>>>>>>
>>>>>>> Splitting a separate redshift repository does not solve this
>>>>>>> coupling problem
>>>>>>
>>>>>> Arguably it solves the AWS<>JDBC coupling problem, but creates a
>>>>>> new, more complex one!
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> On Thu, Sep 7, 2023 at 5:26 PM Leonard Xu <xbjt...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Samrat and Danny for driving this FLIP.
>>>>>>>
>>>>>>>>> an effective approach is to utilize the latest version of
>>>>>>>>> flink-connector-jdbc as a Maven dependency
>>>>>>>>
>>>>>>>> When we have stable source/sink APIs and the connector versions
>>>>>>>> are decoupled from Flink this makes sense. But right now this
>>>>>>>> would mean that the JDBC connector will block the AWS connector
>>>>>>>> for each new Flink version support release (1.18, 1.19, 1.20,
>>>>>>>> 2.0 etc).
>>>>>>>> That being said, I cannot think of a cleaner alternative without
>>>>>>>> pulling the core JDBC bits out into a dedicated project that is
>>>>>>>> decoupled from, and released independently of, Flink. Splitting
>>>>>>>> flink-connector-redshift into a dedicated repo would decouple
>>>>>>>> AWS/JDBC, but obviously introduce a new connector that is blocked
>>>>>>>> by both AWS and JDBC.
>>>>>>>
>>>>>>> Do we have to rely on the latest version of JDBC Connector here? I
>>>>>>> understand that as long as the version of flink minor is the same
>>>>>>> as the JDBC Connector, Could you collect the APIs that Redshift
>>>>>>> generally needs to use?
>>>>>>>
>>>>>>> Assuming that the AWS Connector (Redshift) depends on the JDBC
>>>>>>> Connector and wants a higher version of it, I understand that the
>>>>>>> correct approach is to promote the release of the JDBC Connector,
>>>>>>> and it looks like we have no more options.
>>>>>>>
>>>>>>> Splitting a separate redshift repository does not solve this
>>>>>>> coupling problem; from a user perspective, redshift should also be
>>>>>>> in the AWS Connector repo.
>>>>>>>
>>>>>>> Best,
>>>>>>> Leonard
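For context on the UNLOAD mode discussed in the thread: it relies on Redshift's UNLOAD command, which exports a query result to S3 so that Flink can read the files from there instead of pulling rows through JDBC. Below is a minimal sketch of how such a statement might be assembled; the function name, parameters, and option choices are illustrative assumptions, not taken from the FLIP or the POC.

```python
def build_unload_statement(query: str, s3_path: str, iam_role: str) -> str:
    """Build a Redshift UNLOAD statement exporting `query` results to S3.

    Illustrative sketch only (not the FLIP-307 implementation). Redshift
    requires single quotes inside the wrapped query to be doubled.
    """
    escaped = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}') "
        f"TO '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS PARQUET"
    )


# Hypothetical table, bucket, and role names for illustration.
stmt = build_unload_statement(
    "SELECT * FROM sales WHERE region = 'EU'",
    "s3://my-bucket/export/sales_",
    "arn:aws:iam::123456789012:role/RedshiftUnloadRole",
)
print(stmt)
```

A connector in UNLOAD mode would submit a statement like this over the Redshift JDBC driver and then hand the resulting S3 prefix to a file source, which is why that flow can outperform row-by-row JDBC reads for large scans.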