Hello Martijn,

I apologize for the delay in responding.
Regarding your question about integrating Redshift directly into the JDBC connector: we are planning to offer two modes, JDBC and UNLOAD. Our internal benchmarking has shown good performance for the UNLOAD flow, and both flows are needed to cover different user use cases.

If we were to explicitly add the JDBC mode to flink-connector-jdbc, we would have two options:

1. Include flink-connector-redshift in flink-connector-jdbc: this would move the Redshift connector into the JDBC connector. Since Redshift is an AWS proprietary product, some authentication utilities could be reused from flink-connector-aws-base, and any additional utilities the Redshift connector requires could be added there. Of the two, I find this option preferable, as it keeps everything related to AWS in flink-connector-aws.

2. Implement the JDBC mode for the Redshift sink in flink-connector-jdbc and the UNLOAD mode in flink-connector-aws: this alternative is not advisable, as splitting one connector across two repositories would lead to maintenance challenges and complexity.

Furthermore, it is important to highlight that Redshift ships its own customized JDBC driver [1], specifically optimized for Redshift. While I cannot confirm this definitively, the Redshift JDBC driver [1] may behave differently from the generic JDBC drivers that flink-connector-jdbc is built and tested against, so relying on the JDBC connector could lead to compatibility issues in the future.

Given these considerations, it seems prudent to keep the Redshift functionality within flink-connector-aws and keep the Redshift connector independent of the JDBC connector. This keeps the Redshift connector flexible and insulated from any potential changes in JDBC compatibility. I will update the FLIP [2] to remove the dependency on flink-connector-jdbc.
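To make the mode selection concrete: as you note below, the user would choose a read mode per table. Here is a minimal sketch of how that could look from the Table API; the connector identifier and the option names (in particular 'read.mode') are illustrative placeholders, not the final interface from the FLIP [2]:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class RedshiftModeSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Illustrative DDL only: the connector identifier and the option names,
        // 'read.mode' in particular, are placeholders rather than the final FLIP API.
        tEnv.executeSql(
                "CREATE TABLE redshift_orders ("
                        + " order_id BIGINT,"
                        + " amount DECIMAL(10, 2)"
                        + ") WITH ("
                        + " 'connector' = 'redshift',"
                        + " 'hostname' = 'example-cluster.abc123.us-east-1.redshift.amazonaws.com',"
                        + " 'database-name' = 'dev',"
                        + " 'table-name' = 'orders',"
                        + " 'read.mode' = 'unload'" // the JDBC flow would use 'jdbc'
                        + ")");
    }
}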
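And to make the difference between the two flows concrete, below is a rough sketch of each issued directly through the dedicated Redshift driver [1], assuming the driver jar is on the classpath; the cluster endpoint, credentials, table, S3 bucket, and IAM role are all placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

public class RedshiftFlowsSketch {
    public static void main(String[] args) throws Exception {
        // The dedicated driver registers the jdbc:redshift:// URL scheme and is
        // picked up automatically by the JDBC service loader.
        String url = "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev";
        Properties props = new Properties();
        props.setProperty("user", "example_user");
        props.setProperty("password", "example_password");

        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement()) {

            // JDBC flow: rows are written (or read) directly over the driver connection.
            stmt.executeUpdate("INSERT INTO orders VALUES (1, 99.95)");

            // UNLOAD flow: Redshift exports the query result to S3, and the
            // connector would then read the exported files from S3.
            stmt.execute(
                    "UNLOAD ('SELECT * FROM orders') "
                            + "TO 's3://example-bucket/orders/' "
                            + "IAM_ROLE 'arn:aws:iam::123456789012:role/example-role' "
                            + "FORMAT AS PARQUET");
        }
    }
}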
[1] https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift

Bests,
Samrat

On Mon, Sep 11, 2023 at 1:21 PM Martijn Visser <martijnvis...@apache.org> wrote:

> Hi Samrat,
>
> I'm still having doubts about the dependency on the JDBC connector. When a user specifies 'read mode', it will use the JDBC connector under the hood. Why not then integrate Redshift directly into the JDBC connector itself? It removes the need for a dependency on the JDBC connector, especially keeping in mind that that connector still uses the old SourceFunction/SinkFunction interfaces because it hasn't been migrated yet.
>
> Best regards,
>
> Martijn
>
> On Mon, Sep 11, 2023 at 8:54 AM Samrat Deb <decordea...@gmail.com> wrote:
>
> > Hi Leonard,
> >
> > > Do we have to rely on the latest version of the JDBC Connector here?
> >
> > No, there's no need for us to depend on the latest version of the JDBC Connector. Redshift has its dedicated JDBC driver [1], which includes custom modifications tailored to Redshift's specific implementation needs. This driver is the most suitable choice for our purposes.
> >
> > > Could you collect the APIs that Redshift generally needs to use?
> >
> > I am actively working on it and making progress towards creating the POC.
> >
> > Bests,
> > Samrat
> >
> > [1] https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html
> >
> > On Mon, Sep 11, 2023 at 12:02 PM Samrat Deb <decordea...@gmail.com> wrote:
> >
> > > Hello Danny,
> > >
> > > I wanted to express my gratitude for your valuable feedback and insightful suggestions.
> > >
> > > I will be revising the FLIP to incorporate all of your queries and review suggestions. Additionally, I plan to provide a Proof of Concept (POC) for the connector by the end of this week. This POC will address the points you've raised and ensure that the FLIP aligns with your recommendations.
> > >
> > > Thank you once again for your input.
> > >
> > > Bests,
> > > Samrat
> > >
> > > On Thu, Sep 7, 2023 at 10:21 PM Danny Cranmer <dannycran...@apache.org> wrote:
> > >
> > > > Hello Leonard,
> > > >
> > > > > Do we have to rely on the latest version of the JDBC Connector here? My understanding is that it works as long as the Flink minor version is the same as the JDBC Connector's. Could you collect the APIs that Redshift generally needs to use?
> > > >
> > > > I agree we do not necessarily need to rely on the latest patch version, only the same minor. The main issue for me is that the dependency introduces a blocker following a new Flink version: for example, when Flink 1.18.0 is released, we cannot release the AWS connectors until the JDBC connector release is complete. But I think this is a good tradeoff.
> > > >
> > > > > Splitting Redshift out into a separate repository does not solve this coupling problem
> > > >
> > > > Arguably it solves the AWS<>JDBC coupling problem, but creates a new, more complex one!
> > > >
> > > > Thanks,
> > > >
> > > > On Thu, Sep 7, 2023 at 5:26 PM Leonard Xu <xbjt...@gmail.com> wrote:
> > > >
> > > > > Thanks Samrat and Danny for driving this FLIP.
> > > > >
> > > > > > > an effective approach is to utilize the latest version of flink-connector-jdbc as a Maven dependency
> > > > > >
> > > > > > When we have stable source/sink APIs and the connector versions are decoupled from Flink this makes sense. But right now this would mean that the JDBC connector will block the AWS connector for each new Flink version support release (1.18, 1.19, 1.20, 2.0 etc). That being said, I cannot think of a cleaner alternative, without pulling the core JDBC bits out into a dedicated project that is decoupled from and released independently of Flink. Splitting flink-connector-redshift into a dedicated repo would decouple AWS/JDBC, but obviously introduce a new connector that is blocked by both AWS and JDBC.
> > > > >
> > > > > Do we have to rely on the latest version of the JDBC Connector here? My understanding is that it works as long as the Flink minor version is the same as the JDBC Connector's. Could you collect the APIs that Redshift generally needs to use?
> > > > >
> > > > > Assuming that the AWS Connector (Redshift) depends on the JDBC Connector and wants a higher version of it, I understand that the correct approach is to push forward the release of the JDBC Connector, and it looks like we have no other options.
> > > > >
> > > > > Splitting Redshift out into a separate repository does not solve this coupling problem; from a user perspective, Redshift should also be in the AWS Connector repo.
> > > > >
> > > > > Best,
> > > > > Leonard