Hi,

I've updated the FLIP [1], incorporating the relevant changes to avoid depending on flink-connector-jdbc. This decision was based on the following reasons:

1. AWS Redshift ships its own specialized JDBC driver [2]. Since that driver may evolve over time, going through flink-connector-jdbc could run into compatibility issues, in addition to the dependency concerns already raised in this thread.

2. A proof of concept (POC) of a DynamicSink for Redshift has been implemented [3]. It shows that building on flink-connector-jdbc does not provide significant benefits for Redshift.

3. The JDBC mode offers more flexibility by using the Redshift JDBC driver directly, which lets the Flink Redshift connector evolve independently (see the sketch below).
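To make point 3 concrete, here is a minimal, hypothetical sketch of what direct driver usage looks like in JDBC mode. The cluster endpoint, credentials, and the sales_summary table are placeholders; the only assumption is that the Redshift JDBC driver [2] is on the classpath:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.Properties;

    public class RedshiftJdbcSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder cluster endpoint and database.
            String url = "jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev";

            Properties props = new Properties();
            props.setProperty("user", "awsuser");             // placeholder credentials
            props.setProperty("password", "example-password");

            // JDBC 4+ drivers self-register via ServiceLoader, so recent Redshift
            // driver versions need no explicit Class.forName(...) call.
            try (Connection conn = DriverManager.getConnection(url, props);
                 PreparedStatement ps = conn.prepareStatement(
                         "INSERT INTO sales_summary (region, amount) VALUES (?, ?)")) {
                ps.setString(1, "us-west-2");
                ps.setLong(2, 42L);
                ps.addBatch();
                ps.executeBatch(); // a sink would flush writes in batches like this
            }
        }
    }

Because the connector talks to the Redshift driver directly, it can adopt new driver capabilities or work around Redshift-specific behavior without waiting for a flink-connector-jdbc release.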
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
[2] https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html
[3] https://github.com/Samrat002/flink-connector-aws/tree/redshift-connector

Bests,
Samrat

On Sat, Sep 16, 2023 at 10:46 AM Samrat Deb <decordea...@gmail.com> wrote:

> Hello Martijn,
>
> I apologize for the delay in responding.
>
> Regarding your question about integrating Redshift directly into the JDBC connector: we are planning to offer two modes, JDBC and UNLOAD. Our internal benchmarking has shown good performance for the UNLOAD flow, and both flows are needed to cover different user use cases.
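> For context on the UNLOAD flow: conceptually, it issues Redshift's UNLOAD command over JDBC to stage the query result in S3, and the connector then reads the staged files. A rough, hypothetical sketch follows; the endpoint, credentials, bucket, IAM role, and table are placeholders, and this is an illustration rather than the POC code:
>
>     import java.sql.Connection;
>     import java.sql.DriverManager;
>     import java.sql.Statement;
>
>     public class RedshiftUnloadSketch {
>         public static void main(String[] args) throws Exception {
>             // Placeholder cluster endpoint and credentials.
>             String url = "jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev";
>             try (Connection conn = DriverManager.getConnection(url, "awsuser", "example-password");
>                  Statement stmt = conn.createStatement()) {
>                 // Stage the query result in S3 as Parquet; a reader can then
>                 // consume the files under this prefix in parallel.
>                 stmt.execute(
>                     "UNLOAD ('SELECT * FROM sales') "
>                         + "TO 's3://example-bucket/unload/sales_' "
>                         + "IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftRole' "
>                         + "FORMAT AS PARQUET");
>             }
>         }
>     }
>
> Reading the staged files can be parallelized across the S3 prefix, which is broadly why the UNLOAD flow performs well for large scans.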
> If we were to explicitly add the JDBC mode to flink-connector-jdbc, we would have two options:
>
> 1. Include flink-connector-redshift in flink-connector-jdbc: this would incorporate the Redshift connector into the JDBC connector. Since Redshift is an AWS proprietary product, some authentication utilities can be reused from flink-connector-aws-base, and any additional utilities the Redshift connector requires could be added to flink-connector-aws-base. In my opinion, this approach is favorable as it keeps everything related to AWS in flink-connector-aws.
>
> 2. Implement the JDBC mode for the Redshift sink in flink-connector-jdbc and the UNLOAD mode in flink-connector-aws: this alternative is not advisable, as it could lead to maintenance challenges and complexity.
>
> Furthermore, it's important to highlight that Redshift has its own customized JDBC driver [1], specifically optimized for compatibility with Redshift. While I cannot confirm this definitively, the Redshift JDBC driver [1] might differ in compatibility from the JDBC driver used in flink-connector-jdbc, so relying on the JDBC connector could lead to future compatibility issues.
>
> Given these considerations, it seems prudent to keep the Redshift-related functionality within flink-connector-aws and the Redshift connector independent of the JDBC connector. This helps ensure that the Redshift connector remains flexible and adaptable to any potential changes in JDBC compatibility.
>
> I will update the FLIP [2] to remove the dependencies on flink-connector-jdbc.
>
> [1] https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html
> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
>
> Bests,
> Samrat
>
> On Mon, Sep 11, 2023 at 1:21 PM Martijn Visser <martijnvis...@apache.org> wrote:
>
>> Hi Samrat,
>>
>> I'm still having doubts about the dependency on the JDBC connector. When a user specifies the read mode, it will use the JDBC connector under the hood. Why not integrate Redshift directly into the JDBC connector itself? It removes the need for a dependency on the JDBC connector, especially keeping in mind that this connector still uses the old SourceFunction/SinkFunction interfaces because it hasn't been migrated yet.
>>
>> Best regards,
>>
>> Martijn
>>
>> On Mon, Sep 11, 2023 at 8:54 AM Samrat Deb <decordea...@gmail.com> wrote:
>>
>> > Hi Leonard,
>> >
>> > > Do we have to rely on the latest version of JDBC Connector here?
>> >
>> > No, there's no need for us to depend on the latest version of the JDBC connector. Redshift has its dedicated JDBC driver [1], which includes custom modifications tailored to Redshift's specific implementation needs. That driver is the most suitable choice for our purposes.
>> >
>> > > Could you collect the APIs that Redshift generally needs to use?
>> >
>> > I am actively working on it and making progress towards the POC.
>> >
>> > Bests,
>> > Samrat
>> >
>> > [1] https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html
>> >
>> > On Mon, Sep 11, 2023 at 12:02 PM Samrat Deb <decordea...@gmail.com> wrote:
>> >
>> > > Hello Danny,
>> > >
>> > > I want to express my gratitude for your valuable feedback and insightful suggestions.
>> > >
>> > > I will revise the FLIP to incorporate all of your queries and review suggestions. Additionally, I plan to provide a proof of concept (POC) for the connector by the end of this week. The POC will address the points you've raised and ensure that the FLIP aligns with your recommendations.
>> > >
>> > > Thank you once again for your input.
>> > >
>> > > Bests,
>> > > Samrat
>> > >
>> > > On Thu, Sep 7, 2023 at 10:21 PM Danny Cranmer <dannycran...@apache.org> wrote:
>> > >
>> > >> Hello Leonard,
>> > >>
>> > >> > Do we have to rely on the latest version of JDBC Connector here? I understand that it is fine as long as the Flink minor version matches the JDBC connector's. Could you collect the APIs that Redshift generally needs to use?
>> > >>
>> > >> I agree we do not necessarily need to rely on the latest patch version, only the same minor version. The main issue for me is that the dependency introduces a blocker after every new Flink release. For example, when Flink 1.18.0 is released we cannot release the AWS connectors until the JDBC connector release is complete. But I think this is a good tradeoff.
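>> > >>
>> > >> To make the coupling concrete, the Redshift module would declare something along these lines (the version is purely illustrative); note that the connector version pins the supported Flink minor via its suffix, which is exactly the blocker described above:
>> > >>
>> > >>     <dependency>
>> > >>         <groupId>org.apache.flink</groupId>
>> > >>         <artifactId>flink-connector-jdbc</artifactId>
>> > >>         <!-- illustrative version; the -1.17 suffix encodes the supported Flink minor -->
>> > >>         <version>3.1.1-1.17</version>
>> > >>     </dependency>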
>> > >>
>> > >> > Splitting a separate redshift repository does not solve this coupling problem
>> > >>
>> > >> Arguably it solves the AWS<>JDBC coupling problem, but creates a new, more complex one!
>> > >>
>> > >> Thanks,
>> > >>
>> > >> On Thu, Sep 7, 2023 at 5:26 PM Leonard Xu <xbjt...@gmail.com> wrote:
>> > >>
>> > >> > Thanks Samrat and Danny for driving this FLIP.
>> > >> >
>> > >> > >> an effective approach is to utilize the latest version of flink-connector-jdbc as a Maven dependency
>> > >> > >
>> > >> > > When we have stable source/sink APIs and the connector versions are decoupled from Flink this makes sense. But right now this would mean that the JDBC connector will block the AWS connector for each new Flink version support release (1.18, 1.19, 1.20, 2.0, etc.). That being said, I cannot think of a cleaner alternative without pulling the core JDBC bits out into a dedicated project that is decoupled from, and released independently of, Flink. Splitting flink-connector-redshift into a dedicated repo would decouple AWS/JDBC, but obviously introduce a new connector that is blocked by both AWS and JDBC.
>> > >> >
>> > >> > Do we have to rely on the latest version of the JDBC connector here? My understanding is that it is fine as long as the Flink minor version matches the JDBC connector's. Could you collect the APIs that Redshift generally needs to use?
>> > >> >
>> > >> > Assuming the AWS connector (Redshift) depends on the JDBC connector and wants a higher version of it, I understand that the correct approach is to promote a release of the JDBC connector; it looks like we have no other options.
>> > >> >
>> > >> > Splitting out a separate redshift repository does not solve this coupling problem; from a user perspective, redshift should also be in the AWS connector repo.
>> > >> >
>> > >> > Best,
>> > >> > Leonard