Hi,

I've updated the FLIP [1], incorporating the relevant changes to avoid depending on flink-connector-jdbc. This decision was based on the following reasons:

1. AWS Redshift ships its own specialized JDBC driver [2]. Since that driver may evolve over time, going through flink-connector-jdbc could run into compatibility issues, in addition to the dependency concerns already raised in this thread.

2. A proof of concept (POC) of a DynamicSink for Redshift has been implemented [3]. It shows that building on flink-connector-jdbc does not provide significant benefits for Redshift.

3. The JDBC mode offers more flexibility by using the Redshift JDBC driver directly, which lets the Flink Redshift connector evolve independently (see the sketch below).
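To make point 3 concrete, here is a minimal, hypothetical sketch of what direct driver usage looks like in JDBC mode. The cluster endpoint, credentials, and the sales_summary table are placeholders; the only assumption is that the Redshift JDBC driver [2] is on the classpath:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.Properties;

    public class RedshiftJdbcSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder cluster endpoint and database.
            String url = "jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev";

            Properties props = new Properties();
            props.setProperty("user", "awsuser");             // placeholder credentials
            props.setProperty("password", "example-password");

            // JDBC 4+ drivers self-register via ServiceLoader, so recent Redshift
            // driver versions need no explicit Class.forName(...) call.
            try (Connection conn = DriverManager.getConnection(url, props);
                 PreparedStatement ps = conn.prepareStatement(
                         "INSERT INTO sales_summary (region, amount) VALUES (?, ?)")) {
                ps.setString(1, "us-west-2");
                ps.setLong(2, 42L);
                ps.addBatch();
                ps.executeBatch(); // a sink would flush writes in batches like this
            }
        }
    }

Because the connector talks to the Redshift driver directly, it can adopt new driver capabilities or work around Redshift-specific behavior without waiting for a flink-connector-jdbc release.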
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
[2] https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html
[3] https://github.com/Samrat002/flink-connector-aws/tree/redshift-connector

Bests,
Samrat

On Sat, Sep 16, 2023 at 10:46 AM Samrat Deb <decordea...@gmail.com> wrote:

> Hello Martijn,
>
> I apologize for the delay in responding.
>
> Regarding your question about integrating Redshift directly into the JDBC connector: we are planning to offer two modes, JDBC and UNLOAD. Our internal benchmarking has shown good performance for the UNLOAD flow, and both flows are needed to cover different user use cases.
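> For context on the UNLOAD flow: conceptually, it issues Redshift's UNLOAD command over JDBC to stage the query result in S3, and the connector then reads the staged files. A rough, hypothetical sketch follows; the endpoint, credentials, bucket, IAM role, and table are placeholders, and this is an illustration rather than the POC code:
>
>     import java.sql.Connection;
>     import java.sql.DriverManager;
>     import java.sql.Statement;
>
>     public class RedshiftUnloadSketch {
>         public static void main(String[] args) throws Exception {
>             // Placeholder cluster endpoint and credentials.
>             String url = "jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev";
>             try (Connection conn = DriverManager.getConnection(url, "awsuser", "example-password");
>                  Statement stmt = conn.createStatement()) {
>                 // Stage the query result in S3 as Parquet; a reader can then
>                 // consume the files under this prefix in parallel.
>                 stmt.execute(
>                     "UNLOAD ('SELECT * FROM sales') "
>                         + "TO 's3://example-bucket/unload/sales_' "
>                         + "IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftRole' "
>                         + "FORMAT AS PARQUET");
>             }
>         }
>     }
>
> Reading the staged files can be parallelized across the S3 prefix, which is broadly why the UNLOAD flow performs well for large scans.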
> If we were to explicitly add the JDBC mode to flink-connector-jdbc, we would have two options:
>
> 1. Include flink-connector-redshift in flink-connector-jdbc: this would incorporate the Redshift connector into the JDBC connector. Since Redshift is an AWS proprietary product, some authentication utilities can be reused from flink-connector-aws-base, and any additional utilities the Redshift connector requires could be added to flink-connector-aws-base. In my opinion, this approach is favorable as it keeps everything related to AWS in flink-connector-aws.
>
> 2. Implement the JDBC mode for the Redshift sink in flink-connector-jdbc and the UNLOAD mode in flink-connector-aws: this alternative is not advisable, as it could lead to maintenance challenges and complexity.
>
> Furthermore, it's important to highlight that Redshift has its own customized JDBC driver [1], specifically optimized for compatibility with Redshift. While I cannot confirm this definitively, the Redshift JDBC driver [1] might differ in compatibility from the JDBC driver used in flink-connector-jdbc, so relying on the JDBC connector could lead to future compatibility issues.
>
> Given these considerations, it seems prudent to keep the Redshift-related functionality within flink-connector-aws and the Redshift connector independent of the JDBC connector. This helps ensure that the Redshift connector remains flexible and adaptable to any potential changes in JDBC compatibility.
>
> I will update the FLIP [2] to remove the dependencies on flink-connector-jdbc.
>
> [1] https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html
> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
>
> Bests,
> Samrat
>
> On Mon, Sep 11, 2023 at 1:21 PM Martijn Visser <martijnvis...@apache.org> wrote:
>
>> Hi Samrat,
>>
>> I'm still having doubts about the dependency on the JDBC connector. When a user specifies the read mode, it will use the JDBC connector under the hood. Why not integrate Redshift directly into the JDBC connector itself? It removes the need for a dependency on the JDBC connector, especially keeping in mind that this connector still uses the old SourceFunction/SinkFunction interfaces because it hasn't been migrated yet.
>>
>> Best regards,
>>
>> Martijn
>>
>> On Mon, Sep 11, 2023 at 8:54 AM Samrat Deb <decordea...@gmail.com> wrote:
>>
>> > Hi Leonard,
>> >
>> > > Do we have to rely on the latest version of JDBC Connector here?
>> >
>> > No, there's no need for us to depend on the latest version of the JDBC connector. Redshift has its dedicated JDBC driver [1], which includes custom modifications tailored to Redshift's specific implementation needs. That driver is the most suitable choice for our purposes.
>> >
>> > > Could you collect the APIs that Redshift generally needs to use?
>> >
>> > I am actively working on it and making progress towards the POC.
>> >
>> > Bests,
>> > Samrat
>> >
>> > [1] https://docs.aws.amazon.com/redshift/latest/mgmt/jdbc20-download-driver.html
>> >
>> > On Mon, Sep 11, 2023 at 12:02 PM Samrat Deb <decordea...@gmail.com> wrote:
>> >
>> > > Hello Danny,
>> > >
>> > > I want to express my gratitude for your valuable feedback and insightful suggestions.
>> > >
>> > > I will revise the FLIP to incorporate all of your queries and review suggestions. Additionally, I plan to provide a proof of concept (POC) for the connector by the end of this week. The POC will address the points you've raised and ensure that the FLIP aligns with your recommendations.
>> > >
>> > > Thank you once again for your input.
>> > >
>> > > Bests,
>> > > Samrat
>> > >
>> > > On Thu, Sep 7, 2023 at 10:21 PM Danny Cranmer <dannycran...@apache.org> wrote:
>> > >
>> > >> Hello Leonard,
>> > >>
>> > >> > Do we have to rely on the latest version of JDBC Connector here? I understand that it is fine as long as the Flink minor version matches the JDBC connector's. Could you collect the APIs that Redshift generally needs to use?
>> > >>
>> > >> I agree we do not necessarily need to rely on the latest patch version, only the same minor version. The main issue for me is that the dependency introduces a blocker after every new Flink release. For example, when Flink 1.18.0 is released we cannot release the AWS connectors until the JDBC connector release is complete. But I think this is a good tradeoff.
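>> > >>
>> > >> To make the coupling concrete, the Redshift module would declare something along these lines (the version is purely illustrative); note that the connector version pins the supported Flink minor via its suffix, which is exactly the blocker described above:
>> > >>
>> > >>     <dependency>
>> > >>         <groupId>org.apache.flink</groupId>
>> > >>         <artifactId>flink-connector-jdbc</artifactId>
>> > >>         <!-- illustrative version; the -1.17 suffix encodes the supported Flink minor -->
>> > >>         <version>3.1.1-1.17</version>
>> > >>     </dependency>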
>> > >>
>> > >> > Splitting a separate redshift repository does not solve this coupling problem
>> > >>
>> > >> Arguably it solves the AWS<>JDBC coupling problem, but creates a new, more complex one!
>> > >>
>> > >> Thanks,
>> > >>
>> > >> On Thu, Sep 7, 2023 at 5:26 PM Leonard Xu <xbjt...@gmail.com> wrote:
>> > >>
>> > >> > Thanks Samrat and Danny for driving this FLIP.
>> > >> >
>> > >> > >> an effective approach is to utilize the latest version of flink-connector-jdbc as a Maven dependency
>> > >> > >
>> > >> > > When we have stable source/sink APIs and the connector versions are decoupled from Flink this makes sense. But right now this would mean that the JDBC connector will block the AWS connector for each new Flink version support release (1.18, 1.19, 1.20, 2.0, etc.). That being said, I cannot think of a cleaner alternative without pulling the core JDBC bits out into a dedicated project that is decoupled from, and released independently of, Flink. Splitting flink-connector-redshift into a dedicated repo would decouple AWS/JDBC, but obviously introduce a new connector that is blocked by both AWS and JDBC.
>> > >> >
>> > >> > Do we have to rely on the latest version of the JDBC connector here? My understanding is that it is fine as long as the Flink minor version matches the JDBC connector's. Could you collect the APIs that Redshift generally needs to use?
>> > >> >
>> > >> > Assuming the AWS connector (Redshift) depends on the JDBC connector and wants a higher version of it, I understand that the correct approach is to promote a release of the JDBC connector; it looks like we have no other options.
>> > >> >
>> > >> > Splitting out a separate redshift repository does not solve this coupling problem; from a user perspective, redshift should also be in the AWS connector repo.
>> > >> >
>> > >> > Best,
>> > >> > Leonard