Re: [DISCUSS] FLIP-307: Flink connector Redshift

Samrat Deb Tue, 06 Jun 2023 11:16:31 -0700

Hi Jing,

> I would suggest adding that information into the FLIP.


I have updated the FLIP now; please review the new version whenever you have time.

> +1 Looking forward to your PR :-)
I will request your review once I'm ready with the PR :-)

Bests,
Samrat

On Tue, Jun 6, 2023 at 11:43 PM Samrat Deb <decordea...@gmail.com> wrote:

> Hi Martijn,
>
> > If I understand this correctly, the Redshift sink would not be able to
> > support exactly-once, is that correct?
>
> Looking more closely at Redshift's capabilities, I found that it supports
> merge operations using staging tables [1], and it also offers a MERGE
> command, with examples in [2].
> This opens up the possibility of implementing exactly-once semantics in
> the connector.
> However, I believe it would be prudent to start with a more focused scope
> for the initial phase of implementation and defer exactly-once support to
> subsequent iterations.
>
> Before finalizing the approach, I would greatly appreciate your thoughts
> and suggestions on this matter.
> Should we prioritize the initial implementation without exactly-once
> support, or would you advise incorporating it right from the start?
> Your insights and experiences would be immensely valuable in making this
> decision.
>
>
> [1] https://docs.aws.amazon.com/redshift/latest/dg/t_updating-inserting-using-staging-tables-.html
> [2] https://docs.aws.amazon.com/redshift/latest/dg/merge-examples.html
>
> Bests,
> Samrat
>
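The staging-table merge referenced in [1] could look roughly like the following Java sketch. It assumes that each batch (for example, the records buffered between two Flink checkpoints) has already been loaded into a staging table with the same columns, in the same order, as the target, and that the key columns identify each row uniquely within a batch. `StagingMerge`, `replaceExistingRows` and `runInTransaction` are hypothetical names for illustration, not part of Flink or of the proposed connector. The statements follow the "replace existing rows" pattern: delete the target rows whose keys appear in the staging table, then insert the staged batch, all inside one transaction.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for a staging-table merge ("replace existing rows") in Redshift. */
final class StagingMerge {

    private StagingMerge() {}

    /**
     * Builds the statements for one merge, in execution order. Table and column names
     * are assumed to be trusted identifiers; they are concatenated, not escaped.
     */
    static List<String> replaceExistingRows(String target, String staging, List<String> keyColumns) {
        if (keyColumns.isEmpty()) {
            throw new IllegalArgumentException("at least one key column is required");
        }
        // e.g. "sales.id = sales_stage.id AND sales.ts = sales_stage.ts"
        String joinCondition = keyColumns.stream()
                .map(k -> target + "." + k + " = " + staging + "." + k)
                .collect(Collectors.joining(" AND "));

        List<String> statements = new ArrayList<>();
        // 1. Remove target rows that the staged batch replaces.
        statements.add("DELETE FROM " + target + " USING " + staging + " WHERE " + joinCondition);
        // 2. Insert the whole batch: updated rows and new rows alike.
        statements.add("INSERT INTO " + target + " SELECT * FROM " + staging);
        // 3. Empty the staging table. DELETE (not TRUNCATE) keeps this inside the transaction.
        statements.add("DELETE FROM " + staging);
        return statements;
    }

    /** Runs the statements as one transaction on an open Redshift JDBC connection. */
    static void runInTransaction(Connection conn, List<String> statements) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (Statement st = conn.createStatement()) {
            for (String sql : statements) {
                st.execute(sql);
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // target table is left unchanged if any step fails
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Re-running the same staged batch deletes and re-inserts the same keys, so a replay after a failure leaves the target in the same state; that idempotence is what an exactly-once sink would build on. The staging table is emptied with `DELETE` rather than `TRUNCATE` because `TRUNCATE` in Redshift commits the transaction it runs in.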
> On Mon, Jun 5, 2023 at 7:09 PM Jing Ge <j...@ververica.com.invalid> wrote:
>
>> Hi Samrat,
>>
>> Thanks for the feedback. I would suggest adding that information into the
>> FLIP.
>>
>> +1 Looking forward to your PR :-)
>>
>> Best regards,
>> Jing
>>
>> On Sat, Jun 3, 2023 at 9:19 PM Samrat Deb <decordea...@gmail.com> wrote:
>>
>> > Hi Jing Ge,
>> >
>> > >>> Do you already have any prototype? I'd like to join the reviews.
>> > The prototype is in progress. I will raise a dedicated PR for review soon
>> > and notify this thread as well.
>> >
>> > >>> Will the Redshift connector provide additional features
>> > >>> beyond the mediator/wrapper of the jdbc connector?
>> >
>> > Here are the additional features that the Flink connector for AWS Redshift
>> > can provide on top of plain JDBC:
>> >
>> > 1. Integration with AWS Redshift Workload Management (WLM): AWS Redshift
>> > allows you to configure WLM [1] to manage query prioritization and
>> > resource allocation. The Flink connector for Redshift will not depend on
>> > any particular WLM configuration; it will work with whatever is configured
>> > and rely on it to scale the sink in and out. This means the connector can
>> > leverage Redshift's WLM capabilities to optimize query execution and
>> > allocate resources efficiently based on your defined workload priorities.
>> >
>> > 2. Abstraction of AWS Redshift Quotas and Limits: AWS Redshift imposes
>> > certain quotas and limits [2] on aspects such as the number of clusters,
>> > concurrent connections, queries per second, etc. The Flink connector for
>> > Redshift will provide an abstraction layer so that users can work with
>> > Redshift without tracking these specific limits themselves. The connector
>> > will keep its connections and queries within the defined quotas and
>> > limits, hiding that complexity and ensuring compliance with Redshift's
>> > restrictions.
>> >
>> > These features aim to simplify the integration of Flink with AWS Redshift,
>> > providing optimized resource utilization and transparent handling of
>> > Redshift-specific limitations.
>> >
>> > Bests,
>> > Samrat
>> >
>> > [1] https://docs.aws.amazon.com/redshift/latest/dg/cm-c-implementing-workload-management.html
>> > [2] https://docs.aws.amazon.com/redshift/latest/mgmt/amazon-redshift-limits.html
>> >
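One way a connector could keep within Redshift's concurrent-connection and query limits [2] is to gate every JDBC round trip behind a fixed number of permits. The Java sketch below is an assumption about how such an abstraction might work, not the FLIP's design: `ConnectionBudget` is a hypothetical class, and `maxConcurrent` would have to be derived from the documented limits for the user's cluster, for example leaving headroom for other clients.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical cap on how many Redshift connections/queries one sink runs at once. */
final class ConnectionBudget {
    private final Semaphore permits;
    private final long acquireTimeoutMillis;

    ConnectionBudget(int maxConcurrent, long acquireTimeoutMillis) {
        if (maxConcurrent <= 0) {
            throw new IllegalArgumentException("maxConcurrent must be positive");
        }
        // Fair semaphore: waiting writers get slots in arrival order.
        this.permits = new Semaphore(maxConcurrent, true);
        this.acquireTimeoutMillis = acquireTimeoutMillis;
    }

    /** Runs the task once a slot is free, waiting at most acquireTimeoutMillis for it. */
    <T> T run(Supplier<T> task) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(acquireTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
            throw new IllegalStateException("interrupted while waiting for a Redshift slot", e);
        }
        if (!acquired) {
            throw new IllegalStateException(
                    "no Redshift slot free within " + acquireTimeoutMillis + " ms");
        }
        try {
            return task.get();
        } finally {
            permits.release(); // always hand the slot back, even if the task throws
        }
    }

    /** Number of slots currently free. */
    int available() {
        return permits.availablePermits();
    }
}
```

A sink writer would wrap each batch write, e.g. `budget.run(() -> writeBatch(records))`, where `writeBatch` is likewise hypothetical. Failing with an exception after a timeout, rather than blocking indefinitely, lets Flink's normal failure and restart handling take over when Redshift stays saturated.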
>> > On Sat, Jun 3, 2023 at 11:40 PM Samrat Deb <decordea...@gmail.com> wrote:
>> >
>> > > Hi Ahmed,
>> > >
>> > > >>> please let me know If you need any collaboration regarding integration
>> > > >>> with AWS connectors credential providers or regarding FLIP-171 I would be
>> > > >>> more than happy to assist.
>> > >
>> > > Sure, I will reach out in case I need any help.
>> > >
>> > >
>> > >
>> > > On Fri, Jun 2, 2023 at 6:12 PM Jing Ge <j...@ververica.com.invalid> wrote:
>> > >
>> > >> Hi Samrat,
>> > >>
>> > >> Excited to see your proposal. Supporting data warehouses is one of the
>> > >> major tracks for Flink. Thanks for driving it! Happy to see that we
>> > >> reached consensus to prioritize the Sink over Source in the previous
>> > >> discussion. Do you already have any prototype? I'd like to join the
>> > >> reviews.
>> > >>
>> > >> Just out of curiosity, speaking of JDBC mode, according to the FLIP, it
>> > >> should be doable to directly use the jdbc connector with Redshift, if I
>> > >> am not mistaken. Will the Redshift connector provide additional features
>> > >> beyond the mediator/wrapper of the jdbc connector?
>> > >>
>> > >> Best regards,
>> > >> Jing
>> > >>
>> > >> On Thu, Jun 1, 2023 at 8:22 PM Ahmed Hamdy <hamdy10...@gmail.com> wrote:
>> > >>
>> > >> > Hi Samrat
>> > >> >
>> > >> > Thanks for putting up this FLIP. I agree regarding the importance of the
>> > >> > use case.
>> > >> > Please let me know if you need any collaboration regarding integration
>> > >> > with AWS connectors credential providers or regarding FLIP-171; I would
>> > >> > be more than happy to assist.
>> > >> > I also like Leonard's proposal for starting with DataStreamSink and
>> > >> > TableSink. It would be great to have some milestones delivered as soon
>> > >> > as they are ready.
>> > >> > best regards
>> > >> > Ahmed Hamdy
>> > >> >
>> > >> >
>> > >> > On Wed, 31 May 2023 at 11:15, Samrat Deb <decordea...@gmail.com> wrote:
>> > >> >
>> > >> > > Hi Liu Ron,
>> > >> > >
>> > >> > > > 1. Regarding the  `read.mode` and `write.mode`, you say here provides
>> > >> > > > two modes, respectively, jdbc and `unload or copy`, What is the
>> > >> > > > default value for `read.mode` and `write.mode?
>> > >> > >
>> > >> > > I have made the configuration options `read.mode` and `write.mode`
>> > >> > > mandatory for "flink-connector-redshift", as described in the FLIP [1].
>> > >> > > The rationale is to empower users who know their Redshift setup and
>> > >> > > have specific expectations for the sink: with no defaults, users keep
>> > >> > > full control and flexibility in configuring the connector to meet their
>> > >> > > requirements.
>> > >> > >
>> > >> > > However, I am open to feedback on whether it would be beneficial to
>> > >> > > make these options non-mandatory and give them default values. If you
>> > >> > > see advantages in having default values, or have any other
>> > >> > > suggestions, please share your thoughts. Your feedback is highly
>> > >> > > appreciated.
>> > >> > >
>> > >> > > >  2. For Source, does it both support batch read and streaming read?
>> > >> > >
>> > >> > > Redshift currently does not provide native support for streaming reads,
>> > >> > > although it does support streaming ingestion [2]. As part of the plan, I
>> > >> > > intend to conduct a proof of concept and benchmarking to explore the
>> > >> > > possibility of implementing streaming reads using the Flink JDBC
>> > >> > > connector, since Redshift is JDBC compatible.
>> > >> > > However, when read support is first implemented, the focus will
>> > >> > > primarily be on batch reads rather than streaming reads. This will
>> > >> > > allow us to deliver a robust and reliable solution for batch processing
>> > >> > > in phase 2 of the implementation.
>> > >> > >
>> > >> > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
>> > >> > > [2] https://docs.aws.amazon.com/redshift/latest/dg/materialized-view-streaming-ingestion.html
>> > >> > >
>> > >> > > Bests,
>> > >> > > Samrat
>> > >> > >
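To make the "mandatory, no default" choice for `read.mode` and `write.mode` concrete, here is a minimal Java sketch of validating the two options. It assumes the values named in this thread (`jdbc` or `unload` for `read.mode`, `jdbc` or `copy` for `write.mode`) and a plain `Map<String, String>` of table options. `RedshiftModeOptions` and its members are hypothetical; a real Flink connector would more likely declare these as `ConfigOption`s listed in its table factory's `requiredOptions()`.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical validation of the mandatory read/write mode options. */
final class RedshiftModeOptions {
    enum ReadMode { JDBC, UNLOAD }
    enum WriteMode { JDBC, COPY }

    static final String READ_MODE = "read.mode";
    static final String WRITE_MODE = "write.mode";

    final ReadMode readMode;
    final WriteMode writeMode;

    private RedshiftModeOptions(ReadMode readMode, WriteMode writeMode) {
        this.readMode = readMode;
        this.writeMode = writeMode;
    }

    /** Parses both options; fails fast instead of silently applying a default. */
    static RedshiftModeOptions fromOptions(Map<String, String> options) {
        return new RedshiftModeOptions(
                parse(options, READ_MODE, ReadMode.class),
                parse(options, WRITE_MODE, WriteMode.class));
    }

    private static <E extends Enum<E>> E parse(Map<String, String> options, String key, Class<E> type) {
        String raw = options.get(key);
        // Allowed values, lower-cased for the error message, e.g. "jdbc, unload".
        String allowed = Arrays.stream(type.getEnumConstants())
                .map(c -> c.name().toLowerCase(Locale.ROOT))
                .collect(Collectors.joining(", "));
        if (raw == null || raw.isBlank()) {
            throw new IllegalArgumentException(
                    "Missing required option '" + key + "' (one of: " + allowed + ")");
        }
        try {
            return Enum.valueOf(type, raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                    "Invalid value '" + raw + "' for '" + key + "' (one of: " + allowed + ")", e);
        }
    }
}
```

If defaults were adopted instead, as Ron's question suggests, only `parse` would change: a missing value would map to the chosen default rather than raising an error.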
>> > >> > > On Wed, May 31, 2023 at 8:03 AM liu ron <ron9....@gmail.com> wrote:
>> > >> > >
>> > >> > > > Hi, Samrat
>> > >> > > >
>> > >> > > > Thanks for driving this FLIP. It looks like supporting
>> > >> > > > flink-connector-redshift is very useful to Flink. I have two questions:
>> > >> > > > 1. Regarding `read.mode` and `write.mode`: you say they provide two
>> > >> > > > modes, jdbc and `unload` or `copy` respectively. What is the default
>> > >> > > > value for `read.mode` and `write.mode`?
>> > >> > > > 2. For the Source, does it support both batch reads and streaming reads?
>> > >> > > >
>> > >> > > >
>> > >> > > > Best,
>> > >> > > > Ron
>> > >> > > >
>> > >> > > > On Tue, May 30, 2023 at 17:15, Samrat Deb <decordea...@gmail.com> wrote:
>> > >> > > >
>> > >> > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
>> > >> > > > >
>> > >> > > > > [Note] I missed the trailing link in my previous mail.
>> > >> > > > >
>> > >> > > > >
>> > >> > > > >
>> > >> > > > > On Tue, May 30, 2023 at 2:43 PM Samrat Deb <decordea...@gmail.com> wrote:
>> > >> > > > >
>> > >> > > > > > Hi Leonard,
>> > >> > > > > >
>> > >> > > > > > > and I’m glad to help review the design as well as the code review.
>> > >> > > > > > Thank you so much. It would be really great and helpful for bringing
>> > >> > > > > > flink-connector-redshift to Flink users :) .
>> > >> > > > > >
>> > >> > > > > > I have divided the implementation into 3 phases in the `Scope`
>> > >> > > > > > section [1]. The 1st phase is to
>> > >> > > > > >
>> > >> > > > > >    - Integrate with the Flink Sink API (*FLIP-171*
>> > >> > > > > >    <https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink>)
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > > > About the implementation phases, How about prioritizing support for the
>> > >> > > > > > > Datastream Sink API and TableSink API in the first phase?
>> > >> > > > > > I completely agree with prioritizing support for the Datastream
>> > >> > > > > > Sink API and TableSink API in the first phase.
>> > >> > > > > > I will update the FLIP [1] as you have suggested.
>> > >> > > > > >
>> > >> > > > > > > It seems that the primary use cases for the Redshift connector are
>> > >> > > > > > > acting as a sink for processed data by Flink.
>> > >> > > > > > Yes, most requests and requirements for a Redshift connector are for
>> > >> > > > > > a sink for data processed by Flink.
>> > >> > > > > >
>> > >> > > > > > Bests,
>> > >> > > > > > Samrat
>> > >> > > > > >
>> > >> > > > > > On Tue, May 30, 2023 at 12:35 PM Leonard Xu <xbjt...@gmail.com> wrote:
>> > >> > > > > >
>> > >> > > > > >> Thanks @Samrat for bringing up this discussion.
>> > >> > > > > >>
>> > >> > > > > >> It makes sense to me to introduce an AWS Redshift connector for Apache
>> > >> > > > > >> Flink, and I’m glad to help review the design as well as the code review.
>> > >> > > > > >>
>> > >> > > > > >> About the implementation phases, How about prioritizing support for the
>> > >> > > > > >> Datastream Sink API and TableSink API in the first phase? It seems that
>> > >> > > > > >> the primary use cases for the Redshift connector are acting as a sink
>> > >> > > > > >> for processed data by Flink.
>> > >> > > > > >>
>> > >> > > > > >> Best,
>> > >> > > > > >> Leonard
>> > >> > > > > >>
>> > >> > > > > >>
>> > >> > > > > >> > On May 29, 2023, at 12:51 PM, Samrat Deb <decordea...@gmail.com> wrote:
>> > >> > > > > >> >
>> > >> > > > > >> > Hello all,
>> > >> > > > > >> >
>> > >> > > > > >> > Context:
>> > >> > > > > >> > Amazon Redshift [1] is a fully managed, petabyte-scale data warehouse
>> > >> > > > > >> > service in the cloud. It allows analyzing data without all of the
>> > >> > > > > >> > configurations of a provisioned data warehouse. Resources are
>> > >> > > > > >> > automatically provisioned and data warehouse capacity is intelligently
>> > >> > > > > >> > scaled to deliver fast performance for even the most demanding and
>> > >> > > > > >> > unpredictable workloads.
>> > >> > > > > >> > Redshift is one of the most widely used data warehouse solutions in the
>> > >> > > > > >> > current market.
>> > >> > > > > >> >
>> > >> > > > > >> > Building a Redshift connector for Flink will allow Flink users to use
>> > >> > > > > >> > Redshift directly as a source and a sink. It will help Flink expand its
>> > >> > > > > >> > scope by adding Redshift as a new connector in the ecosystem.
>> > >> > > > > >> >
>> > >> > > > > >> > I would like to start a discussion on FLIP-307: Flink connector
>> > >> > > > > >> > Redshift [2].
>> > >> > > > > >> > Looking forward to comments, feedback and suggestions from the
>> > >> > > > > >> > community on the proposal.
>> > >> > > > > >> >
>> > >> > > > > >> > [1] https://docs.aws.amazon.com/redshift/latest/mgmt/welcome.html
>> > >> > > > > >> > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-307%3A++Flink+Connector+Redshift
>> > >> > > > > >> >
>> > >> > > > > >> >
>> > >> > > > > >> >
>> > >> > > > > >> > Bests,
>> > >> > > > > >> > Samrat
>> > >> > > > > >>
>> > >> > > > > >>
>> > >> > > > >
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> > >
>> >
>>
>
