Hi Vinoth,

Sorry I missed these; I have been busy with on-call issues for the last couple of weeks.

I will create a ticket to track this, and I will be actively working on it.

On Wed, 16 Oct 2019, 07:01 Vinoth Chandar, <[email protected]> wrote:

> Just wanted to bump this thread and see if anyone is actively working on
> Kinesis support.
>
> On Mon, Sep 23, 2019 at 11:51 AM Vinoth Chandar <[email protected]> wrote:
>
> > I think we are on the same page. Thanks for clarifying!
> > Note on implementation: it would be great if we could reuse the Spark
> > Streaming connector that already exists:
> > https://spark.apache.org/docs/2.4.0/streaming-kinesis-integration.html
> >
> > (just like the DFS, Kafka and JDBC connector plans; that way we get a lot
> > for free).
> >
> > On Mon, Sep 23, 2019 at 11:13 AM Vinay Patil <[email protected]> wrote:
> >
> >> Hi Vinoth,
> >>
> >> I have provided the answers to your questions.
> >>
> >> > *Should we just integrate with Kinesis? If DynamoDB will pump its
> >> > changes into Kinesis anyway, why should we be aware of DynamoDB
> >> > directly?*
> >>
> >> - Yes, we should first integrate with Kinesis. As I mentioned, once a
> >> stream is enabled on a DynamoDB table, the CDC data can be accessed from
> >> its shards in real time. So adding support for DynamoDB Streams will be a
> >> subtask of the Kinesis work.
> >>
> >> > If DynamoDB will pump its changes into Kinesis anyway, why should we
> >> > be aware of DynamoDB directly?
> >> - Yes, we don't need to talk to the DynamoDB table directly, only to the
> >> stream enabled on it [1].
> >>
> >> > Do Kinesis streams already have schemas mapped from DynamoDB, or
> >> > should we be implementing a DynamoDBSchemaProvider as well?
> >>
> >> - IMO, we don't need to be aware of the schema here. We will only be
> >> getting the CDC data in this stream [1], and the schema can be different
> >> for each record (e.g. a column being added or removed).
> >>
> >> [1] https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
> >>
> >> Regards,
> >> Vinay Patil
> >>
> >>
> >>
> >> On Sun, Sep 22, 2019 at 6:28 PM Vinoth Chandar <[email protected]> wrote:
> >>
> >>> +1. For now, IMO, we can keep this in hudi-utilities itself.
> >>>
> >>> As for the connector (or DeltaStreamer Source, to be specific), should
> >>> we just integrate with Kinesis? If DynamoDB will pump its changes into
> >>> Kinesis anyway, why should we be aware of DynamoDB directly?
> >>> Also, we may need to rethink how we are going to maintain the schema. Do
> >>> Kinesis streams already have schemas mapped from DynamoDB, or should we
> >>> be implementing a DynamoDBSchemaProvider as well?
> >>>
> >>> This would be a really great addition, but I can also see how
> >>> challenging it can be (which is fun :))
> >>>
> >>> On Sun, Sep 22, 2019 at 4:09 AM Taher Koitawala <[email protected]> wrote:
> >>>
> >>> > I think this will also be a good opportunity to plan the abstractions
> >>> > better, which we will need for the Flink and Beam engines we might use.
> >>> >
> >>> > Regards,
> >>> > Taher Koitawala
> >>> >
> >>> > On Sun, Sep 22, 2019, 3:37 PM leesf <[email protected]> wrote:
> >>> >
> >>> > > +1.
> >>> > > Happy to see DeltaStreamer become more and more powerful. Also, we
> >>> > > need to pay some attention to the layout and organization of these
> >>> > > connectors as more and more data sources are introduced to Hudi, as
> >>> > > vino yang suggested.
> >>> > >
> >>> > > Best,
> >>> > > Leesf
> >>> > >
> >>> > > On Sun, Sep 22, 2019 at 12:18 PM Bhavani Sudha Saktheeswaran <[email protected]> wrote:
> >>> > >
> >>> > > > +1 to adding more connectors to DeltaStreamer and making them
> >>> > > > pluggable modules as much as possible, as Vino Yang suggested.
> >>> > > >
> >>> > > >
> >>> > > > On Sat, Sep 21, 2019 at 7:12 PM vino yang <[email protected]> wrote:
> >>> > > >
> >>> > > > > +1 to introducing these connectors. It's nice to see that Hudi's
> >>> > > > > ecosystem is growing. As Hudi connects to more and more systems,
> >>> > > > > it will be necessary to introduce separate modules to hold these
> >>> > > > > connectors. This may lead to a module re-layout or code
> >>> > > > > refactoring. Of course, all this needs to be discussed in more
> >>> > > > > depth.
> >>> > > > >
> >>> > > > > Best,
> >>> > > > > Vino
> >>> > > > >
> >>> > > > > On 09/21/2019 18:59, Vinay Patil wrote:
> >>> > > > > > Hi Taher,
> >>> > > > > >
> >>> > > > > > Basically, this can be a proposal to support Kinesis, and
> >>> > > > > > DynamoDB Streams support can be enabled by reusing this source
> >>> > > > > > code. Flink has provided support for DynamoDB Streams by reusing
> >>> > > > > > the Kinesis Streams classes.
> >>> > > > > >
> >>> > > > > > Regards,
> >>> > > > > > Vinay Patil
> >>> > > > > >
> >>> > > > > > On Sat, Sep 21, 2019 at 4:26 PM Taher Koitawala <[email protected]> wrote:
> >>> > > > > > > That would be a great addition, Vinay. How about adding
> >>> > > > > > > Kinesis as well?
> >>> > > > > > >
> >>> > > > > > > Regards,
> >>> > > > > > > Taher Koitawala
> >>> > > > > > >
> >>> > > > > > > On Sat, Sep 21, 2019, 4:20 PM Vinay Patil <[email protected]> wrote:
> >>> > > > > > > > Hi Team,
> >>> > > > > > > >
> >>> > > > > > > > DynamoDB Streams contain the CDC data when enabled on a
> >>> > > > > > > > DynamoDB table. We can add a source for DeltaStreamer that
> >>> > > > > > > > will enable us to read this data and write it either to a
> >>> > > > > > > > Hudi dataset or to another sink.
> >>> > > > > > > >
> >>> > > > > > > > Thoughts on adding this support in Hudi?
> >>> > > > > > > >
> >>> > > > > > > > Regards,
> >>> > > > > > > > Vinay Patil
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
>
