Great!

On Wed, Oct 23, 2019 at 10:41 PM Vinay Patil <[email protected]>
wrote:

> Thanks a lot Vinoth for opening this jira.
>
> Will start with the initial design and share the document.
>
> Regards,
> Vinay Patil
>
>
> On Mon, Oct 21, 2019 at 9:36 PM Balaji Varadarajan
> <[email protected]> wrote:
>
> >  +1. This is a much needed and super useful feature for a lot of folks in
> > the community.
> >
> > Balaji.V
> >
> > On Monday, October 21, 2019, 7:08:30 AM PDT, Vinoth Chandar
> > <[email protected]> wrote:
> >
> >  https://issues.apache.org/jira/browse/HUDI-310 tracks this. Would love to
> > get as much of this as possible into the next release :)
> >
> > On Thu, Oct 17, 2019 at 10:16 PM Vinoth Chandar <[email protected]>
> wrote:
> >
> > > No problem. Having kinesis will get us a compelling story for cloud
> data
> > > ingestion
> > >
> > > On Thu, Oct 17, 2019 at 8:38 PM Vinay Patil <[email protected]>
> > > wrote:
> > >
> > >> Hi Vinoth,
> > >>
> > >> Sorry I missed these; I have been busy with on-call issues for the last couple of
> > weeks.
> > >>
> > >> Will create a ticket for tracking this; I will be actively working on
> > >> this.
> > >>
> > >> On Wed, 16 Oct 2019, 07:01 Vinoth Chandar, <[email protected]> wrote:
> > >>
> > >> > Just wanted to bump this thread and see if anyone is actively
> working
> > on
> > >> > kinesis support
> > >> >
> > >> > On Mon, Sep 23, 2019 at 11:51 AM Vinoth Chandar <[email protected]>
> > >> wrote:
> > >> >
> > >> > > I think we are on the same page. Thanks for clarifying!
> > >> > > Note on implementation: it would be great if we can reuse the
> spark
> > >> > > streaming connector already present
> > >> > >
> > >> > > https://spark.apache.org/docs/2.4.0/streaming-kinesis-integration.html
> > >> > >
> > >> > > (just like dfs, kafka and jdbc connector plans, that way we get a
> > lot
> > >> for
> > >> > > free) ..
> > >> > >
> > >> > > On Mon, Sep 23, 2019 at 11:13 AM Vinay Patil <
> > [email protected]
> > >> >
> > >> > > wrote:
> > >> > >
> > >> > >> Hi Vinoth,
> > >> > >>
> > >> > >> I have provided the answers to your questions.
> > >> > >>
> > >> > >> > *should we just integrate to Kinesis? If DynamoDB will pump its
> > >> > >> changes into Kinesis*
> > >> > >> *anyway, why should we be aware of DynamoDB directly?*
> > >> > >>
> > >> > >> - Yes, we should first integrate with Kinesis. As I mentioned
> once
> > >> the
> > >> > >> stream is enabled on a DynamoDB table, the CDC data can be
> accessed
> > >> from
> > >> > the
> > >> > >> shards in real time. So adding support for DynamoDb streams will
> > be a
> > >> > >> subtask of Kinesis.
> > >> > >>
> > >> > >> > If DynamoDB will pump its changes into Kinesis anyway, why
> should
> > >> we
> > >> > >> be aware of DynamoDB directly?
> > >> > >> - Yes, we don't need to talk to the DynamoDB table directly, but only with
> > the
> > >> > >> streams enabled on it [1]
> > >> > >>
> > >> > >> > do Kinesis streams have schemas mapped from DynamoDB already
> or
> > >> > >> should we be implementing a DynamoDBSchemaProvider as well?
> > >> > >>
> > >> > >> - IMO, we don't need to be aware of the schema here; we will
> be
> > >> > >> getting only the CDC data in this stream[1] and the schema can be
> > >> > different
> > >> > >> for each record (for example, when a column is added or removed)
> > >> > >>
> > >> > >> 1. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
> > >> > >>
> > >> > >> Regards,
> > >> > >> Vinay Patil
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >> On Sun, Sep 22, 2019 at 6:28 PM Vinoth Chandar <
> [email protected]>
> > >> > wrote:
> > >> > >>
> > >> > >>> +1 For now we can keep this in hudi-utilities itself IMO.
> > >> > >>>
> > >> > >>> As for the connector or Deltastreamer Source to be specific,
> > should
> > >> we
> > >> > >>> just
> > >> > >>> integrate to Kinesis? If DynamoDB will pump its changes into
> > Kinesis
> > >> > >>> anyway, why should we be aware of DynamoDB directly?
> > >> > >>> Also we may need to rethink how we are going to maintain the
> > schema?
> > >> > Do
> > >> > >>> Kinesis streams have schemas mapped from DynamoDB already or
> > should
> > >> we
> > >> > be
> > >> > >>> implementing a DynamoDBSchemaProvider as well?
> > >> > >>>
> > >> > >>> This would be a really great addition. But also can see how
> > >> challenging
> > >> > >>> it
> > >> > >>> can be (which is fun :))
> > >> > >>>
> > >> > >>> On Sun, Sep 22, 2019 at 4:09 AM Taher Koitawala <
> > [email protected]
> > >> >
> > >> > >>> wrote:
> > >> > >>>
> > >> > >>> > I think this will be a good opportunity to plan better in
> terms
> > of
> > >> > >>> > abstraction too which is needed for the Flink and Beam engines
> > we
> > >> > might
> > >> > >>> > use.
> > >> > >>> >
> > >> > >>> > Regards,
> > >> > >>> > Taher Koitawala
> > >> > >>> >
> > >> > >>> > On Sun, Sep 22, 2019, 3:37 PM leesf <[email protected]>
> > wrote:
> > >> > >>> >
> > >> > >>> > > +1.
> > >> > >>> > > Happy to see DeltaStreamer become more and more powerful.
> > >> Also, we
> > >> > >>> need
> > >> > >>> > to
> > >> > >>> > > pay some attention to the layout and organization of these
> > >> > >>> connectors as
> > >> > >>> > > more and more data sources are introduced to HUDI, as vinoyang
> > >> > >>> suggested.
> > >> > >>> > >
> > >> > >>> > > Best,
> > >> > >>> > > Leesf
> > >> > >>> > >
> > >> > >>> > > On Sun, Sep 22, 2019 at 12:18 PM Bhavani Sudha Saktheeswaran
> > >> > >>> > > <[email protected]> wrote:
> > >> > >>> > >
> > >> > >>> > > > +1 to adding more connectors to DeltaStreamer and making them
> > >> > >>> > > > pluggable modules as far as possible, like Vino Yang suggested.
> > >> > >>> > > >
> > >> > >>> > > >
> > >> > >>> > > > On Sat, Sep 21, 2019 at 7:12 PM vino yang <
> > >> [email protected]
> > >> > >
> > >> > >>> > wrote:
> > >> > >>> > > >
> > >> > >>> > > > > +1 to introducing these connectors. It's nice to see that Hudi's
> > >> > >>> > > > > ecosystem is growing. As Hudi connects to more and more systems,
> > >> > >>> > > > > it is necessary to introduce separate modules to place these
> > >> > >>> > > > > connectors. This can lead to module relayout or code refactoring.
> > >> > >>> > > > > Of course, all this needs to be discussed in more depth.
> > >> > >>> > > > >
> > >> > >>> > > > > Best,
> > >> > >>> > > > > Vino
> > >> > >>> > > > >
> > >> > >>> > > > > On 09/21/2019 18:59, Vinay Patil wrote:
> > >> > >>> > > > >
> > >> > >>> > > > > > Hi Taher,
> > >> > >>> > > > > >
> > >> > >>> > > > > > Basically, this can be a proposal to support Kinesis, and
> > >> > >>> > > > > > DynamoDB stream support can be enabled by reusing this source
> > >> > >>> > > > > > code. Flink has provided support for DynamoDB Streams by
> > >> > >>> > > > > > reusing Kinesis Streams classes.
> > >> > >>> > > > > >
> > >> > >>> > > > > > Regards,
> > >> > >>> > > > > > Vinay Patil
> > >> > >>> > > > > >
> > >> > >>> > > > > > On Sat, Sep 21, 2019 at 4:26 PM Taher Koitawala
> > >> > >>> > > > > > <[email protected]> wrote:
> > >> > >>> > > > > >
> > >> > >>> > > > > > > That would be a great addition Vinay. How about adding
> > >> > >>> > > > > > > Kinesis as well?
> > >> > >>> > > > > > >
> > >> > >>> > > > > > > Regards,
> > >> > >>> > > > > > > Taher Koitawala
> > >> > >>> > > > > > >
> > >> > >>> > > > > > > On Sat, Sep 21, 2019, 4:20 PM Vinay Patil
> > >> > >>> > > > > > > <[email protected]> wrote:
> > >> > >>> > > > > > >
> > >> > >>> > > > > > > > Hi Team,
> > >> > >>> > > > > > > >
> > >> > >>> > > > > > > > DynamoDB Streams contain the CDC data when enabled on a
> > >> > >>> > > > > > > > DynamoDB table. We can add a source for DeltaStreamer
> > >> > >>> > > > > > > > which will enable us to read this data and write it back
> > >> > >>> > > > > > > > either to a Hudi dataset or to another sink.
> > >> > >>> > > > > > > >
> > >> > >>> > > > > > > > Thoughts on adding this support in Hudi?
> > >> > >>> > > > > > > >
> > >> > >>> > > > > > > > Regards,
> > >> > >>> > > > > > > > Vinay Patil
> > >> > >>> > > >
> > >> > >>> > >
> > >> > >>> >
> > >> > >>>
> > >> > >>
> > >> >
> > >>
> > >
>
