+1. This is a much-needed and super useful feature for a lot of folks in the
community.

Balaji.V

On Monday, October 21, 2019, 7:08:30 AM PDT, Vinoth Chandar
<[email protected]> wrote:

https://issues.apache.org/jira/browse/HUDI-310 tracks this. Would love to get
this into the next release if at all possible :)

On Thu, Oct 17, 2019 at 10:16 PM Vinoth Chandar <[email protected]> wrote:

> No problem. Having Kinesis support will give us a compelling story for cloud
> data ingestion.
>
> On Thu, Oct 17, 2019 at 8:38 PM Vinay Patil <[email protected]>
> wrote:
>
>> Hi Vinoth,
>>
>> Sorry I missed these; I have been busy with on-call issues for the last
>> couple of weeks.
>>
>> I will create a ticket to track this, and I will be actively working on it.
>>
>> On Wed, 16 Oct 2019, 07:01 Vinoth Chandar, <[email protected]> wrote:
>>
>> > Just wanted to bump this thread and see if anyone is actively working on
>> > Kinesis support.
>> >
>> > On Mon, Sep 23, 2019 at 11:51 AM Vinoth Chandar <[email protected]>
>> > wrote:
>> >
>> > > I think we are on the same page. Thanks for clarifying!
>> > > A note on implementation: it would be great if we can reuse the Spark
>> > > Streaming connector that is already present:
>> > > https://spark.apache.org/docs/2.4.0/streaming-kinesis-integration.html
>> > >
>> > > (just like the DFS, Kafka and JDBC connector plans; that way we get a
>> > > lot for free).
>> > >
>> > > On Mon, Sep 23, 2019 at 11:13 AM Vinay Patil <[email protected]>
>> > > wrote:
>> > >
>> > >> Hi Vinoth,
>> > >>
>> > >> I have provided the answers to your questions.
>> > >>
>> > >> > *Should we just integrate with Kinesis? If DynamoDB will pump its
>> > >> > changes into Kinesis anyway, why should we be aware of DynamoDB
>> > >> > directly?*
>> > >>
>> > >> - Yes, we should first integrate with Kinesis. As I mentioned, once a
>> > >> stream is enabled on a DynamoDB table, the CDC data can be read from
>> > >> its shards in real time. So adding support for DynamoDB Streams can be
>> > >> a subtask of the Kinesis work.
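A minimal sketch of the "DynamoDB Streams as a subtask of Kinesis" idea (similar in spirit to how Flink reuses its Kinesis classes for DynamoDB Streams): one shard-reading base class tracks a per-shard sequence-number checkpoint, and the DynamoDB Streams variant overrides only record decoding. Everything here is hypothetical and not Hudi's or Flink's actual API: the class names, the in-memory `shards` dict standing in for a real stream client, and the assumption that sequence numbers are numeric strings that increase within a shard.

```python
import json
from abc import ABC, abstractmethod


class ShardStreamSource(ABC):
    """Hypothetical base source: reads records shard by shard and keeps a
    checkpoint mapping shard id -> last sequence number consumed."""

    def __init__(self, shards):
        # shards: {shard_id: [(sequence_number, payload_json), ...]},
        # an in-memory stand-in for what a real stream client would return
        self.shards = shards

    def read(self, checkpoint):
        """Return (rows, new_checkpoint) for records after `checkpoint`."""
        rows, new_checkpoint = [], dict(checkpoint)
        for shard_id, records in self.shards.items():
            last = checkpoint.get(shard_id)
            for seq, payload in records:
                # Assumed: sequence numbers are numeric strings increasing
                # within a shard, so comparing them as integers is safe.
                if last is not None and int(seq) <= int(last):
                    continue  # already consumed in an earlier batch
                rows.append(self.decode(payload))
                new_checkpoint[shard_id] = seq
        return rows, new_checkpoint

    @abstractmethod
    def decode(self, payload):
        """Turn one raw record payload into a row."""


class KinesisJsonSource(ShardStreamSource):
    # Plain Kinesis stream whose payloads are JSON documents.
    def decode(self, payload):
        return json.loads(payload)


class DynamoDbStreamsSource(ShardStreamSource):
    # Same shard handling; only decoding differs. Keeps the change type and
    # the item's key attributes from a DynamoDB Streams record.
    def decode(self, payload):
        record = json.loads(payload)
        return record["eventName"], record["dynamodb"]["Keys"]


# Usage: the second read resumes from the checkpoint and finds nothing new.
source = KinesisJsonSource({"shard-0": [("1", '{"a": 1}'), ("2", '{"a": 2}')]})
rows, cp = source.read({})   # rows == [{'a': 1}, {'a': 2}], cp == {'shard-0': '2'}
rows, cp = source.read(cp)   # rows == []
```

Putting the checkpoint logic in the base class is the point of the design: the Kinesis and DynamoDB Streams sources would share resumption behaviour and differ only in how a payload becomes a row.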
>> > >>
>> > >> > If DynamoDB will pump its changes into Kinesis anyway, why should we
>> > >> > be aware of DynamoDB directly?
>> > >> - Yes, we don't need to talk to the DynamoDB table directly, only to
>> > >> the stream enabled on it [1].
>> > >>
>> > >> > Do Kinesis streams have schemas mapped from DynamoDB already, or
>> > >> > should we be implementing a DynamoDBSchemaProvider as well?
>> > >>
>> > >> - IMO, we don't need to be aware of the schema here: we will only be
>> > >> getting the CDC data in this stream [1], and the schema can differ
>> > >> from record to record (a column being added or removed).
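To make the schema point concrete: DynamoDB Streams records carry item images in DynamoDB's typed JSON (each value wrapped as `{"S": ...}`, `{"N": ...}`, `{"M": ...}` and so on), and records from the same table can have different attributes. The sketch below, with hypothetical helper names, unwraps those values and shows that a writer would need the union of the columns seen so far. It assumes a `NEW_IMAGE` or `NEW_AND_OLD_IMAGES` stream view type and falls back to `Keys` for `REMOVE` events, which carry no `NewImage`.

```python
from decimal import Decimal


def from_attribute_value(av):
    """Unwrap one DynamoDB AttributeValue, e.g. {"N": "5"} -> Decimal("5")."""
    [(tag, val)] = av.items()
    if tag in ("S", "B", "BOOL"):
        return val  # B arrives as a base64 string in the JSON form
    if tag == "N":
        return Decimal(val)  # numbers are transmitted as strings
    if tag == "NULL":
        return None
    if tag == "M":
        return {k: from_attribute_value(v) for k, v in val.items()}
    if tag == "L":
        return [from_attribute_value(v) for v in val]
    if tag in ("SS", "BS"):
        return set(val)
    if tag == "NS":
        return {Decimal(v) for v in val}
    raise ValueError(f"unknown AttributeValue type: {tag}")


def flatten_record(record):
    """Hypothetical helper: turn a stream record into (eventName, plain row)."""
    ddb = record["dynamodb"]
    # REMOVE events have no NewImage; fall back to the key attributes.
    image = ddb.get("NewImage") or ddb.get("Keys", {})
    row = {k: from_attribute_value(v) for k, v in image.items()}
    return record["eventName"], row


def union_columns(rows):
    """Columns can differ per record, so collect the union across rows."""
    cols = set()
    for row in rows:
        cols.update(row)
    return sorted(cols)


records = [
    {"eventName": "INSERT", "dynamodb": {"Keys": {"id": {"S": "1"}},
     "NewImage": {"id": {"S": "1"}, "qty": {"N": "5"}}}},
    {"eventName": "MODIFY", "dynamodb": {"Keys": {"id": {"S": "1"}},
     "NewImage": {"id": {"S": "1"}, "qty": {"N": "6"}, "color": {"S": "red"}}}},
    {"eventName": "REMOVE", "dynamodb": {"Keys": {"id": {"S": "1"}}}},
]
print(union_columns(row for _, row in map(flatten_record, records)))
# ['color', 'id', 'qty']
```

Here `color` appears only in the `MODIFY` record, which is the per-record schema drift described above.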
>> > >>
>> > >> 1.
>> > >>
>> >
>> https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
>> > >>
>> > >> Regards,
>> > >> Vinay Patil
>> > >>
>> > >>
>> > >>
>> > >> On Sun, Sep 22, 2019 at 6:28 PM Vinoth Chandar <[email protected]>
>> > >> wrote:
>> > >>
>> > >>> +1. For now we can keep this in hudi-utilities itself, IMO.
>> > >>>
>> > >>> As for the connector, or DeltaStreamer Source to be specific, should we
>> > >>> just integrate with Kinesis? If DynamoDB will pump its changes into
>> > >>> Kinesis anyway, why should we be aware of DynamoDB directly?
>> > >>> We may also need to rethink how we are going to maintain the schema.
>> > >>> Do Kinesis streams have schemas mapped from DynamoDB already, or should
>> > >>> we be implementing a DynamoDBSchemaProvider as well?
>> > >>>
>> > >>> This would be a really great addition, though I can also see how
>> > >>> challenging it can be (which is fun :))
>> > >>>
>> > >>> On Sun, Sep 22, 2019 at 4:09 AM Taher Koitawala <[email protected]>
>> > >>> wrote:
>> > >>>
>> > >>> > I think this will also be a good opportunity to plan the abstractions
>> > >>> > better, which we will need for the Flink and Beam engines we might
>> > >>> > use.
>> > >>> >
>> > >>> > Regards,
>> > >>> > Taher Koitawala
>> > >>> >
>> > >>> > On Sun, Sep 22, 2019, 3:37 PM leesf <[email protected]> wrote:
>> > >>> >
>> > >>> > > +1.
>> > >>> > > Happy to see DeltaStreamer becoming more and more powerful. Also, as
>> > >>> > > vinoyang suggested, we need to pay some attention to the layout and
>> > >>> > > organization of these connectors as more and more data sources are
>> > >>> > > introduced to Hudi.
>> > >>> > >
>> > >>> > > Best,
>> > >>> > > Leesf
>> > >>> > >
>> > >>> > > On Sun, Sep 22, 2019 at 12:18 PM Bhavani Sudha Saktheeswaran
>> > >>> > > <[email protected]> wrote:
>> > >>> > >
>> > >>> > > > +1 to adding more connectors to DeltaStreamer and, as Vino Yang
>> > >>> > > > suggested, making them pluggable modules as much as possible.
>> > >>> > > >
>> > >>> > > >
>> > >>> > > > On Sat, Sep 21, 2019 at 7:12 PM vino yang <[email protected]>
>> > >>> > > > wrote:
>> > >>> > > >
>> > >>> > > > > +1 to introducing these connectors. It's nice to see that Hudi's
>> > >>> > > > > ecosystem is growing. As Hudi connects to more and more systems,
>> > >>> > > > > it becomes necessary to introduce separate modules for these
>> > >>> > > > > connectors. This may lead to a module re-layout or code
>> > >>> > > > > refactoring. Of course, all this needs to be discussed in more
>> > >>> > > > > depth.
>> > >>> > > > >
>> > >>> > > > > Best,
>> > >>> > > > > Vino
>> > >>> > > > >
>> > >>> > > > > On 09/21/2019 18:59, Vinay Patil wrote:
>> > >>> > > > > > Hi Taher,
>> > >>> > > > > >
>> > >>> > > > > > Basically, this can be a proposal to support Kinesis, and
>> > >>> > > > > > DynamoDB Streams support can then be enabled by reusing this
>> > >>> > > > > > source code. Flink has provided support for DynamoDB Streams
>> > >>> > > > > > by reusing its Kinesis Streams classes.
>> > >>> > > > > >
>> > >>> > > > > > Regards,
>> > >>> > > > > > Vinay Patil
>> > >>> > > > > >
>> > >>> > > > > > On Sat, Sep 21, 2019 at 4:26 PM Taher Koitawala <
>> > >>> > > > > > [email protected]> wrote:
>> > >>> > > > > > > That would be a great addition, Vinay. How about adding
>> > >>> > > > > > > Kinesis as well?
>> > >>> > > > > > >
>> > >>> > > > > > > Regards,
>> > >>> > > > > > > Taher Koitawala
>> > >>> > > > > > >
>> > >>> > > > > > > On Sat, Sep 21, 2019, 4:20 PM Vinay Patil <[email protected]>
>> > >>> > > > > > > wrote:
>> > >>> > > > > > > > Hi Team,
>> > >>> > > > > > > >
>> > >>> > > > > > > > When enabled on a DynamoDB table, DynamoDB Streams contain
>> > >>> > > > > > > > the table's CDC data. We can add a source for DeltaStreamer
>> > >>> > > > > > > > that will enable us to read this data and write it either
>> > >>> > > > > > > > to a Hudi dataset or to another sink.
>> > >>> > > > > > > >
>> > >>> > > > > > > > Thoughts on adding this support in Hudi?
>> > >>> > > > > > > >
>> > >>> > > > > > > > Regards,
>> > >>> > > > > > > > Vinay Patil
>> > >>> > > >
>> > >>> > >
>> > >>> >
>> > >>>
>> > >>
>> >
>>
>  
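The original proposal at the bottom of the thread (read DynamoDB Streams CDC data and write it to a Hudi dataset) comes down to upsert-and-delete semantics keyed on the table's primary key. This is a minimal sketch under stated assumptions: events are already flattened to `(eventName, row)` pairs, they arrive in stream order, and `MODIFY` carries the full new item (`NEW_IMAGE` or `NEW_AND_OLD_IMAGES` view type). A plain dict stands in for the Hudi dataset, and `apply_cdc` is a hypothetical name, not a Hudi API.

```python
def apply_cdc(table, events, key="id"):
    """Apply (eventName, row) pairs, in order, to a dict keyed by `key`.

    INSERT and MODIFY become upserts (the row replaces any previous version);
    REMOVE becomes a delete. Assumes MODIFY rows hold the full new item.
    """
    for event_name, row in events:
        k = row[key]
        if event_name in ("INSERT", "MODIFY"):
            table[k] = row
        elif event_name == "REMOVE":
            table.pop(k, None)  # deleting an unseen key is a no-op
        else:
            raise ValueError(f"unexpected eventName: {event_name}")
    return table


events = [
    ("INSERT", {"id": "1", "qty": 5}),
    ("INSERT", {"id": "2", "qty": 1}),
    ("MODIFY", {"id": "1", "qty": 6, "color": "red"}),
    ("REMOVE", {"id": "2"}),
]
print(apply_cdc({}, events))
# {'1': {'id': '1', 'qty': 6, 'color': 'red'}}
```

Order matters here: applying the `MODIFY` before the `INSERT` for the same key would leave the older version, which is why a real source would need to preserve per-shard sequence order.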
