Just wanted to bump this thread and see if anyone is actively working on
Kinesis support.

On Mon, Sep 23, 2019 at 11:51 AM Vinoth Chandar <[email protected]> wrote:

> I think we are on the same page. Thanks for clarifying!
> Note on implementation: it would be great if we could reuse the Spark
> Streaming Kinesis connector that already exists:
> https://spark.apache.org/docs/2.4.0/streaming-kinesis-integration.html
>
> (just like the DFS, Kafka and JDBC connector plans; that way we get a lot
> for free).
>
> On Mon, Sep 23, 2019 at 11:13 AM Vinay Patil <[email protected]>
> wrote:
>
>> Hi Vinoth,
>>
>> I have provided the answers to your questions.
>>
>> > *should we just integrate with Kinesis? If DynamoDB will pump its
>> > changes into Kinesis anyway, why should we be aware of DynamoDB
>> > directly?*
>>
>> - Yes, we should first integrate with Kinesis. As I mentioned, once the
>> stream is enabled on a DynamoDB table, the CDC data can be accessed from
>> the shards in real time. So adding support for DynamoDB Streams will be a
>> subtask of the Kinesis work.
>>
>> > If DynamoDB will pump its changes into Kinesis anyway, why should we
>> > be aware of DynamoDB directly?
>> - Yes, we don't need to talk to the DynamoDB table directly, only to
>> the stream enabled on it [1].
>>
>> > do Kinesis streams have schemas mapped from DynamoDB already, or
>> > should we be implementing a DynamoDBSchemaProvider as well?
>>
>> - IMO, we don't need to be aware of the schema here. We will be getting
>> only the CDC data in this stream [1], and the schema can differ from
>> record to record (for example, when a column is added or removed).
>>
>> [1] https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
>>
>> Regards,
>> Vinay Patil
>>
>>
>>
>> On Sun, Sep 22, 2019 at 6:28 PM Vinoth Chandar <[email protected]> wrote:
>>
>>> +1. For now we can keep this in hudi-utilities itself, IMO.
>>>
>>> As for the connector, or DeltaStreamer Source to be specific, should we
>>> just integrate with Kinesis? If DynamoDB will pump its changes into
>>> Kinesis anyway, why should we be aware of DynamoDB directly?
>>> Also, we may need to rethink how we are going to maintain the schema. Do
>>> Kinesis streams have schemas mapped from DynamoDB already, or should we
>>> be implementing a DynamoDBSchemaProvider as well?
>>>
>>> This would be a really great addition. But I can also see how
>>> challenging it can be (which is fun :))
>>>
>>> On Sun, Sep 22, 2019 at 4:09 AM Taher Koitawala <[email protected]>
>>> wrote:
>>>
>>> > I think this will be a good opportunity to plan better in terms of
>>> > abstraction too which is needed for the Flink and Beam engines we might
>>> > use.
>>> >
>>> > Regards,
>>> > Taher Koitawala
>>> >
>>> > On Sun, Sep 22, 2019, 3:37 PM leesf <[email protected]> wrote:
>>> >
>>> > > +1.
>>> > > Happy to see DeltaStreamer become more and more powerful. Also, we
>>> > > need to pay some attention to the layout and organization of these
>>> > > connectors as more and more data sources are introduced to Hudi, as
>>> > > Vino Yang suggested.
>>> > >
>>> > > Best,
>>> > > Leesf
>>> > >
>>> > > On Sun, Sep 22, 2019 at 12:18 PM Bhavani Sudha Saktheeswaran
>>> > > <[email protected]> wrote:
>>> > >
>>> > > > +1 to adding more connectors to DeltaStreamer and making them as
>>> > > > pluggable as possible, as Vino Yang suggested.
>>> > > >
>>> > > >
>>> > > > On Sat, Sep 21, 2019 at 7:12 PM vino yang <[email protected]>
>>> > > > wrote:
>>> > > >
>>> > > > > +1 to introducing these connectors. It's nice to see that Hudi's
>>> > > > > ecosystem is growing. As Hudi connects to more and more systems,
>>> > > > > it is necessary to introduce separate modules to hold these
>>> > > > > connectors. This can lead to module relayout or code refactoring.
>>> > > > > Of course, all this needs to be discussed in more depth.
>>> > > > >
>>> > > > > Best,
>>> > > > > Vino
>>> > > > >
>>> > > > > On 09/21/2019 18:59, Vinay Patil wrote:
>>> > > > >
>>> > > > > Hi Taher,
>>> > > > >
>>> > > > > Basically this can be a proposal to support Kinesis, and DynamoDB
>>> > > > > Streams support can be enabled by reusing this source code. Flink
>>> > > > > has provided support for DynamoDB Streams by reusing its Kinesis
>>> > > > > Streams classes.
>>> > > > >
>>> > > > > Regards,
>>> > > > > Vinay Patil
>>> > > > >
>>> > > > > On Sat, Sep 21, 2019 at 4:26 PM Taher Koitawala
>>> > > > > <[email protected]> wrote:
>>> > > > >
>>> > > > > > That would be a great addition, Vinay. How about adding Kinesis
>>> > > > > > as well?
>>> > > > > >
>>> > > > > > Regards,
>>> > > > > > Taher Koitawala
>>> > > > > >
>>> > > > > > On Sat, Sep 21, 2019, 4:20 PM Vinay Patil
>>> > > > > > <[email protected]> wrote:
>>> > > > > >
>>> > > > > > > Hi Team,
>>> > > > > > >
>>> > > > > > > DynamoDB Streams contains the CDC data when enabled on a
>>> > > > > > > DynamoDB table. We can add a source for DeltaStreamer which
>>> > > > > > > will enable us to read this data and write it back either to
>>> > > > > > > a Hudi dataset or to another sink.
>>> > > > > > >
>>> > > > > > > Thoughts on adding this support in Hudi?
>>> > > > > > >
>>> > > > > > > Regards,
>>> > > > > > > Vinay Patil
>>> > > >
>>> > >
>>> >
>>>
>>