Hi Vinoth,

Sorry I missed these; I have been busy with on-call issues for the last couple of weeks.
I will create a ticket to track this, and I will be actively working on it.

On Wed, 16 Oct 2019, 07:01 Vinoth Chandar, <[email protected]> wrote:

> Just wanted to bump this thread and see if anyone is actively working on
> Kinesis support.
>
> On Mon, Sep 23, 2019 at 11:51 AM Vinoth Chandar <[email protected]> wrote:
>
> > I think we are on the same page. Thanks for clarifying!
> > Note on implementation: it would be great if we can reuse the Spark
> > streaming connector already present
> > https://spark.apache.org/docs/2.4.0/streaming-kinesis-integration.html
> >
> > (just like the DFS, Kafka and JDBC connector plans; that way we get a
> > lot for free).
> >
> > On Mon, Sep 23, 2019 at 11:13 AM Vinay Patil <[email protected]> wrote:
> >
> >> Hi Vinoth,
> >>
> >> I have provided the answers to your questions.
> >>
> >> > *should we just integrate to Kinesis? If DynamoDB will pump its
> >> > changes into Kinesis anyway, why should we be aware of DynamoDB
> >> > directly?*
> >>
> >> - Yes, we should first integrate with Kinesis. As I mentioned, once the
> >> stream is enabled on a DynamoDB table, the CDC data can be accessed from
> >> the shards in real time. So adding support for DynamoDB Streams will be
> >> a subtask of Kinesis.
> >>
> >> > If DynamoDB will pump its changes into Kinesis anyway, why should we
> >> > be aware of DynamoDB directly?
> >>
> >> - Yes, we don't need to talk to the DynamoDB table directly, only to the
> >> streams enabled on it [1]
> >>
> >> > does kinesis streams have schemas mapped from DynamoDB already or
> >> > should we be implementing a DynamoDBSchemaProvider as well?
> >>
> >> - IMO, we don't need to be aware of the schema here. We will be getting
> >> only the CDC data in this stream [1], and the schema can be different
> >> for each record (adding or removing a column).
> >>
> >> 1. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
> >>
> >> Regards,
> >> Vinay Patil
> >>
> >> On Sun, Sep 22, 2019 at 6:28 PM Vinoth Chandar <[email protected]> wrote:
> >>
> >>> +1 For now we can keep this in hudi-utilities itself IMO.
> >>>
> >>> As for the connector, or DeltaStreamer Source to be specific, should we
> >>> just integrate with Kinesis? If DynamoDB will pump its changes into
> >>> Kinesis anyway, why should we be aware of DynamoDB directly?
> >>> Also, we may need to rethink how we are going to maintain the schema.
> >>> Do Kinesis streams have schemas mapped from DynamoDB already, or should
> >>> we be implementing a DynamoDBSchemaProvider as well?
> >>>
> >>> This would be a really great addition. But I can also see how
> >>> challenging it can be (which is fun :))
> >>>
> >>> On Sun, Sep 22, 2019 at 4:09 AM Taher Koitawala <[email protected]> wrote:
> >>>
> >>> > I think this will be a good opportunity to plan better in terms of
> >>> > abstraction too, which is needed for the Flink and Beam engines we
> >>> > might use.
> >>> >
> >>> > Regards,
> >>> > Taher Koitawala
> >>> >
> >>> > On Sun, Sep 22, 2019, 3:37 PM leesf <[email protected]> wrote:
> >>> >
> >>> > > +1.
> >>> > > Happy to see DeltaStreamer becoming more and more powerful. Also, we
> >>> > > need to pay some attention to the layout and organization of these
> >>> > > connectors as more and more data sources are introduced to Hudi, as
> >>> > > vinoyang suggested.
> >>> > >
> >>> > > Best,
> >>> > > Leesf
> >>> > >
> >>> > > On Sun, Sep 22, 2019 at 12:18 PM, Bhavani Sudha Saktheeswaran
> >>> > > <[email protected]> wrote:
> >>> > >
> >>> > > > +1 to adding more connectors to DeltaStreamer and making them
> >>> > > > pluggable modules as much as possible, like Vino Yang suggested.
> >>> > > >
> >>> > > > On Sat, Sep 21, 2019 at 7:12 PM vino yang <[email protected]> wrote:
> >>> > > >
> >>> > > > > +1 to introduce these connectors. It's nice to see that Hudi's
> >>> > > > > ecosystem is growing. As Hudi connects to more and more systems,
> >>> > > > > it is necessary to introduce separate modules to place these
> >>> > > > > connectors. This can lead to module relayout or code refactoring.
> >>> > > > > Of course, all this needs to be discussed in more depth.
> >>> > > > >
> >>> > > > > Best,
> >>> > > > > Vino
> >>> > > > >
> >>> > > > > On 09/21/2019 18:59, Vinay Patil wrote:
> >>> > > > >
> >>> > > > > Hi Taher,
> >>> > > > >
> >>> > > > > Basically, this can be a proposal to support Kinesis, and DynamoDB
> >>> > > > > Streams support can be enabled by reusing this source code. Flink
> >>> > > > > has provided support for DynamoDB Streams by reusing Kinesis
> >>> > > > > Streams classes.
> >>> > > > >
> >>> > > > > Regards,
> >>> > > > > Vinay Patil
> >>> > > > >
> >>> > > > > On Sat, Sep 21, 2019 at 4:26 PM Taher Koitawala <[email protected]> wrote:
> >>> > > > >
> >>> > > > > > That would be a great addition, Vinay. How about adding Kinesis
> >>> > > > > > as well?
> >>> > > > > >
> >>> > > > > > Regards,
> >>> > > > > > Taher Koitawala
> >>> > > > > >
> >>> > > > > > On Sat, Sep 21, 2019, 4:20 PM Vinay Patil <[email protected]> wrote:
> >>> > > > > >
> >>> > > > > > > Hi Team,
> >>> > > > > > >
> >>> > > > > > > DynamoDB Streams contain the CDC data when enabled on a
> >>> > > > > > > DynamoDB table. We can add a source for DeltaStreamer which
> >>> > > > > > > will enable us to read this data and write it back either to
> >>> > > > > > > a Hudi dataset or to another sink.
> >>> > > > > > >
> >>> > > > > > > Thoughts on adding this support in Hudi?
> >>> > > > > > >
> >>> > > > > > > Regards,
> >>> > > > > > > Vinay Patil
