I think we are on the same page. Thanks for clarifying! A note on implementation: it would be great if we could reuse the Spark Streaming Kinesis connector that already exists: https://spark.apache.org/docs/2.4.0/streaming-kinesis-integration.html
(just like the DFS, Kafka and JDBC connector plans; that way we get a lot for free).

On Mon, Sep 23, 2019 at 11:13 AM Vinay Patil <[email protected]> wrote:

> Hi Vinoth,
>
> I have provided the answers to your questions.
>
>> *should we just integrate to Kinesis? If DynamoDB will pump its changes
>> into Kinesis anyway, why should we be aware of DynamoDB directly?*
>
> - Yes, we should first integrate with Kinesis. As I mentioned, once the
>   stream is enabled on a DynamoDB table, the CDC data can be accessed from
>   the shards in real time. So adding support for DynamoDB Streams will be a
>   subtask of Kinesis.
>
>> If DynamoDB will pump its changes into Kinesis anyway, why should we be
>> aware of DynamoDB directly?
>
> - Yes, we don't need to talk to the DynamoDB table directly, only to the
>   stream enabled on it [1].
>
>> does kinesis streams have schemas mapped from DynamoDB already or should
>> we be implementing a DynamoDBSchemaProvider as well?
>
> - IMO, we don't need to be aware of the schema here. We will be getting
>   only the CDC data in this stream [1], and the schema can be different
>   for each record (adding or removing a column).
>
> [1] https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
>
> Regards,
> Vinay Patil
>
> On Sun, Sep 22, 2019 at 6:28 PM Vinoth Chandar <[email protected]> wrote:
>
>> +1 For now we can keep this in hudi-utilities itself IMO.
>>
>> As for the connector, or DeltaStreamer Source to be specific, should we
>> just integrate to Kinesis? If DynamoDB will pump its changes into Kinesis
>> anyway, why should we be aware of DynamoDB directly?
>> Also, we may need to rethink how we are going to maintain the schema. Do
>> Kinesis streams have schemas mapped from DynamoDB already, or should we be
>> implementing a DynamoDBSchemaProvider as well?
>>
>> This would be a really great addition. But I can also see how challenging
>> it can be (which is fun :))
>>
>> On Sun, Sep 22, 2019 at 4:09 AM Taher Koitawala <[email protected]> wrote:
>>
>>> I think this will be a good opportunity to plan better in terms of
>>> abstraction too, which is needed for the Flink and Beam engines we might
>>> use.
>>>
>>> Regards,
>>> Taher Koitawala
>>>
>>> On Sun, Sep 22, 2019, 3:37 PM leesf <[email protected]> wrote:
>>>
>>>> +1.
>>>> Happy to see DeltaStreamer become more and more powerful. Also, we need
>>>> to pay some attention to the layout and organization of these connectors
>>>> as more and more data sources are introduced to HUDI, like vinoyang
>>>> suggested.
>>>>
>>>> Best,
>>>> Leesf
>>>>
>>>> Bhavani Sudha Saktheeswaran <[email protected]> wrote on Sun, Sep 22,
>>>> 2019 at 12:18 PM:
>>>>
>>>>> +1 to adding more connectors to DeltaStreamer and making them pluggable
>>>>> modules as much as possible, like Vino Yang suggested.
>>>>>
>>>>> On Sat, Sep 21, 2019 at 7:12 PM vino yang <[email protected]> wrote:
>>>>>
>>>>>> +1 to introduce these connectors. It's nice to see that Hudi's
>>>>>> ecosystem is growing. As Hudi connects to more and more systems, it is
>>>>>> necessary to introduce separate modules to place these connectors.
>>>>>> This can lead to module relayout or code refactoring. Of course, all
>>>>>> this needs to be discussed in more depth.
>>>>>>
>>>>>> Best,
>>>>>> Vino
>>>>>>
>>>>>> On 09/21/2019 18:59, Vinay Patil wrote:
>>>>>>
>>>>>>> Hi Taher,
>>>>>>>
>>>>>>> Basically this can be a proposal to support Kinesis, and DynamoDB
>>>>>>> Streams support can be enabled by reusing this source code. Flink
>>>>>>> has provided support for DynamoDB Streams by reusing Kinesis Streams
>>>>>>> classes.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Vinay Patil
>>>>>>>
>>>>>>> On Sat, Sep 21, 2019 at 4:26 PM Taher Koitawala <[email protected]> wrote:
>>>>>>>
>>>>>>>> That would be a great addition, Vinay. How about adding Kinesis as
>>>>>>>> well?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Taher Koitawala
>>>>>>>>
>>>>>>>> On Sat, Sep 21, 2019, 4:20 PM Vinay Patil <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Team,
>>>>>>>>>
>>>>>>>>> DynamoDB Streams contains the CDC data when enabled on a DynamoDB
>>>>>>>>> table. We can add a source for DeltaStreamer which will enable us
>>>>>>>>> to read this data and write it back either to a Hudi dataset or to
>>>>>>>>> another sink.
>>>>>>>>>
>>>>>>>>> Thoughts on adding this support in Hudi?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Vinay Patil
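
On the schema point raised in the thread (the schema can differ per record as columns are added or removed), here is a minimal sketch of what that could mean for a reader of such a stream. It is plain Python for illustration only: the record shape and the helper names `merged_fields` and `normalize` are assumptions made up for this example. They are not the DynamoDB Streams record format and not part of Hudi or DeltaStreamer.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical CDC records: plain dicts standing in for change events.
# This is NOT the DynamoDB Streams wire format, only an illustration.
Record = Dict[str, Any]


def merged_fields(records: Iterable[Record]) -> List[str]:
    """Return every field name seen across the records, in first-seen order."""
    seen: Dict[str, None] = {}
    for record in records:
        for name in record:
            seen.setdefault(name, None)
    return list(seen)


def normalize(records: List[Record]) -> List[Record]:
    """Give every record the same fields, filling absent ones with None."""
    fields = merged_fields(records)
    return [{name: record.get(name) for name in fields} for record in records]


batch = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b", "email": "b@example.com"},  # a column was added
    {"id": 3},                                          # a column was removed
]
normalized = normalize(batch)
```

Whether a real source would reconcile fields like this, or pass records through untouched as suggested in the thread, is an open design question; the sketch only shows why a single fixed schema does not fall out of the stream for free.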
