+1. This is a much needed and super useful feature for a lot of folks in the community.
Balaji.V

On Monday, October 21, 2019, 7:08:30 AM PDT, Vinoth Chandar <[email protected]> wrote:

https://issues.apache.org/jira/browse/HUDI-310 tracks this. Love to get this into the next release as much as possible :)

On Thu, Oct 17, 2019 at 10:16 PM Vinoth Chandar <[email protected]> wrote:

No problem. Having Kinesis will give us a compelling story for cloud data ingestion.

On Thu, Oct 17, 2019 at 8:38 PM Vinay Patil <[email protected]> wrote:

Hi Vinoth,

Sorry to miss these, busy with on-call issues for the last couple of weeks.

I will create a ticket for tracking this; I will be actively working on this.

On Wed, 16 Oct 2019, 07:01 Vinoth Chandar, <[email protected]> wrote:

Just wanted to bump this thread and see if anyone is actively working on Kinesis support.

On Mon, Sep 23, 2019 at 11:51 AM Vinoth Chandar <[email protected]> wrote:

I think we are on the same page. Thanks for clarifying!
Note on implementation: it would be great if we can reuse the Spark Streaming connector already present:
https://spark.apache.org/docs/2.4.0/streaming-kinesis-integration.html
(just like the DFS, Kafka and JDBC connector plans; that way we get a lot for free).

On Mon, Sep 23, 2019 at 11:13 AM Vinay Patil <[email protected]> wrote:

Hi Vinoth,

I have provided the answers to your questions.

> should we just integrate with Kinesis? If DynamoDB will pump its changes into Kinesis anyway, why should we be aware of DynamoDB directly?

- Yes, we should first integrate with Kinesis. As I mentioned, once the stream is enabled on a DynamoDB table, the CDC data can be accessed from the shards in real time. So adding support for DynamoDB streams will be a subtask of Kinesis.

> If DynamoDB will pump its changes into Kinesis anyway, why should we be aware of DynamoDB directly?

- Yes, we don't need to talk to the DynamoDB table directly, only to the streams enabled on it [1]

> do Kinesis streams have schemas mapped from DynamoDB already or should we be implementing a DynamoDBSchemaProvider as well?

- IMO, we don't need to be aware of the schema here; we will be getting only the CDC data in this stream [1], and the schema can be different for each record (adding or removing a column).

1. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html

Regards,
Vinay Patil

On Sun, Sep 22, 2019 at 6:28 PM Vinoth Chandar <[email protected]> wrote:

+1 For now we can keep this in hudi-utilities itself IMO.

As for the connector, or DeltaStreamer Source to be specific, should we just integrate with Kinesis? If DynamoDB will pump its changes into Kinesis anyway, why should we be aware of DynamoDB directly?
Also, we may need to rethink how we are going to maintain the schema. Do Kinesis streams have schemas mapped from DynamoDB already or should we be implementing a DynamoDBSchemaProvider as well?

This would be a really great addition. But I can also see how challenging it can be (which is fun :))

On Sun, Sep 22, 2019 at 4:09 AM Taher Koitawala <[email protected]> wrote:

I think this will be a good opportunity to plan better in terms of abstraction too, which is needed for the Flink and Beam engines we might use.

Regards,
Taher Koitawala

On Sun, Sep 22, 2019, 3:37 PM leesf <[email protected]> wrote:

+1.
Happy to see DeltaStreamer become more and more powerful. Also, we need to pay some attention to the layout and organization of these connectors as more and more data sources are introduced to Hudi, like vinoyang suggested.

Best,
Leesf

On Sun, Sep 22, 2019 at 12:18 PM Bhavani Sudha Saktheeswaran <[email protected]> wrote:

+1 to adding more connectors to DeltaStreamer and making them as much pluggable modules as possible, like Vino Yang suggested.

On Sat, Sep 21, 2019 at 7:12 PM vino yang <[email protected]> wrote:

+1 to introduce these connectors. It's nice to see that Hudi's ecosystem is growing. As Hudi connects to more and more systems, it is necessary to introduce separate modules to place these connectors. This can lead to module relayout or code refactoring. Of course, all this needs to be discussed in more depth.

Best,
Vino

On 09/21/2019 18:59, Vinay Patil wrote:

Hi Taher,

Basically this can be a proposal to support Kinesis, and DynamoDB stream support can be enabled by reusing this source code. Flink has provided support for DynamoDB Streams by reusing Kinesis Streams classes.

Regards,
Vinay Patil

On Sat, Sep 21, 2019 at 4:26 PM Taher Koitawala <[email protected]> wrote:

That would be a great addition, Vinay. How about adding Kinesis as well?

Regards,
Taher Koitawala

On Sat, Sep 21, 2019, 4:20 PM Vinay Patil <[email protected]> wrote:

Hi Team,

The DynamoDB streams contain the CDC data when enabled on a DynamoDB table. We can add a source for DeltaStreamer which will enable us to read this data and write it back either to a Hudi dataset or to another sink.

Thoughts on adding this support in Hudi?

Regards,
Vinay Patil
