Just wanted to bump this thread and see if anyone is actively working on Kinesis support.

On Mon, Sep 23, 2019 at 11:51 AM Vinoth Chandar wrote:

> I think we are on the same page. Thanks for clarifying!
> Note on implementation: it would be great if we can reuse the Spark
> Streaming connector already present
> https://spark.apache.org/docs/2.4.0/streaming-kinesis-integration.html
>
> (just like the dfs, kafka and jdbc connector plans, that way we get a lot
> for free) ..
>
> On Mon, Sep 23, 2019 at 11:13 AM Vinay Patil wrote:
>
>> Hi Vinoth,
>>
>> I have provided the answers to your questions.
>>
>> > *should we just integrate to Kinesis? If DynamoDB will pump its
>> > changes into Kinesis anyway, why should we be aware of DynamoDB
>> > directly?*
>>
>> - Yes, we should first integrate with Kinesis. As I mentioned, once the
>> stream is enabled on a DynamoDB table, the CDC data can be accessed from
>> the shards in real time. So adding support for DynamoDB Streams will be
>> a subtask of Kinesis.
>>
>> > If DynamoDB will pump its changes into Kinesis anyway, why should we
>> > be aware of DynamoDB directly?
>>
>> - Yes, we don't need to talk to the DynamoDB table directly, only to the
>> streams enabled on it [1]
>>
>> > does kinesis streams have schemas mapped from DynamoDB already or
>> > should we be implementing a DynamoDBSchemaProvider as well?
>>
>> - IMO, we don't need to be aware of the schema here. We will be getting
>> only the CDC data in this stream [1], and the schema can be different
>> for each record (adding or removing a column).
>>
>> 1.
>> https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
>>
>> Regards,
>> Vinay Patil
>>
>> On Sun, Sep 22, 2019 at 6:28 PM Vinoth Chandar wrote:
>>
>>> +1 For now we can keep this in hudi-utilities itself IMO.
>>>
>>> As for the connector, or DeltaStreamer Source to be specific, should we
>>> just integrate to Kinesis? If DynamoDB will pump its changes into
>>> Kinesis anyway, why should we be aware of DynamoDB directly?
>>>
>>> Also we may need to rethink how we are going to maintain the schema.
>>> Do Kinesis streams have schemas mapped from DynamoDB already, or should
>>> we be implementing a DynamoDBSchemaProvider as well?
>>>
>>> This would be a really great addition. But I can also see how
>>> challenging it can be (which is fun :))
>>>
>>> On Sun, Sep 22, 2019 at 4:09 AM Taher Koitawala wrote:
>>>
>>> > I think this will be a good opportunity to plan better in terms of
>>> > abstraction too, which is needed for the Flink and Beam engines we
>>> > might use.
>>> >
>>> > Regards,
>>> > Taher Koitawala
>>> >
>>> > On Sun, Sep 22, 2019, 3:37 PM leesf wrote:
>>> >
>>> > > +1.
>>> > > Happy to see DeltaStreamer become more and more powerful. Also, we
>>> > > need to pay some attention to the layout and organization of these
>>> > > connectors as more and more data sources are introduced to HUDI,
>>> > > like vinoyang suggested.
>>> > >
>>> > > Best,
>>> > > Leesf
>>> > >
>>> > > On Sun, Sep 22, 2019 at 12:18 PM Bhavani Sudha Saktheeswaran wrote:
>>> > >
>>> > > > +1 to adding more connectors to DeltaStreamer and making them
>>> > > > pluggable modules as much as possible, like Vino Yang suggested.
>>> > > >
>>> > > > On Sat, Sep 21, 2019 at 7:12 PM vino yang wrote:
>>> > > >
>>> > > > > +1 to introduce these connectors. It's nice to see that Hudi's
>>> > > > > ecosystem is growing. As Hudi connects to more and more systems,
>>> > > > > it is necessary to introduce separate modules to place these
>>> > > > > connectors. This can lead to module relayout or code
>>> > > > > refactoring. Of course, all this needs to be discussed in more
>>> > > > > depth.
>>> > > > >
>>> > > > > Best,
>>> > > > > Vino
>>> > > > >
>>> > > > > On 09/21/2019 18:59, Vinay Patil wrote:
>>> > > > >
>>> > > > > Hi Taher,
>>> > > > >
>>> > > > > Basically this can be a proposal to support Kinesis, and
>>> > > > > DynamoDB Streams support can be enabled by reusing this source
>>> > > > > code. Flink has provided support for DynamoDB Streams by reusing
>>> > > > > the Kinesis Streams classes.
>>> > > > >
>>> > > > > Regards,
>>> > > > > Vinay Patil
>>> > > > >
>>> > > > > On Sat, Sep 21, 2019 at 4:26 PM Taher Koitawala wrote:
>>> > > > >
>>> > > > > > That would be a great addition, Vinay. How about adding
>>> > > > > > Kinesis as well?
>>> > > > > >
>>> > > > > > Regards,
>>> > > > > > Taher Koitawala
>>> > > > > >
>>> > > > > > On Sat, Sep 21, 2019, 4:20 PM Vinay Patil wrote:
>>> > > > > >
>>> > > > > > > Hi Team,
>>> > > > > > >
>>> > > > > > > DynamoDB Streams contain the CDC data when enabled on a
>>> > > > > > > DynamoDB table. We can add a source for DeltaStreamer which
>>> > > > > > > will enable us to read this data and write it back either
>>> > > > > > > to a Hudi dataset or to another sink.
>>> > > > > > >
>>> > > > > > > Thoughts on adding this support in Hudi?
>>> > > > > > >
>>> > > > > > > Regards,
>>> > > > > > > Vinay Patil
