Great!

On Wed, Oct 23, 2019 at 10:41 PM Vinay Patil <[email protected]> wrote:
> Thanks a lot Vinoth for opening this jira.
>
> Will start with the initial design and share the document.
>
> Regards,
> Vinay Patil
>
> On Mon, Oct 21, 2019 at 9:36 PM Balaji Varadarajan <[email protected]> wrote:
>
> > +1. This is a much needed and super useful feature for a lot of folks in
> > the community.
> >
> > Balaji.V
> >
> > On Monday, October 21, 2019, 7:08:30 AM PDT, Vinoth Chandar <[email protected]> wrote:
> >
> > https://issues.apache.org/jira/browse/HUDI-310 tracks this. Love to get
> > this into the next release as much as possible :)
> >
> > On Thu, Oct 17, 2019 at 10:16 PM Vinoth Chandar <[email protected]> wrote:
> >
> > > No problem. Having kinesis will get us a compelling story for cloud data
> > > ingestion
> > >
> > > On Thu, Oct 17, 2019 at 8:38 PM Vinay Patil <[email protected]> wrote:
> > >
> > >> Hi Vinoth,
> > >>
> > >> Sorry to miss these, busy with on-call issues for the last couple of
> > >> weeks.
> > >>
> > >> Will create a ticket for tracking this, I will be actively working on
> > >> this.
> > >>
> > >> On Wed, 16 Oct 2019, 07:01 Vinoth Chandar, <[email protected]> wrote:
> > >>
> > >> > Just wanted to bump this thread and see if anyone is actively working on
> > >> > kinesis support
> > >> >
> > >> > On Mon, Sep 23, 2019 at 11:51 AM Vinoth Chandar <[email protected]> wrote:
> > >> >
> > >> > > I think we are on the same page. Thanks for clarifying!
> > >> > > Note on implementation: it would be great if we can reuse the spark
> > >> > > streaming connector already present
> > >> > > https://spark.apache.org/docs/2.4.0/streaming-kinesis-integration.html
> > >> > > (just like dfs, kafka and jdbc connector plans, that way we get a lot for
> > >> > > free) ..
> > >> > >
> > >> > > On Mon, Sep 23, 2019 at 11:13 AM Vinay Patil <[email protected]> wrote:
> > >> > >
> > >> > >> Hi Vinoth,
> > >> > >>
> > >> > >> I have provided the answers to your questions.
> > >> > >>
> > >> > >> > *should we just integrate to Kinesis? If DynamoDB will pump its
> > >> > >> > changes into Kinesis anyway, why should we be aware of DynamoDB directly?*
> > >> > >>
> > >> > >> - Yes, we should first integrate with Kinesis. As I mentioned, once the
> > >> > >> stream is enabled on a DynamoDb table, the CDC data can be accessed from the
> > >> > >> shards in real time. So adding support for DynamoDb streams will be a
> > >> > >> subtask of Kinesis.
> > >> > >>
> > >> > >> > If DynamoDB will pump its changes into Kinesis anyway, why should we be
> > >> > >> > aware of DynamoDB directly?
> > >> > >>
> > >> > >> - Yes, we don't need to talk to the DynamoDB table directly but with the
> > >> > >> streams enabled on it [1]
> > >> > >>
> > >> > >> > does kinesis streams have schemas mapped from DynamoDB already or
> > >> > >> > should we be implementing a DynamoDBSchemaProvider as well?
> > >> > >>
> > >> > >> - IMO, we don't need to be aware of the schema here, we will be
> > >> > >> getting only the CDC data in this stream [1] and the schema can be different
> > >> > >> for each record (adding or removing a column)
> > >> > >>
> > >> > >> 1. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
> > >> > >>
> > >> > >> Regards,
> > >> > >> Vinay Patil
> > >> > >>
> > >> > >> On Sun, Sep 22, 2019 at 6:28 PM Vinoth Chandar <[email protected]> wrote:
> > >> > >>
> > >> > >>> +1 For now we can keep this in hudi-utilities itself IMO.
> > >> > >>>
> > >> > >>> As for the connector or DeltaStreamer Source to be specific, should we just
> > >> > >>> integrate to Kinesis? If DynamoDB will pump its changes into Kinesis
> > >> > >>> anyway, why should we be aware of DynamoDB directly?
> > >> > >>> Also we may need to rethink how we are going to maintain the schema?
> > >> > >>> does kinesis streams have schemas mapped from DynamoDB already or should we
> > >> > >>> be implementing a DynamoDBSchemaProvider as well?
> > >> > >>>
> > >> > >>> This would be a really great addition. But also can see how challenging it
> > >> > >>> can be (which is fun :))
> > >> > >>>
> > >> > >>> On Sun, Sep 22, 2019 at 4:09 AM Taher Koitawala <[email protected]> wrote:
> > >> > >>>
> > >> > >>> > I think this will be a good opportunity to plan better in terms of
> > >> > >>> > abstraction too which is needed for the Flink and Beam engines we might
> > >> > >>> > use.
> > >> > >>> >
> > >> > >>> > Regards,
> > >> > >>> > Taher Koitawala
> > >> > >>> >
> > >> > >>> > On Sun, Sep 22, 2019, 3:37 PM leesf <[email protected]> wrote:
> > >> > >>> >
> > >> > >>> > > +1.
> > >> > >>> > > Happy to see DeltaStreamer become more and more powerful. Also, we need to
> > >> > >>> > > pay some attention to the layout and organization of these connectors as
> > >> > >>> > > more and more data sources are introduced to HUDI like vinoyang suggested.
> > >> > >>> > >
> > >> > >>> > > Best,
> > >> > >>> > > Leesf
> > >> > >>> > >
> > >> > >>> > > On Sun, Sep 22, 2019 at 12:18 PM Bhavani Sudha Saktheeswaran <[email protected]> wrote:
> > >> > >>> > >
> > >> > >>> > > > +1 to adding more connectors to DeltaStreamer and making them as much
> > >> > >>> > > > pluggable modules as possible like Vino Yang suggested.
> > >> > >>> > > >
> > >> > >>> > > > On Sat, Sep 21, 2019 at 7:12 PM vino yang <[email protected]> wrote:
> > >> > >>> > > >
> > >> > >>> > > > > +1 to introduce these connectors. It's nice to see that Hudi's ecosystem
> > >> > >>> > > > > is growing. As Hudi connects to more and more systems, it is necessary to
> > >> > >>> > > > > introduce separate modules to place these connectors. This can lead to
> > >> > >>> > > > > module relayout or code refactoring. Of course, all this needs to be
> > >> > >>> > > > > discussed in more depth.
> > >> > >>> > > > >
> > >> > >>> > > > > Best,
> > >> > >>> > > > > Vino
> > >> > >>> > > > >
> > >> > >>> > > > > On 09/21/2019 18:59, Vinay Patil wrote:
> > >> > >>> > > > >
> > >> > >>> > > > > Hi Taher,
> > >> > >>> > > > >
> > >> > >>> > > > > Basically this can be a proposal to support Kinesis, and DynamoDb
> > >> > >>> > > > > stream support can be enabled by reusing this source code. Flink has
> > >> > >>> > > > > provided support for DynamoDb Streams by reusing Kinesis Streams classes.
> > >> > >>> > > > >
> > >> > >>> > > > > Regards,
> > >> > >>> > > > > Vinay Patil
> > >> > >>> > > > >
> > >> > >>> > > > > On Sat, Sep 21, 2019 at 4:26 PM Taher Koitawala <[email protected]> wrote:
> > >> > >>> > > > >
> > >> > >>> > > > > > That would be a great addition Vinay. How about adding Kinesis as well?
> > >> > >>> > > > > >
> > >> > >>> > > > > > Regards,
> > >> > >>> > > > > > Taher Koitawala
> > >> > >>> > > > > >
> > >> > >>> > > > > > On Sat, Sep 21, 2019, 4:20 PM Vinay Patil <[email protected]> wrote:
> > >> > >>> > > > > >
> > >> > >>> > > > > > > Hi Team,
> > >> > >>> > > > > > >
> > >> > >>> > > > > > > The DynamoDb streams contain the CDC data when enabled on a DynamoDb
> > >> > >>> > > > > > > table, we can add a source for DeltaStreamer which will
> > >> > >>> > > > > > > enable us to read this data and write it back either to Hudi dataset or
> > >> > >>> > > > > > > to another sink.
> > >> > >>> > > > > > >
> > >> > >>> > > > > > > Thoughts on adding this support in Hudi?
> > >> > >>> > > > > > >
> > >> > >>> > > > > > > Regards,
> > >> > >>> > > > > > > Vinay Patil
