Sounds good. Let's capture these discussions in the JIRA (a link to the discussion thread should suffice), and we can revisit them one by one.

On Mon, Sep 23, 2019 at 8:31 PM Taher Koitawala <taher...@gmail.com> wrote:

> Sure Vinoth, I think we need to try this out and check how it fits
> together and how deployable it is.
>
> On Sun, Sep 22, 2019, 7:01 PM Vinoth Chandar <vin...@apache.org> wrote:
>
> > I see a lot of Spark Streaming receiver-based approach code there,
> > which makes me a bit worried about scalability.
> >
> > Nonetheless, API-wise, can't we just do dstream.rdd.forEach and issue
> > these writes using the WriteClient API?
> >
> > On Sat, Sep 21, 2019 at 4:16 AM Taher Koitawala <taher...@gmail.com>
> > wrote:
> >
> > > Hi Vinoth,
> > > Nifi has the capability to pass data to a custom Spark job.
> > > However, that is done through a StreamingContext, and I'm not sure
> > > if we can build something on this. I'm trying to wrap my head
> > > around how to fit the StreamingContext into our existing code.
> > >
> > > Here is an example:
> > > https://blogs.apache.org/nifi/entry/stream_processing_nifi_and_spark
> > >
> > > Regards,
> > > Taher Koitawala
> > >
> > > On Wed, Sep 18, 2019, 8:27 PM Vinoth Chandar <vin...@apache.org>
> > > wrote:
> > >
> > > > Not too familiar with Nifi myself. Would this still target a
> > > > use case like the one Pratyaksh mentioned?
> > > > For DeltaStreamer specifically, we are moving more and more
> > > > towards continuous mode, where Hudi writing and compaction are
> > > > managed by a single long-running Spark application.
> > > >
> > > > Would Nifi also help us manage compactions when working with the
> > > > Hudi datasource or just writing plain Spark Hudi pipelines?
> > > >
> > > > On 2019/09/18 08:18:44, Taher Koitawala <taher...@gmail.com>
> > > > wrote:
> > > > > That's another way of doing things. I want to know if someone
> > > > > has written something like PutParquet that can write data
> > > > > directly to Hudi. AFAIK, no one has.
> > > > >
> > > > > That would be really powerful.
> > > > >
> > > > > On Wed, Sep 18, 2019, 1:37 PM Pratyaksh Sharma
> > > > > <pratyaks...@gmail.com> wrote:
> > > > >
> > > > > > Hi Taher,
> > > > > >
> > > > > > In the initial phase of our CDC pipeline, we were using Hudi
> > > > > > with Nifi. Nifi was being used to read the MySQL binlog file
> > > > > > and push that data to a Kafka topic. This topic was then
> > > > > > consumed by DeltaStreamer. So Nifi was indirectly involved
> > > > > > in that flow.
> > > > > >
> > > > > > On Wed, Sep 18, 2019 at 10:29 AM Taher Koitawala
> > > > > > <taher...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > > Just wanted to know: has anyone tried to write data to
> > > > > > > Hudi with a Nifi flow?
> > > > > > >
> > > > > > > Perhaps just a local CSV file to a Hudi dataset? If not,
> > > > > > > then let's try that!
> > > > > > >
> > > > > > > Regards,
> > > > > > > Taher Koitawala