Hi Rahul,
I have added some comments in the HIP with ideas around implementation. Take a
look and proceed with working on the implementation. Let us know in this thread
if you need any help. Looking forward to your PR.
Balaji.V
On Monday, March 18, 2019, 3:44:21 AM PDT, <[email protected]>
wrote:
On 2019/03/15 16:30:55, "[email protected]" <[email protected]> wrote:
>
> Hi Rahul,
> Thanks for creating the HIP. I have reformatted your HIP just to standardize
> it.
> https://docs.google.com/document/d/1bj-xpkRomVtbzvLb_4BRngDIGkkMR5yzxXRRzkA7QVo/edit?usp=sharing
>
> A few points you need to think about and elaborate on in the HIP:
> 1. You can mention how the schema of the CSV file is going to be handled. We
>    should probably use the current schemaProvider to decode the CSV data.
> 2. Using a config, make the source generic to support any configurable
>    delimiter (e.g., tab instead of comma).
> 3. Also, you can write about how to handle the presence/absence of a header.
>    I am assuming that for Kafka this should not be a concern, but I am not
>    sure if there is a standard way of storing CSV files in a data lake. Do
>    they include a header or not?
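
The three points above (schema-driven decoding, a configurable delimiter, and
an optional header row) can be illustrated outside Hudi with a minimal sketch.
It uses only Python's standard csv module and is not DeltaStreamer code: the
names read_delimited, field_names, and has_header are hypothetical, and
field_names is an assumed stand-in for whatever field list a schema provider
would supply.

```python
import csv
import io


def read_delimited(text, field_names, delimiter=",", has_header=False):
    """Parse delimited text into dicts keyed by the given schema field names.

    field_names is a hypothetical stand-in for a schema; values stay strings.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header_pending = has_header  # skip the first non-blank row if it is a header
    rows = []
    for line_number, values in enumerate(reader, start=1):
        if not values:
            continue  # csv.reader yields an empty list for blank lines
        if header_pending:
            header_pending = False
            continue
        if len(values) != len(field_names):
            raise ValueError(
                f"line {line_number}: expected {len(field_names)} fields, "
                f"got {len(values)}"
            )
        rows.append(dict(zip(field_names, values)))
    return rows


# Tab-delimited input with a header row, decoded against a two-field schema.
sample = "id\tname\n1\talice\n2\tbob\n"
records = read_delimited(sample, ["id", "name"], delimiter="\t", has_header=True)
```

Taking the field names from the schema rather than from the file's own header
row means the same code path works whether or not a header is present, which
is the concern raised in point 3.
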
>
> Let us know if you have any thoughts/questions and we will be happy to help.
> Also, if not already done, can you create a JIRA account so that we can
> assign the ticket to you?
> Balaji.V
> On Friday, March 15, 2019, 7:34:18 AM PDT, <[email protected]>
> wrote:
>
>
>
> On 2019/03/15 06:09:04, Umesh Kacha <[email protected]> wrote:
> > Hi Rahul, I am happy to volunteer for this task in case you don't have
> > the bandwidth for it. Please advise.
> >
> > Regards,
> > Umesh.
> >
> > On Fri, Mar 15, 2019, 6:59 AM [email protected] <[email protected]> wrote:
> >
> > >
> > > Hi Rahul,
> > > We do not have any ready-made CSV support in DeltaStreamer yet. But it
> > > should be simple to extend the DeltaStreamer by implementing a CSV Source.
> > > Would you be interested in writing a HIP -
> > > https://cwiki.apache.org/confluence/display/HUDI/Hudi+Improvement+Plan+Details+and+Process
> > > for
> > > CSV support and implementing it?
> > > We will be very happy to assist you on this.
> > > Thanks, Balaji.V
> > >
> > > On Thursday, March 14, 2019, 2:48:49 AM PDT, <[email protected]>
> > > wrote:
> > >
> > > Dear Team
> > >
> > > I tested DeltaStreamer with the JsonKafka, JsonDFS, etc. sources. If
> > > possible, please suggest how I can consume CSV data from Kafka/HDFS and
> > > insert it into Hudi.
> > >
> > >
> > > Thanks & Regards
> > > Rahul
> > >
> >
> Dear Balaji
>
> I am initiating a HIP for CSV Source support for Hudi DeltaStreamer.
> Please find the HIP document at the link below.
> https://docs.google.com/document/d/1bj-xpkRomVtbzvLb_4BRngDIGkkMR5yzxXRRzkA7QVo/edit?usp=sharing
>
> I am new to this kind of open source project discussion. If there is any
> mistake in the way I have requested the HIP, please correct me.
>
> @Umesh, thanks, you are always welcome.
>
> Thanks & Regards
> Rahul P
>
>
>
Dear Balaji
I have edited the HIP as per your suggestion. Please advise if any further
modification is required.
Jira Id: rahuledavalath
Thanks & Regards
Rahul