Hi Rahul,
Thanks for creating the HIP. I have reformatted your HIP just to standardize 
it. 
https://docs.google.com/document/d/1bj-xpkRomVtbzvLb_4BRngDIGkkMR5yzxXRRzkA7QVo/edit?usp=sharing

A few points to think about and elaborate on in the HIP:
1. Describe how the schema of the CSV file will be handled. We should probably 
reuse the current schemaProvider to decode the CSV data.
2. Using a config, make the source generic so that it supports any configurable 
delimiter (e.g., tab instead of comma).
3. Describe how to handle the presence or absence of a header row. I assume this 
is not a concern for Kafka, but I am not sure whether there is a standard way of 
storing CSV files in a data lake. Do they include a header or not?
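
To make the three points above concrete, here is a rough sketch in plain Python 
rather than against the actual DeltaStreamer classes. Everything in it is an 
assumption for illustration: CsvSourceConfig, its two fields, and 
read_csv_records are hypothetical names, and field_names stands in for whatever 
the schema provider would supply. Values stay as strings; converting them to 
the schema's types is left out.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class CsvSourceConfig:
    """Hypothetical config holder; these are not real Hudi property names."""
    delimiter: str = ","      # point 2: any single-character delimiter
    has_header: bool = False  # point 3: whether the first row is a header


def read_csv_records(
    text: str, field_names: List[str], config: CsvSourceConfig
) -> Iterator[Dict[str, str]]:
    """Yield one dict per CSV row, keyed by the schema's field names."""
    reader = csv.reader(io.StringIO(text), delimiter=config.delimiter)
    header_pending = config.has_header
    for row in reader:
        if not row:
            continue  # csv.reader returns [] for blank lines
        if header_pending:
            header_pending = False
            continue  # drop the header row; names come from the schema
        if len(row) != len(field_names):
            raise ValueError(
                f"expected {len(field_names)} columns, got {len(row)}: {row}"
            )
        # point 1: column order is taken from the schema, not from the file
        yield dict(zip(field_names, row))


if __name__ == "__main__":
    tsv = "id\tname\n1\talice\n2\tbob\n"
    config = CsvSourceConfig(delimiter="\t", has_header=True)
    for record in read_csv_records(tsv, ["id", "name"], config):
        print(record)
```

With has_header left at False the same function reads header-less input, such 
as individual Kafka messages, without skipping the first row.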

Let us know if you have any thoughts/questions and we will be happy to help.
Also, if you have not already done so, could you create a JIRA account so that 
we can assign the ticket to you?
Balaji.V
    On Friday, March 15, 2019, 7:34:18 AM PDT, <[email protected]> 
wrote:  
 
 

On 2019/03/15 06:09:04, Umesh Kacha <[email protected]> wrote: 
> Hi Rahul, I am happy to volunteer for this task in case you don't have
> bandwidth for the same. Please advise.
> 
> Regards,
> Umesh.
> 
> On Fri, Mar 15, 2019, 6:59 AM [email protected] <[email protected]> wrote:
> 
> >
> > Hi Rahul,
> > We do not have any ready-made CSV support in DeltaStreamer yet. But it
> > should be simple to extend the DeltaStreamer by implementing a CSV Source.
> >  Would you be interested in writing a HIP -
> > https://cwiki.apache.org/confluence/display/HUDI/Hudi+Improvement+Plan+Details+and+Process
> >  for
> > CSV support and implementing it?
> > We will be very happy to assist you on this.
> > Thanks,
> > Balaji.V
> >
> >    On Thursday, March 14, 2019, 2:48:49 AM PDT, <[email protected]>
> > wrote:
> >
> >  Dear Team
> >
> > I tested DeltaStreamer with the JsonKafka, JsonDFS, etc. sources. If possible,
> > please suggest how I can consume CSV data from Kafka/HDFS and insert it into
> > Hudi.
> >
> >
> > Thanks & Regards
> > Rahul
> >
> 
Dear Balaji

I am initiating a HIP for CSV source support in Hudi DeltaStreamer. 
Please find the HIP document at the link below.
https://docs.google.com/document/d/1bj-xpkRomVtbzvLb_4BRngDIGkkMR5yzxXRRzkA7QVo/edit?usp=sharing

I am new to this kind of open source project discussion. If there is any 
mistake in the way I have submitted the HIP, please correct me.

@Umesh, thanks. You are always welcome.

Thanks & Regards
Rahul P


  
