Hi Rahul,
Thanks for creating the HIP. I have reformatted it to match the standard HIP
template.
https://docs.google.com/document/d/1bj-xpkRomVtbzvLb_4BRngDIGkkMR5yzxXRRzkA7QVo/edit?usp=sharing
A few points you should think through and elaborate on in the HIP:
1. Describe how the schema of the CSV file is going to be handled. We should
probably reuse the existing schemaProvider and use it to decode the CSV data.
2. Use a config to make the source generic enough to support any delimiter
(e.g., a tab instead of a comma).
3. Describe how to handle the presence or absence of a header row. I assume this
is not a concern for Kafka, but I am not sure whether there is a standard way of
storing CSV files in a data lake. Do they usually include a header or not?
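Points 1 to 3 could be prototyped along the lines below. This is a minimal Java sketch, not Hudi code: `CsvRecordDecoder` is a hypothetical class, the field names stand in for the ones a schemaProvider would supply, and the delimiter and header flag stand in for source configs whose names are not defined anywhere in this thread. It splits lines naively, so quoted fields containing the delimiter are not handled, and it keeps values as strings instead of converting them to the schema's types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper (not a Hudi API): turns delimited text lines into
// field-name -> value maps. Field names are assumed to come from the
// schema supplied by the schemaProvider.
class CsvRecordDecoder {
    private final List<String> fieldNames;
    private final Pattern delimiter;
    private final boolean hasHeader;

    CsvRecordDecoder(List<String> fieldNames, String delimiter, boolean hasHeader) {
        this.fieldNames = fieldNames;
        // Quote the configured delimiter so characters like "|" are matched literally.
        this.delimiter = Pattern.compile(Pattern.quote(delimiter));
        this.hasHeader = hasHeader;
    }

    List<Map<String, String>> decode(List<String> lines) {
        List<Map<String, String>> records = new ArrayList<>();
        // Skip the first line when the input is configured to carry a header row.
        int start = hasHeader ? 1 : 0;
        for (int i = start; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            // Limit -1 keeps trailing empty columns instead of dropping them.
            String[] values = delimiter.split(line, -1);
            if (values.length != fieldNames.size()) {
                throw new IllegalArgumentException("Line " + (i + 1) + " has " + values.length
                        + " columns, but the schema expects " + fieldNames.size());
            }
            // Pair each value with the schema field at the same position.
            Map<String, String> record = new LinkedHashMap<>();
            for (int c = 0; c < values.length; c++) {
                record.put(fieldNames.get(c), values[c]);
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // Tab-delimited input with a header row.
        CsvRecordDecoder decoder = new CsvRecordDecoder(List.of("id", "name", "score"), "\t", true);
        List<Map<String, String>> rows = decoder.decode(List.of(
                "id\tname\tscore",
                "1\talice\t10",
                "2\tbob\t"));
        System.out.println(rows);
    }
}
```

Running `main` prints `[{id=1, name=alice, score=10}, {id=2, name=bob, score=}]`. Passing `-1` as the split limit keeps the trailing empty column, so a row with a missing last value still lines up with the schema instead of failing the column-count check. For Kafka, where each message would typically hold a single record, the header flag would simply be left off.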
Let us know if you have any thoughts/questions and we will be happy to help.
Also, if you have not already done so, please create a JIRA account so we can
assign the ticket to you.
Balaji.V
On Friday, March 15, 2019, 7:34:18 AM PDT, <[email protected]>
wrote:
On 2019/03/15 06:09:04, Umesh Kacha <[email protected]> wrote:
> Hi Rahul, I am happy to volunteer for this task in case you don't have
> bandwidth for it. Please advise.
>
> Regards,
> Umesh.
>
> On Fri, Mar 15, 2019, 6:59 AM [email protected] <[email protected]> wrote:
>
> >
> > Hi Rahul,
> > We do not have any ready-made CSV support in DeltaStreamer yet, but it
> > should be simple to extend DeltaStreamer by implementing a CSV Source.
> > Would you be interested in writing a HIP
> > (https://cwiki.apache.org/confluence/display/HUDI/Hudi+Improvement+Plan+Details+and+Process)
> > for CSV support and implementing it?
> > We will be very happy to assist you on this.
> > Thanks,
> > Balaji.V
> >
> > On Thursday, March 14, 2019, 2:48:49 AM PDT, <[email protected]>
> > wrote:
> >
> > Dear Team
> >
> > I have tested DeltaStreamer with the JsonKafka, JsonDFS, etc. sources. If
> > possible, please suggest how I can consume CSV data from Kafka/HDFS and
> > insert it into Hudi.
> >
> >
> > Thanks & Regards
> > Rahul
> >
>
Dear Balaji,
I am initiating a HIP for CSV Source support in Hudi DeltaStreamer.
Please find the HIP document at the link below.
https://docs.google.com/document/d/1bj-xpkRomVtbzvLb_4BRngDIGkkMR5yzxXRRzkA7QVo/edit?usp=sharing
I am new to open source project discussions like this one. If I have made any
mistakes in how I requested the HIP, please correct me.
@Umesh, thank you; you are always welcome.
Thanks & Regards
Rahul P