[jira] [Commented] (HUDI-76) CSV Source support for Hudi Delta Streamer

Ethan Guo (Jira) Wed, 08 Jan 2020 22:02:52 -0800


    [ 
https://issues.apache.org/jira/browse/HUDI-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011447#comment-17011447
 ]


Ethan Guo commented on HUDI-76:
-------------------------------

[~liujinhui]  Yes, that's a good question.  I'm also working on connecting the 
Hudi schemaProvider to the Spark csv/text reader.

As we discussed offline, one possible way to enforce the schema provided by 
Hudi for text formats (no header line) is to transform the Hudi schema into 
the schema that the Spark DataFrameReader recognizes, and then pass it to the 
DataFrameReader for the csv/text file.  Users can enable or disable this 
behavior through a new property.
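
To make the idea concrete, here is a rough sketch in Python (not the code 
from the PR).  It assumes the Hudi schema is an Avro record schema with flat, 
primitive-typed fields, and turns it into a DDL string, which 
DataFrameReader.schema() accepts in addition to a StructType.  The helper 
names and the enforce_schema flag are hypothetical; the flag only stands in 
for the new property mentioned above.

```python
import json

# Avro primitive type -> Spark SQL DDL type name.
# Assumption: only flat records with primitive fields are handled here.
AVRO_TO_SPARK = {
    "string": "STRING",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "bytes": "BINARY",
}


def avro_to_spark_ddl(avro_schema_json):
    """Convert an Avro record schema (JSON text) to a Spark DDL schema string."""
    schema = json.loads(avro_schema_json)
    if schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    columns = []
    for field in schema["fields"]:
        avro_type = field["type"]
        # A nullable field is written as a union such as ["null", "string"];
        # keep the single non-null branch.
        if isinstance(avro_type, list):
            non_null = [t for t in avro_type if t != "null"]
            if len(non_null) != 1:
                raise ValueError("unsupported union in field %r" % field["name"])
            avro_type = non_null[0]
        if not isinstance(avro_type, str) or avro_type not in AVRO_TO_SPARK:
            raise ValueError("unsupported type in field %r" % field["name"])
        columns.append("`%s` %s" % (field["name"], AVRO_TO_SPARK[avro_type]))
    return ", ".join(columns)


def read_csv(spark, path, avro_schema_json, enforce_schema=True):
    """Read a headerless CSV, optionally enforcing the converted schema.

    `spark` is expected to be a SparkSession; `enforce_schema` is a
    hypothetical stand-in for the new on/off property.
    """
    reader = spark.read.option("header", "false")
    if enforce_schema:
        reader = reader.schema(avro_to_spark_ddl(avro_schema_json))
    return reader.csv(path)
```

With the flag off, Spark falls back to its default behavior for a headerless 
file (string columns named _c0, _c1, ...), so the flag decides whether the 
Hudi schema or Spark's own defaults determine the column names and types.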

I'll make the corresponding code changes in the PR.  Meanwhile, let me know if 
you find any other gaps.  Also feel free to review the PR once it's ready :)

> CSV Source support for Hudi Delta Streamer
> ------------------------------------------
>
>                 Key: HUDI-76
>                 URL: https://issues.apache.org/jira/browse/HUDI-76
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: DeltaStreamer, Incremental Pull
>            Reporter: Balaji Varadarajan
>            Assignee: Ethan Guo
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.5.1
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> DeltaStreamer does not support pulling CSV data from sources (hdfs log 
> files/kafka). This ticket is to provide support for CSV sources.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
