[ https://issues.apache.org/jira/browse/HUDI-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980117#comment-16980117 ]

Vinoth Chandar commented on HUDI-184:
-------------------------------------

We can start with the streaming APIs, IMO; I think the batch APIs might 
converge into them. If one of you can formally start an RFC for this, we can 
evolve a strawman design there.

Some more aspects to PoC in Flink:
 * We need to decide when to commit a batch of records, i.e. pause streaming 
across workers and publish to the Hudi timeline. Can this be achieved in a 
purely streaming model?
 * How do we run compaction? Can a single physical Flink/YARN job, for 
example, run both ingestion and compaction concurrently, as we can today with 
Spark/DeltaStreamer continuous mode?

> Integrate Hudi with Apache Flink
> --------------------------------
>
>                 Key: HUDI-184
>                 URL: https://issues.apache.org/jira/browse/HUDI-184
>             Project: Apache Hudi (incubating)
>          Issue Type: New Feature
>          Components: Write Client
>            Reporter: vinoyang
>            Assignee: vinoyang
>            Priority: Major
>
> Apache Flink is a popular stream processing engine.
> Integrating Hudi with Flink would be valuable work.
> The discussion mailing thread is here: 
> [https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)