[
https://issues.apache.org/jira/browse/HUDI-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980117#comment-16980117
]
Vinoth Chandar commented on HUDI-184:
-------------------------------------
We can start with streaming APIs IMO. I think the batch APIs might converge into
them. If one of you can start an RFC for this formally, we can evolve a strawman
design there.
Some more aspects to PoC in Flink:
* We need to decide when to commit a batch of records, i.e., pause streaming
across workers and publish to the Hudi timeline. Can this be achieved in a
purely streaming model?
* How do we run compaction? Can a single physical Flink/YARN job, for example,
run both ingestion and compaction concurrently, as we can do with
Spark/DeltaStreamer continuous mode today?
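On the first point, one way to avoid pausing the stream is to align the Hudi commit with a coordination signal (e.g., a Flink checkpoint barrier): workers keep buffering writes and only publish to the timeline when the barrier arrives. A minimal plain-Java sketch of that idea, with all class and method names illustrative (none of this is Hudi or Flink API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: writers buffer records continuously; a coordinator
// triggers a flush on all writers and then publishes a single commit on the
// timeline -- analogous to committing on a checkpoint barrier rather than
// pausing streaming across workers.
public class CheckpointCommitSketch {

    // Stand-in for the Hudi timeline (illustrative, not the real API).
    static class TimelineStub {
        final List<String> completedCommits = new ArrayList<>();
        void publish(String instantTime) { completedCommits.add(instantTime); }
    }

    // Stand-in for a per-worker write task.
    static class BufferingWriter {
        private final List<String> buffer = new ArrayList<>();
        void write(String record) { buffer.add(record); }   // streaming writes keep flowing
        int flush() {                                        // called when the barrier arrives
            int n = buffer.size();
            buffer.clear();
            return n;
        }
    }

    public static void main(String[] args) {
        TimelineStub timeline = new TimelineStub();
        BufferingWriter w1 = new BufferingWriter();
        BufferingWriter w2 = new BufferingWriter();

        // Records arrive on each worker between barriers...
        w1.write("a"); w1.write("b");
        w2.write("c");

        // ...then the barrier arrives: every writer flushes, and the
        // coordinator publishes exactly one commit on the timeline.
        int flushed = w1.flush() + w2.flush();
        timeline.publish("20191122000000");

        System.out.println(flushed + " records in commit " + timeline.completedCommits.get(0));
    }
}
```

The key property is that individual workers never block each other; only the publish step is centralized.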
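On the second point, the DeltaStreamer continuous-mode pattern is roughly: ingestion and compaction run as separate tasks inside one process, sharing state, with compaction asynchronously merging the log files that ingestion produces. A hedged plain-Java sketch of that shape (the names and the queue-based "log file" model are illustrative only):

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a single job hosts both an ingestion task and an async
// compaction task on a shared thread pool, mimicking the shape of
// DeltaStreamer continuous mode. Not Hudi or Flink API.
public class ConcurrentCompactionSketch {

    // Delta log files produced by ingestion, pending compaction.
    static final ConcurrentLinkedQueue<String> logFiles = new ConcurrentLinkedQueue<>();
    static final AtomicInteger compactedFiles = new AtomicInteger();

    static void ingest(int batches) {
        for (int i = 0; i < batches; i++) {
            logFiles.add("log-" + i);            // each batch appends a delta log file
        }
    }

    static void compact() {
        String f;
        while ((f = logFiles.poll()) != null) {  // merge pending logs into base files
            compactedFiles.incrementAndGet();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Both tasks live in the same physical job and run concurrently.
        Future<?> ingestion = pool.submit(() -> ingest(5));
        Future<?> compaction = pool.submit(() -> {
            while (!ingestion.isDone() || !logFiles.isEmpty()) {
                compact();                       // drain logs while ingestion is live
            }
        });

        compaction.get();
        pool.shutdown();
        System.out.println("compacted " + compactedFiles.get() + " log files");
    }
}
```

Whether Flink's scheduling model allows carving out slots for a long-running compaction task inside the same job, the way a Spark application reuses its executors, is exactly the open question above.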
> Integrate Hudi with Apache Flink
> --------------------------------
>
> Key: HUDI-184
> URL: https://issues.apache.org/jira/browse/HUDI-184
> Project: Apache Hudi (incubating)
> Issue Type: New Feature
> Components: Write Client
> Reporter: vinoyang
> Assignee: vinoyang
> Priority: Major
>
> Apache Flink is a popular stream processing engine.
> Integrating Hudi with Flink would be valuable work.
> The discussion mailing thread is here:
> [https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)