[
https://issues.apache.org/jira/browse/HUDI-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981649#comment-16981649
]
vinoyang commented on HUDI-184:
-------------------------------
[~vinoth] Based on my limited observation so far, here are answers to your two questions:
bq. We need to decide when to commit a batch of record i.e pause streaming
across workers and publish to Hudi timeline. In a purely streaming model can
this be achieved?
Yes. As discussed before, we will use Flink's window mechanism as a bounded-stream
abstraction. A window is a batch; when the window fires, we can implement a
{{ProcessWindowFunction}}
(https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#processwindowfunction)
which gives us access to all the elements cached in the window. We can implement
the commit business logic in this UDF.
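The idea can be sketched in plain Java. This is a hypothetical stand-in, not the real integration: records are buffered while a "window" is open, and the commit logic runs when the window fires. In the actual job, this buffering and firing would happen inside a Flink {{ProcessWindowFunction}}, whose {{process()}} method receives all elements cached in the window.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "window = batch, fire = commit".
// In the real job this logic would live inside a Flink ProcessWindowFunction.
public class WindowCommitSketch {
    private final List<String> buffer = new ArrayList<>();

    // Called for each incoming record while the window is open.
    public void onRecord(String record) {
        buffer.add(record);
    }

    // Called when the window fires: the buffered elements form one batch,
    // and this is where the write + publish to the Hudi timeline would go.
    public String fireWindow() {
        String commit = "commit(" + buffer.size() + " records)";
        buffer.clear();
        return commit;
    }

    public static void main(String[] args) {
        WindowCommitSketch w = new WindowCommitSketch();
        w.onRecord("r1");
        w.onRecord("r2");
        System.out.println(w.fireWindow()); // prints: commit(2 records)
    }
}
```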
bq. How do we run compaction? Can a physical Flink/YARN job for e.g run both
ingestion and compaction concurrently, as we can do with Spark/DeltaStreamer
continuous mode now?
Yes. It seems the ingestion and compaction steps are independent of each other,
and today they simply coexist in the same Spark job. If so, this is also not a
problem in Flink.
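To illustrate the point above with a minimal, hypothetical sketch (not the real Hudi or Flink API): since the two steps are independent, one job/process can schedule them concurrently, the way Spark/DeltaStreamer continuous mode does. Two plain tasks stand in for the ingestion and compaction pipelines here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration: ingestion and compaction as two independent
// tasks running concurrently inside a single job (process).
public class ConcurrentIngestCompact {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch done = new CountDownLatch(2);

        pool.submit(() -> {          // stands in for the ingestion pipeline
            System.out.println("ingestion running");
            done.countDown();
        });
        pool.submit(() -> {          // stands in for the compaction pipeline
            System.out.println("compaction running");
            done.countDown();
        });

        done.await(10, TimeUnit.SECONDS);
        pool.shutdown();
        System.out.println("both tasks ran in one job");
    }
}
```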
> Integrate Hudi with Apache Flink
> --------------------------------
>
> Key: HUDI-184
> URL: https://issues.apache.org/jira/browse/HUDI-184
> Project: Apache Hudi (incubating)
> Issue Type: New Feature
> Components: Write Client
> Reporter: vinoyang
> Assignee: vinoyang
> Priority: Major
>
> Apache Flink is a popular streaming processing engine.
> Integrating Hudi with Flink is a valuable work.
> The discussion mailing thread is here:
> [https://lists.apache.org/api/source.lua/1533de2d4cd4243fa9e8f8bf057ffd02f2ac0bec7c7539d8f72166ea@%3Cdev.hudi.apache.org%3E]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)