[ 
https://issues.apache.org/jira/browse/FLINK-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flink Jira Bot updated FLINK-18825:
-----------------------------------
    Labels: stale-assigned  (was: )

> Support to process CDC message in batch mode
> --------------------------------------------
>
>                 Key: FLINK-18825
>                 URL: https://issues.apache.org/jira/browse/FLINK-18825
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table SQL / Ecosystem
>            Reporter: Jark Wu
>            Assignee: Nicholas Jiang
>            Priority: Major
>              Labels: stale-assigned
>
> Currently, processing CDC data is only supported in streaming mode. However,
> there are also cases for processing CDC data in batch mode, for example, to
> load, materialize and compress CDC data into Hive using the Parquet format.
> Interpreting a changelog in batch mode can be supported by adding a
> MaterializeOperator after the CDC source operator. The MaterializeOperator
> acts like a HashAggregate or a SortAggregate, depending on whether all the
> fields are fixed-length. It consumes all the input RowData records and
> replaces or removes rows depending on the flag in the header.
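
The materialization step described above can be illustrated with a minimal,
hash-based sketch in plain Java. It is not the proposed MaterializeOperator
and does not use any Flink API: the class ChangelogMaterializer, the Change
record type, the Kind enum (which only mirrors the usual insert,
update-before, update-after and delete flags) and the use of a single string
key per row are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: folds a bounded changelog into its final table state.
class ChangelogMaterializer {

    // Assumed change flags, modelled on the flag carried in each row header.
    enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    // Assumed row shape: a change flag, a primary key and a payload.
    static final class Change {
        final Kind kind;
        final String key;
        final String value;

        Change(Kind kind, String key, String value) {
            this.kind = kind;
            this.key = key;
            this.value = value;
        }
    }

    // Consumes every input change and replaces or removes rows by key.
    static Map<String, String> materialize(List<Change> changes) {
        Map<String, String> state = new LinkedHashMap<>();
        for (Change c : changes) {
            switch (c.kind) {
                case INSERT:
                case UPDATE_AFTER:
                    state.put(c.key, c.value);   // add or replace the row
                    break;
                case UPDATE_BEFORE:
                case DELETE:
                    state.remove(c.key);         // retract the old row
                    break;
            }
        }
        return state;
    }

    public static void main(String[] args) {
        List<Change> changes = new ArrayList<>();
        changes.add(new Change(Kind.INSERT, "1", "alice"));
        changes.add(new Change(Kind.INSERT, "2", "bob"));
        changes.add(new Change(Kind.UPDATE_BEFORE, "1", "alice"));
        changes.add(new Change(Kind.UPDATE_AFTER, "1", "alicia"));
        changes.add(new Change(Kind.DELETE, "2", "bob"));
        System.out.println(materialize(changes)); // prints {1=alicia}
    }
}
```

A sort-based variant would instead order the changes by key and collapse each
run of changes for the same key; which variant fits better is the design
question the issue ties to whether all fields are fixed-length.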



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
