[ https://issues.apache.org/jira/browse/FLINK-20239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235407#comment-17235407 ]
Rui Li commented on FLINK-20239:
--------------------------------

Let me try to list the main differences between batch and streaming.

* Batch Read. The Hive table is scanned and consumed once. This is the same way a table is read in the Hive world.
* Batch Write. Data written to a Hive table becomes visible to Hive when the job finishes. Again, this is the same as how Hive itself writes a table. This only works with a batch source.
* Streaming Read. A monitor periodically checks for new files/partitions in a Hive table, and fetches data incrementally when new files/partitions are added to the table.
* Streaming Write. Data written to a Hive table is committed (becomes visible to Hive) incrementally. Users can control when and how a commit is triggered with several properties. This works with both batch and streaming sources. INSERT OVERWRITE is not supported in streaming write.

> Confusing pages: "Hive Read & Write" and "Hive Streaming"
> ---------------------------------------------------------
>
>                 Key: FLINK-20239
>                 URL: https://issues.apache.org/jira/browse/FLINK-20239
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Hive, Documentation
>    Affects Versions: 1.12.0
>            Reporter: Dawid Wysakowicz
>            Priority: Critical
>             Fix For: 1.12.0
>
>
> The two pages describe how to read & write from Hive. It is not very clear
> what the relation between the two pages is. Moreover, the {{Hive Streaming}}
> page is much more comprehensive.
> Personally I found the {{Hive Read & Write}} page not helpful and bloated
> with irrelevant sections, such as Formats and Limit pushdown, which often
> contain a single sentence.
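The batch/streaming differences in the comment come down to when written data becomes visible and how a reader picks up new data. The Python sketch below models only the two streaming modes against an in-memory stand-in for a Hive table. It is not Flink's implementation: `FakeHiveTable`, `StreamingReadMonitor` and `StreamingWriter` are hypothetical names, the reader only discovers new partitions (not new files inside existing partitions), and the writer's commit rule (commit a partition once a fixed delay has passed since its first write) only loosely resembles a processing-time commit trigger such as Flink's `sink.partition-commit.trigger` / `sink.partition-commit.delay` options.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


# Hypothetical stand-in for a Hive table: only committed (visible)
# partitions live here, mapped from partition name to rows.
@dataclass
class FakeHiveTable:
    partitions: Dict[str, List[str]] = field(default_factory=dict)


class StreamingReadMonitor:
    """Streaming read sketch: each poll() fetches rows only from
    partitions that became visible since the previous poll."""

    def __init__(self, table: FakeHiveTable) -> None:
        self.table = table
        self.seen: Set[str] = set()

    def poll(self) -> List[str]:
        new_parts = sorted(p for p in self.table.partitions if p not in self.seen)
        rows: List[str] = []
        for p in new_parts:
            rows.extend(self.table.partitions[p])
            self.seen.add(p)
        return rows


class StreamingWriter:
    """Streaming write sketch: rows are buffered per partition and a
    partition is committed (made visible) once `delay` time units have
    passed since its first row. It only appends; there is no overwrite."""

    def __init__(self, table: FakeHiveTable, delay: int) -> None:
        self.table = table
        self.delay = delay
        self.pending: Dict[str, List[str]] = {}
        self.first_write: Dict[str, int] = {}

    def write(self, partition: str, row: str, now: int) -> None:
        self.pending.setdefault(partition, []).append(row)
        self.first_write.setdefault(partition, now)

    def maybe_commit(self, now: int) -> List[str]:
        committed: List[str] = []
        for p in sorted(self.pending):  # sorted() copies keys, so popping is safe
            if now - self.first_write[p] >= self.delay:
                self.table.partitions.setdefault(p, []).extend(self.pending.pop(p))
                del self.first_write[p]
                committed.append(p)
        return committed


if __name__ == "__main__":
    table = FakeHiveTable({"dt=2020-11-18": ["a", "b"]})
    reader = StreamingReadMonitor(table)
    writer = StreamingWriter(table, delay=10)

    print(reader.poll())               # ['a', 'b'] (existing partition)
    writer.write("dt=2020-11-19", "c", now=0)
    print(writer.maybe_commit(now=5))  # [] (not committed yet)
    print(reader.poll())               # [] (nothing visible to the reader)
    print(writer.maybe_commit(now=10)) # ['dt=2020-11-19']
    print(reader.poll())               # ['c']
```

The only coordination between writer and reader in this sketch is the set of committed partitions, which is why streaming-written data reaches a streaming reader only after a commit, whereas in batch write everything would become visible at once when the job finishes.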