[
https://issues.apache.org/jira/browse/FLINK-20239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235407#comment-17235407
]
Rui Li commented on FLINK-20239:
--------------------------------
Let me try to list the main differences between batch and streaming.
* Batch Read. The Hive table is scanned and consumed once, the same way a
table would be read in Hive itself.
* Batch Write. Data written to a Hive table becomes visible to Hive when the
job finishes. Again, this is the same as how Hive itself writes a table. It
only works with a batch source.
* Streaming Read. A monitor periodically checks for new files/partitions in a
Hive table, and fetches data incrementally when new files/partitions are added
to the table.
* Streaming Write. Data written to a Hive table is committed (becomes visible
to Hive) incrementally. Users can control when and how a commit is triggered
with several properties. It works with both batch and streaming sources.
Insert overwrite is not supported in streaming write.
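
To make the four modes above concrete, here is a toy model in plain Python. It
is only a sketch of the semantics, not Flink or Hive code: the table is a dict
from partition name to rows, and the names Table, batch_read, StreamingReader
and StreamingWriter are made up for illustration. It also tracks whole
partitions only, whereas the real monitor also notices new files.

```python
from typing import Dict, List, Set

# Toy stand-in for a Hive table: partition name -> rows (hypothetical model).
Table = Dict[str, List[str]]


def batch_read(table: Table) -> List[str]:
    """Batch read: scan every visible partition exactly once."""
    return [row for part in sorted(table) for row in table[part]]


class StreamingReader:
    """Streaming read: remember consumed partitions, fetch only new ones."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def poll(self, table: Table) -> List[str]:
        # One monitor tick: find partitions added since the previous poll.
        new_parts = sorted(p for p in table if p not in self._seen)
        self._seen.update(new_parts)
        return [row for part in new_parts for row in table[part]]


class StreamingWriter:
    """Streaming write: rows stay pending until their partition is committed."""

    def __init__(self, table: Table) -> None:
        self._table = table
        self._pending: Table = {}

    def write(self, partition: str, row: str) -> None:
        self._pending.setdefault(partition, []).append(row)

    def commit(self, partition: str) -> None:
        # Publish one partition; other pending partitions stay invisible.
        rows = self._pending.pop(partition, [])
        if rows:
            self._table.setdefault(partition, []).extend(rows)


# Partition names below are invented sample data.
table: Table = {"dt=2020-01-01": ["a", "b"]}
reader = StreamingReader()
writer = StreamingWriter(table)

first = reader.poll(table)          # picks up the partition that already exists
writer.write("dt=2020-01-02", "c")
before_commit = reader.poll(table)  # written but not committed: nothing new
writer.commit("dt=2020-01-02")
after_commit = reader.poll(table)   # only the newly committed partition
everything = batch_read(table)      # a one-off scan sees all committed data
```

In this model a batch write would simply be a single commit of everything at
the end of the job, which is why its output only becomes visible when the job
finishes.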
> Confusing pages: "Hive Read & Write" and "Hive Streaming"
> ---------------------------------------------------------
>
> Key: FLINK-20239
> URL: https://issues.apache.org/jira/browse/FLINK-20239
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Hive, Documentation
> Affects Versions: 1.12.0
> Reporter: Dawid Wysakowicz
> Priority: Critical
> Fix For: 1.12.0
>
>
> The two pages describe how to read from and write to Hive. It is not very
> clear what the relation between the two pages is. Moreover, the
> {{Hive Streaming}} page is way more comprehensive.
> Personally, I found the {{Hive Read & Write}} page unhelpful and bloated
> with irrelevant sections, such as Formats and Limit pushdown, which often
> contain a single sentence.