[ 
https://issues.apache.org/jira/browse/FLINK-20239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235407#comment-17235407
 ] 

Rui Li commented on FLINK-20239:
--------------------------------

Let me try to list the main differences between batch and streaming.

* Batch Read. The Hive table is scanned and consumed once. This is the same 
way a table is read in Hive itself.
* Batch Write. Data written to a Hive table becomes visible to Hive when the 
job finishes. Again, this is the same as how Hive itself writes a table. Batch 
write only works with a batch source.
* Streaming Read. A monitor periodically checks for new files/partitions in a 
Hive table and fetches data incrementally as they are added to the table.
* Streaming Write. Data written to a Hive table is committed (becomes visible 
to Hive) incrementally. Users can control when and how a commit is triggered 
through several properties. Streaming write works with both batch and 
streaming sources. Insert overwrite is not supported in streaming write.
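
To make the four modes above more concrete, here is a toy in-memory sketch. 
It is not Flink or Hive code: ToyTable, StreamingMonitor, batch_write and 
streaming_write are hypothetical names invented for illustration, and a 
"file" is just a string. It only models when written data becomes visible 
and how a reader picks it up.

```python
class ToyTable:
    """Stand-in for a Hive table: only committed files are visible to readers."""

    def __init__(self):
        self.committed = []

    def commit(self, files):
        self.committed.extend(files)


def batch_read(table):
    """Batch read: scan once and return everything visible right now."""
    return list(table.committed)


class StreamingMonitor:
    """Streaming read: each poll returns only files not seen by earlier polls."""

    def __init__(self, table):
        self.table = table
        self.seen = set()

    def poll(self):
        new_files = [f for f in self.table.committed if f not in self.seen]
        self.seen.update(new_files)
        return new_files


def batch_write(table, records):
    """Batch write: stage all output, commit once when the job finishes."""
    staged = [f"batch-{i}" for i, _ in enumerate(records)]
    table.commit(staged)  # nothing is visible to readers before this point


def streaming_write(table, records, commit_every):
    """Streaming write: commit every `commit_every` records (an assumed trigger).

    Yields each committed group so a caller can observe incremental visibility.
    """
    staged = []
    for i, _ in enumerate(records):
        staged.append(f"stream-{i}")
        if len(staged) == commit_every:
            table.commit(staged)
            yield staged
            staged = []
    if staged:  # commit whatever is left when the input ends
        table.commit(staged)
        yield staged
```

In this sketch a count-based trigger stands in for the commit properties 
mentioned above; the real connector exposes its own options for that, which 
are not modelled here.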

> Confusing pages: "Hive Read & Write" and "Hive Streaming"
> ---------------------------------------------------------
>
>                 Key: FLINK-20239
>                 URL: https://issues.apache.org/jira/browse/FLINK-20239
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Hive, Documentation
>    Affects Versions: 1.12.0
>            Reporter: Dawid Wysakowicz
>            Priority: Critical
>             Fix For: 1.12.0
>
>
> The two pages describe how to read from & write to Hive. It is not very clear 
> what the relation between the two pages is. Moreover, the {{Hive Streaming}} 
> page is way more comprehensive.
> Personally, I found the {{Hive Read & Write}} page unhelpful and bloated 
> with irrelevant sections, e.g. Formats and Limit pushdown, which often 
> contain a single sentence.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)