[flink] branch release-1.11 updated: [FLINK-17836][hive][doc] Add document for Hive dim join

lzljs3620320 Sun, 14 Jun 2020 18:44:15 -0700

This is an automated email from the ASF dual-hosted git repository.

lzljs3620320 pushed a commit to branch release-1.11
in repository https://gitbox.apache.org/repos/asf/flink.git



The following commit(s) were added to refs/heads/release-1.11 by this push:
     new cc6fb34  [FLINK-17836][hive][doc] Add document for Hive dim join
cc6fb34 is described below

commit cc6fb346cd72fd7da584d1aaef417a4266aa631e
Author: Rui Li <[email protected]>
AuthorDate: Mon Jun 15 09:34:33 2020 +0800

    [FLINK-17836][hive][doc] Add document for Hive dim join
    
    This closes #12609
---
 docs/dev/table/hive/hive_streaming.md    | 36 +++++++++++++++++++++++++++++++-
 docs/dev/table/hive/hive_streaming.zh.md | 36 +++++++++++++++++++++++++++++++-
 2 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/docs/dev/table/hive/hive_streaming.md b/docs/dev/table/hive/hive_streaming.md
index 7b9f268..bcde777 100644
--- a/docs/dev/table/hive/hive_streaming.md
+++ b/docs/dev/table/hive/hive_streaming.md
@@ -163,4 +163,38 @@ SELECT * FROM hive_table /*+ OPTIONS('streaming-source.enable'='true', 'streamin
 
 ## Hive Table As Temporal Tables
 
-TODO
+You can use a Hive table as a temporal table and join streaming data with it. Please follow
+the [example]({{ site.baseurl }}/dev/table/streaming/temporal_tables.html#temporal-table) to find out how to join a
+temporal table.
+
+When performing the join, the Hive table will be cached in TaskManager (TM) memory and each record from the stream
+is looked up in the Hive table to decide whether a match is found. You don't need any extra settings to use a Hive table
+as a temporal table. Optionally, you can configure the TTL of the Hive table cache with the following
+property. After the cache expires, the Hive table will be scanned again to load the latest data.
+
+<table class="table table-bordered">
+  <thead>
+    <tr>
+        <th class="text-left" style="width: 20%">Key</th>
+        <th class="text-left" style="width: 15%">Default</th>
+        <th class="text-left" style="width: 10%">Type</th>
+        <th class="text-left" style="width: 55%">Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+        <td><h5>lookup.join.cache.ttl</h5></td>
+        <td style="word-wrap: break-word;">60 min</td>
+        <td>Duration</td>
+        <td>The cache TTL (e.g. 10min) for the build table in lookup join. By default the TTL is 60 minutes.</td>
+    </tr>
+  </tbody>
+</table>
+
+**Note**:
+1. Each joining subtask needs to keep its own cache of the Hive table. Please make sure the Hive table can fit into
+the memory of a TM task slot.
+2. You should set a relatively large value for `lookup.join.cache.ttl`. You will probably run into performance issues if
+your Hive table needs to be updated and reloaded too frequently.
+3. Currently we simply load the whole Hive table whenever the cache needs refreshing. There is no way to distinguish
+new data from old data.
diff --git a/docs/dev/table/hive/hive_streaming.zh.md b/docs/dev/table/hive/hive_streaming.zh.md
index 7b9f268..42b1acf 100644
--- a/docs/dev/table/hive/hive_streaming.zh.md
+++ b/docs/dev/table/hive/hive_streaming.zh.md
@@ -163,4 +163,38 @@ SELECT * FROM hive_table /*+ OPTIONS('streaming-source.enable'='true', 'streamin
 
 ## Hive Table As Temporal Tables
 
-TODO
+You can use a Hive table as a temporal table and join streaming data with it. Please follow
+the [example]({{ site.baseurl }}/zh/dev/table/streaming/temporal_tables.html#temporal-table) to find out how to join a
+temporal table.
+
+When performing the join, the Hive table will be cached in TaskManager (TM) memory and each record from the stream
+is looked up in the Hive table to decide whether a match is found. You don't need any extra settings to use a Hive table
+as a temporal table. Optionally, you can configure the TTL of the Hive table cache with the following
+property. After the cache expires, the Hive table will be scanned again to load the latest data.
+
+<table class="table table-bordered">
+  <thead>
+    <tr>
+        <th class="text-left" style="width: 20%">Key</th>
+        <th class="text-left" style="width: 15%">Default</th>
+        <th class="text-left" style="width: 10%">Type</th>
+        <th class="text-left" style="width: 55%">Description</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+        <td><h5>lookup.join.cache.ttl</h5></td>
+        <td style="word-wrap: break-word;">60 min</td>
+        <td>Duration</td>
+        <td>The cache TTL (e.g. 10min) for the build table in lookup join. By default the TTL is 60 minutes.</td>
+    </tr>
+  </tbody>
+</table>
+
+**Note**:
+1. Each joining subtask needs to keep its own cache of the Hive table. Please make sure the Hive table can fit into
+the memory of a TM task slot.
+2. You should set a relatively large value for `lookup.join.cache.ttl`. You will probably run into performance issues if
+your Hive table needs to be updated and reloaded too frequently.
+3. Currently we simply load the whole Hive table whenever the cache needs refreshing. There is no way to distinguish
+new data from old data.
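The caching behaviour documented by this commit can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not Flink's lookup-join implementation: `HiveLookupCacheModel`, `scan_table` and `FakeClock` are hypothetical names, the first field of each row is assumed to be the join key, and `ttl_seconds` stands in for `lookup.join.cache.ttl`. One instance models the cache of a single joining subtask; with a join parallelism of p, each of the p subtasks would hold its own full copy of the table.

```python
import time
from typing import Callable, Dict, Hashable, List, Optional, Tuple

Row = Tuple  # hypothetical row type; field 0 is assumed to be the join key


class HiveLookupCacheModel:
    """Toy model of ONE joining subtask's cache of a Hive dimension table.

    Assumption: it mimics the documented behaviour (whole table cached in
    memory, full rescan after the TTL expires). It is not Flink's code.
    """

    def __init__(self, scan_table: Callable[[], List[Row]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._scan_table = scan_table  # hypothetical: returns every row of the Hive table
        self._ttl = ttl_seconds        # stands in for lookup.join.cache.ttl
        self._clock = clock
        self._cache: Dict[Hashable, Row] = {}
        self._expires_at: Optional[float] = None
        self.reload_count = 0

    def _reload(self) -> None:
        # No incremental refresh: rescan the whole table and replace the old cache.
        self._cache = {row[0]: row for row in self._scan_table()}
        self._expires_at = self._clock() + self._ttl
        self.reload_count += 1

    def lookup(self, key: Hashable) -> Optional[Row]:
        # Load on first use, or reload once the TTL has elapsed.
        if self._expires_at is None or self._clock() >= self._expires_at:
            self._reload()
        return self._cache.get(key)  # None means the stream record has no match


class FakeClock:
    """Deterministic clock so the TTL boundary is easy to see."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


hive_rows = [(1, "apple"), (2, "pear")]
clock = FakeClock()
# 60 * 60 seconds = 60 min, the documented default TTL
subtask = HiveLookupCacheModel(lambda: list(hive_rows), ttl_seconds=60 * 60, clock=clock)

print(subtask.lookup(1))      # (1, 'apple'): the first lookup loads the table
hive_rows.append((3, "plum"))
print(subtask.lookup(3))      # None: the cache has not expired yet
clock.now = 60 * 60
print(subtask.lookup(3))      # (3, 'plum'): found after the full reload
print(subtask.reload_count)   # 2
```

The fake clock makes the trade-off in the notes visible: a row added to the Hive table stays invisible to lookups until the TTL expires, and the refresh then rescans the entire table rather than fetching only the new row, which is why a short TTL on a large table is costly.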