[flink] 02/02: [FLINK-27244][hive] Improve documentation of reading partition with subdirectories for Hive tables

jark Thu, 07 Jul 2022 06:03:54 -0700

This is an automated email from the ASF dual-hosted git repository.

jark pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git


commit 73a3c7da689c6bccf842508d4f8da487b8d8cc5d
Author: Jark Wu <[email protected]>
AuthorDate: Thu Jul 7 20:31:04 2022 +0800

    [FLINK-27244][hive] Improve documentation of reading partition with subdirectories for Hive tables
---
 .../docs/connectors/table/hive/hive_read_write.md        | 16 +++++++++++-----
 .../docs/connectors/table/hive/hive_read_write.md        | 16 +++++++++++-----
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/docs/content.zh/docs/connectors/table/hive/hive_read_write.md b/docs/content.zh/docs/connectors/table/hive/hive_read_write.md
index ba712cc82e5..fd5321dd3f0 100644
--- a/docs/content.zh/docs/connectors/table/hive/hive_read_write.md
+++ b/docs/content.zh/docs/connectors/table/hive/hive_read_write.md
@@ -173,7 +173,7 @@ Multi-thread is used to split hive's partitions. You can use `table.exec.hive.lo
 ### Read Partition With Subdirectory
 
 In some case, you may create an external table referring another table, but the partition columns is a subset of the referred table.
-For example, you have a partitioned table `fact_tz` with partition `day`/`hour`:
+For example, you have a partitioned table `fact_tz` with partition `day` and `hour`:
 
 ```sql
 CREATE TABLE fact_tz(x int) PARTITIONED BY (day STRING, hour STRING);
@@ -182,13 +182,19 @@ CREATE TABLE fact_tz(x int) PARTITIONED BY (day STRING, hour STRING);
 And you have an external table `fact_daily` referring to table `fact_tz` with a coarse-grained partition `day`:
 
 ```sql
-create external table fact_daily(x int) PARTITIONED BY (ds STRING) location 'fact_tz_localtion' ;
+CREATE EXTERNAL TABLE fact_daily(x int) PARTITIONED BY (ds STRING) LOCATION '/path/to/fact_tz';
 ```
 
-Then when reading the external table, there will be sub-directories in the partition directory of the external table.
+Then when reading the external table `fact_daily`, there will be sub-directories (`hour=1` to `hour=24`) in the partition directory of the table.
 
-You can configure `table.exec.hive.read-partition-with-subdirectory.enabled` to allow Flink to read the sub-directories or skip them directly.
-The default value is true, it will read the sub-directories. Otherwise, it will throw the exception "not a file: xxx" when the partition directory contains any sub-directory.
+By default, you can add partition with sub-directories to the external table. Flink SQL can recursively scan all sub-directories and fetch all the data from all sub-directories.
+
+```sql
+ALTER TABLE fact_daily ADD PARTITION (ds='2022-07-07') location '/path/to/fact_tz/ds=2022-07-07';
+```
+
+You can set job configuration `table.exec.hive.read-partition-with-subdirectory.enabled` (`true` by default) to `false` to disallow Flink to read the sub-directories.
+If the configuration is `false` and the directory does not contain files, rather consists of sub directories Flink blows up with the exception: `java.io.IOException: Not a file: /path/to/data/*`.
 
 ## Temporal Table Join
 
diff --git a/docs/content/docs/connectors/table/hive/hive_read_write.md b/docs/content/docs/connectors/table/hive/hive_read_write.md
index 3c5f7cd043a..394551114ec 100644
--- a/docs/content/docs/connectors/table/hive/hive_read_write.md
+++ b/docs/content/docs/connectors/table/hive/hive_read_write.md
@@ -173,7 +173,7 @@ Multi-thread is used to split hive's partitions. You can use `table.exec.hive.lo
 ### Read Partition With Subdirectory
 
 In some case, you may create an external table referring another table, but the partition columns is a subset of the referred table.
-For example, you have a partitioned table `fact_tz` with partition `day`/`hour`:
+For example, you have a partitioned table `fact_tz` with partition `day` and `hour`:
 
 ```sql
 CREATE TABLE fact_tz(x int) PARTITIONED BY (day STRING, hour STRING);
@@ -182,13 +182,19 @@ CREATE TABLE fact_tz(x int) PARTITIONED BY (day STRING, hour STRING);
 And you have an external table `fact_daily` referring to table `fact_tz` with a coarse-grained partition `day`:
 
 ```sql
-create external table fact_daily(x int) PARTITIONED BY (ds STRING) location 'fact_tz_localtion' ;
+CREATE EXTERNAL TABLE fact_daily(x int) PARTITIONED BY (ds STRING) LOCATION '/path/to/fact_tz';
 ```
 
-Then when reading the external table `fact_daily`, there will be sub-directories in the partition directory of the table.
+Then when reading the external table `fact_daily`, there will be sub-directories (`hour=1` to `hour=24`) in the partition directory of the table.
 
-You can configure `table.exec.hive.read-partition-with-subdirectory.enabled` to allow Flink to read the sub-directories or skip them directly.
-The default value is true, it will read the sub-directories. Otherwise, it will throw the exception "not a file: xxx" when the partition directory contains any sub-directory.
+By default, you can add partition with sub-directories to the external table. Flink SQL can recursively scan all sub-directories and fetch all the data from all sub-directories.
+
+```sql
+ALTER TABLE fact_daily ADD PARTITION (ds='2022-07-07') location '/path/to/fact_tz/ds=2022-07-07';
+```
+
+You can set job configuration `table.exec.hive.read-partition-with-subdirectory.enabled` (`true` by default) to `false` to disallow Flink to read the sub-directories.
+If the configuration is `false` and the directory does not contain files, rather consists of sub directories Flink blows up with the exception: `java.io.IOException: Not a file: /path/to/data/*`.
 
 ## Temporal Table Join

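A minimal Flink SQL client sketch of the option documented in this patch. It assumes a Hive catalog is registered and selected, and that `fact_daily` and its partition `ds='2022-07-07'` were created with the statements in the patch. The `SELECT COUNT(*)` queries are illustrative only. The expected failure is the `Not a file` exception quoted in the patch.

```sql
-- Assumption: a Hive catalog is in use and fact_daily has partition ds='2022-07-07'
-- whose directory contains hour=... sub-directories rather than data files.

-- Default behaviour (true): the sub-directories are scanned recursively.
SET 'table.exec.hive.read-partition-with-subdirectory.enabled' = 'true';
SELECT COUNT(*) FROM fact_daily WHERE ds = '2022-07-07';

-- With the option disabled, the same query is expected to fail with
-- java.io.IOException: Not a file: ...
SET 'table.exec.hive.read-partition-with-subdirectory.enabled' = 'false';
SELECT COUNT(*) FROM fact_daily WHERE ds = '2022-07-07';
```

The quoted `SET 'key' = 'value';` form is the SQL client syntax for session configuration. In a Table API program, the same key would be set on the table configuration instead.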