This is an automated email from the ASF dual-hosted git repository.

jark pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/flink.git
commit 73a3c7da689c6bccf842508d4f8da487b8d8cc5d
Author: Jark Wu <[email protected]>
AuthorDate: Thu Jul 7 20:31:04 2022 +0800

    [FLINK-27244][hive] Improve documentation of reading partition with subdirectories for Hive tables
---
 .../docs/connectors/table/hive/hive_read_write.md | 16 +++++++++++-----
 .../docs/connectors/table/hive/hive_read_write.md | 16 +++++++++++-----
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/docs/content.zh/docs/connectors/table/hive/hive_read_write.md b/docs/content.zh/docs/connectors/table/hive/hive_read_write.md
index ba712cc82e5..fd5321dd3f0 100644
--- a/docs/content.zh/docs/connectors/table/hive/hive_read_write.md
+++ b/docs/content.zh/docs/connectors/table/hive/hive_read_write.md
@@ -173,7 +173,7 @@ Multi-thread is used to split hive's partitions. You can use `table.exec.hive.lo
 ### Read Partition With Subdirectory

 In some case, you may create an external table referring another table, but the partition columns is a subset of the referred table.
-For example, you have a partitioned table `fact_tz` with partition `day`/`hour`:
+For example, you have a partitioned table `fact_tz` with partition `day` and `hour`:

 ```sql
 CREATE TABLE fact_tz(x int) PARTITIONED BY (day STRING, hour STRING);
@@ -182,13 +182,19 @@ CREATE TABLE fact_tz(x int) PARTITIONED BY (day STRING, hour STRING);
 And you have an external table `fact_daily` referring to table `fact_tz` with a coarse-grained partition `day`:

 ```sql
-create external table fact_daily(x int) PARTITIONED BY (ds STRING) location 'fact_tz_localtion' ;
+CREATE EXTERNAL TABLE fact_daily(x int) PARTITIONED BY (ds STRING) LOCATION '/path/to/fact_tz';
 ```

-Then when reading the external table, there will be sub-directories in the partition directory of the external table.
+Then when reading the external table `fact_daily`, there will be sub-directories (`hour=1` to `hour=24`) in the partition directory of the table.

-You can configure `table.exec.hive.read-partition-with-subdirectory.enabled` to allow Flink to read the sub-directories or skip them directly.
-The default value is true, it will read the sub-directories. Otherwise, it will throw the exception "not a file: xxx" when the partition directory contains any sub-directory.
+By default, you can add partition with sub-directories to the external table. Flink SQL can recursively scan all sub-directories and fetch all the data from all sub-directories.
+
+```sql
+ALTER TABLE fact_daily ADD PARTITION (ds='2022-07-07') location '/path/to/fact_tz/ds=2022-07-07';
+```
+
+You can set job configuration `table.exec.hive.read-partition-with-subdirectory.enabled` (`true` by default) to `false` to disallow Flink to read the sub-directories.
+If the configuration is `false` and the directory does not contain files, rather consists of sub directories Flink blows up with the exception: `java.io.IOException: Not a file: /path/to/data/*`.

 ## Temporal Table Join

diff --git a/docs/content/docs/connectors/table/hive/hive_read_write.md b/docs/content/docs/connectors/table/hive/hive_read_write.md
index 3c5f7cd043a..394551114ec 100644
--- a/docs/content/docs/connectors/table/hive/hive_read_write.md
+++ b/docs/content/docs/connectors/table/hive/hive_read_write.md
@@ -173,7 +173,7 @@ Multi-thread is used to split hive's partitions. You can use `table.exec.hive.lo
 ### Read Partition With Subdirectory

 In some case, you may create an external table referring another table, but the partition columns is a subset of the referred table.
-For example, you have a partitioned table `fact_tz` with partition `day`/`hour`:
+For example, you have a partitioned table `fact_tz` with partition `day` and `hour`:

 ```sql
 CREATE TABLE fact_tz(x int) PARTITIONED BY (day STRING, hour STRING);
@@ -182,13 +182,19 @@ CREATE TABLE fact_tz(x int) PARTITIONED BY (day STRING, hour STRING);
 And you have an external table `fact_daily` referring to table `fact_tz` with a coarse-grained partition `day`:

 ```sql
-create external table fact_daily(x int) PARTITIONED BY (ds STRING) location 'fact_tz_localtion' ;
+CREATE EXTERNAL TABLE fact_daily(x int) PARTITIONED BY (ds STRING) LOCATION '/path/to/fact_tz';
 ```

-Then when reading the external table `fact_daily`, there will be sub-directories in the partition directory of the table.
+Then when reading the external table `fact_daily`, there will be sub-directories (`hour=1` to `hour=24`) in the partition directory of the table.

-You can configure `table.exec.hive.read-partition-with-subdirectory.enabled` to allow Flink to read the sub-directories or skip them directly.
-The default value is true, it will read the sub-directories. Otherwise, it will throw the exception "not a file: xxx" when the partition directory contains any sub-directory.
+By default, you can add partition with sub-directories to the external table. Flink SQL can recursively scan all sub-directories and fetch all the data from all sub-directories.
+
+```sql
+ALTER TABLE fact_daily ADD PARTITION (ds='2022-07-07') location '/path/to/fact_tz/ds=2022-07-07';
+```
+
+You can set job configuration `table.exec.hive.read-partition-with-subdirectory.enabled` (`true` by default) to `false` to disallow Flink to read the sub-directories.
+If the configuration is `false` and the directory does not contain files, rather consists of sub directories Flink blows up with the exception: `java.io.IOException: Not a file: /path/to/data/*`.

 ## Temporal Table Join
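
For readers who want to try the behavior this commit documents, the following is a minimal sketch of toggling the option from the Flink SQL client. It is not part of the commit. It assumes a Hive catalog is already registered and selected as the current catalog, and it reuses the `fact_daily` table and the `ds='2022-07-07'` partition from the documentation text in the diff. The quoted `SET 'key' = 'value'` form is the SQL client syntax of recent Flink releases and may differ in older versions.

```sql
-- Assumption: a Hive catalog is the current catalog and fact_daily has the
-- partition ds='2022-07-07' added as shown in the documentation.

-- Default behavior (option is true): the hour=... sub-directories under the
-- partition directory are scanned recursively.
SELECT COUNT(*) FROM fact_daily WHERE ds = '2022-07-07';

-- Disable reading of sub-directories for the following statements.
SET 'table.exec.hive.read-partition-with-subdirectory.enabled' = 'false';

-- Per the documentation, a partition directory that holds only sub-directories
-- is now expected to fail with java.io.IOException: Not a file: ...
SELECT COUNT(*) FROM fact_daily WHERE ds = '2022-07-07';
```

Setting the option in the session keeps the change local to the jobs submitted afterwards, which may be preferable to changing a cluster-wide default while experimenting.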
