This is an automated email from the ASF dual-hosted git repository.
leesf pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 18ce570 [MINOR] Update doc to include inc query on partitions (#1454)
18ce570 is described below
commit 18ce5708e073e80779f6dcc00d388b4cb0cc758a
Author: YanJia-Gary-Li <[email protected]>
AuthorDate: Sat Mar 28 20:28:48 2020 -0700
[MINOR] Update doc to include inc query on partitions (#1454)
---
docs/_docs/0.5.2/2_3_querying_data.cn.md | 31 ++++++++++++++++++++++++++++++-
docs/_docs/0.5.2/2_3_querying_data.md | 3 ++-
docs/_docs/2_3_querying_data.cn.md | 31 ++++++++++++++++++++++++++++++-
docs/_docs/2_3_querying_data.md | 3 ++-
4 files changed, 64 insertions(+), 4 deletions(-)
diff --git a/docs/_docs/0.5.2/2_3_querying_data.cn.md b/docs/_docs/0.5.2/2_3_querying_data.cn.md
index 74afcef..77ad2d7 100644
--- a/docs/_docs/0.5.2/2_3_querying_data.cn.md
+++ b/docs/_docs/0.5.2/2_3_querying_data.cn.md
@@ -25,6 +25,33 @@ language: cn
and joined with other tables (datasets/dimensions) to [write out deltas](/cn/docs/0.5.2-writing_data.html) to a target Hudi dataset. The incremental view is obtained by querying one of the tables above with a special configuration that instructs the query planner to fetch only incremental data from the dataset.
+
+## Query Engine Support Matrix
+
+The tables below show whether each query engine supports the Hudi format.
+
+### Read Optimized Table
+
+|Query Engine|Real-time View|Incremental Pull|
+|------------|--------|-----------|
+|**Hive**|Y|Y|
+|**Spark SQL**|Y|Y|
+|**Spark Datasource**|Y|Y|
+|**Presto**|Y|N|
+|**Impala**|Y|N|
+
+
+### Real-time Table
+
+|Query Engine|Real-time View|Incremental Pull|Read Optimized Table|
+|------------|--------|-----------|--------------|
+|**Hive**|Y|Y|Y|
+|**Spark SQL**|Y|Y|Y|
+|**Spark Datasource**|N|N|Y|
+|**Presto**|N|N|Y|
+|**Impala**|N|N|Y|
+
+
Next, we will discuss in detail how to access all three views on each query engine.
## Hive
@@ -128,7 +155,9 @@ scala> sqlContext.sql("select count(*) from hudi_rt where datestr = '2016-10-02'
DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL())
.option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(),
<beginInstantTime>)
- .load(tablePath); // For incremental view, pass in the root/base path of dataset
+ .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY(),
+ "/year=2020/month=*/day=*") // Optional, incrementally pull from the specified partitions
+ .load(tablePath); // Pass in the root/base path of the dataset
```
Please refer to the [configuration](/cn/docs/0.5.2-configurations.html#spark-datasource) section to see all datasource options.
diff --git a/docs/_docs/0.5.2/2_3_querying_data.md b/docs/_docs/0.5.2/2_3_querying_data.md
index 0c28b12..9d17e72 100644
--- a/docs/_docs/0.5.2/2_3_querying_data.md
+++ b/docs/_docs/0.5.2/2_3_querying_data.md
@@ -55,7 +55,7 @@ Note that `Read Optimized` queries are not applicable for COPY_ON_WRITE tables.
|**Spark SQL**|Y|Y|Y|
|**Spark Datasource**|N|N|Y|
|**Presto**|N|N|Y|
-|**Impala**|N|N|N|
+|**Impala**|N|N|Y|
In the sections below, we will discuss specific setup to access different query types from different query engines.
@@ -148,6 +148,7 @@ The following snippet shows how to obtain all records changed after `beginInstan
.format("org.apache.hudi")
.option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY(), DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL())
.option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(), <beginInstantTime>)
+ .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY(), "/year=2020/month=*/day=*") // Optional, use glob pattern if querying certain partitions
.load(tablePath); // For incremental query, pass in the root/base path of table
hudiIncQueryDF.createOrReplaceTempView("hudi_trips_incremental")
diff --git a/docs/_docs/2_3_querying_data.cn.md b/docs/_docs/2_3_querying_data.cn.md
index b2c4870..1fa91d1 100644
--- a/docs/_docs/2_3_querying_data.cn.md
+++ b/docs/_docs/2_3_querying_data.cn.md
@@ -24,6 +24,33 @@ language: cn
and joined with other tables (datasets/dimensions) to [write out deltas](/cn/docs/writing_data.html) to a target Hudi dataset. The incremental view is obtained by querying one of the tables above with a special configuration that instructs the query planner to fetch only incremental data from the dataset.
+
+## Query Engine Support Matrix
+
+The tables below show whether each query engine supports the Hudi format.
+
+### Read Optimized Table
+
+|Query Engine|Real-time View|Incremental Pull|
+|------------|--------|-----------|
+|**Hive**|Y|Y|
+|**Spark SQL**|Y|Y|
+|**Spark Datasource**|Y|Y|
+|**Presto**|Y|N|
+|**Impala**|Y|N|
+
+
+### Real-time Table
+
+|Query Engine|Real-time View|Incremental Pull|Read Optimized Table|
+|------------|--------|-----------|--------------|
+|**Hive**|Y|Y|Y|
+|**Spark SQL**|Y|Y|Y|
+|**Spark Datasource**|N|N|Y|
+|**Presto**|N|N|Y|
+|**Impala**|N|N|Y|
+
+
Next, we will discuss in detail how to access all three views on each query engine.
## Hive
@@ -127,7 +154,9 @@ scala> sqlContext.sql("select count(*) from hudi_rt where datestr = '2016-10-02'
DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL())
.option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(),
<beginInstantTime>)
- .load(tablePath); // For incremental view, pass in the root/base path of dataset
+ .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY(),
+ "/year=2020/month=*/day=*") // Optional, incrementally pull from the specified partitions
+ .load(tablePath); // Pass in the root/base path of the dataset
```
Please refer to the [configuration](/cn/docs/configurations.html#spark-datasource) section to see all datasource options.
diff --git a/docs/_docs/2_3_querying_data.md b/docs/_docs/2_3_querying_data.md
index 875b7f0..3e6a436 100644
--- a/docs/_docs/2_3_querying_data.md
+++ b/docs/_docs/2_3_querying_data.md
@@ -54,7 +54,7 @@ Note that `Read Optimized` queries are not applicable for COPY_ON_WRITE tables.
|**Spark SQL**|Y|Y|Y|
|**Spark Datasource**|N|N|Y|
|**Presto**|N|N|Y|
-|**Impala**|N|N|N|
+|**Impala**|N|N|Y|
In the sections below, we will discuss specific setup to access different query types from different query engines.
@@ -147,6 +147,7 @@ The following snippet shows how to obtain all records changed after `beginInstan
.format("org.apache.hudi")
.option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY(), DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL())
.option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY(), <beginInstantTime>)
+ .option(DataSourceReadOptions.INCR_PATH_GLOB_OPT_KEY(), "/year=2020/month=*/day=*") // Optional, use glob pattern if querying certain partitions
.load(tablePath); // For incremental query, pass in the root/base path of table
hudiIncQueryDF.createOrReplaceTempView("hudi_trips_incremental")
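The `INCR_PATH_GLOB_OPT_KEY` option added in the snippets above restricts an incremental query to partitions whose paths match a glob pattern. As a rough sketch of the matching semantics only (plain Python with `fnmatch`, not Hudi's actual implementation; the partition paths below are made up for illustration):

```python
from fnmatch import fnmatch

# Hypothetical partition paths under a table's base path; the glob
# "/year=2020/month=*/day=*" should keep only the year=2020 partitions.
pattern = "/year=2020/month=*/day=*"
partitions = [
    "/year=2020/month=03/day=28",
    "/year=2020/month=04/day=01",
    "/year=2019/month=12/day=31",
]

matched = [p for p in partitions if fnmatch(p, pattern)]
print(matched)  # the year=2019 partition is pruned
```

Combined with `BEGIN_INSTANTTIME_OPT_KEY`, this lets a pipeline pull only records that both changed after the given instant and live in the selected partitions.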