[incubator-hudi] branch asf-site updated: [HUDI-611] Add Impala Guide to Doc (#1349)

leesf Sat, 22 Feb 2020 18:20:10 -0800

This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 49047de  [HUDI-611] Add Impala Guide to Doc (#1349)
49047de is described below

commit 49047deb43bc400ccc0fa6b2d02f2fe379d1c8a4
Author: YanJia-Gary-Li <yanjia.gary...@gmail.com>
AuthorDate: Sat Feb 22 18:19:25 2020 -0800

    [HUDI-611] Add Impala Guide to Doc (#1349)
---
 docs/_docs/2_3_querying_data.cn.md | 30 ++++++++++++++++++++++++++++++
 docs/_docs/2_3_querying_data.md    | 29 +++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/docs/_docs/2_3_querying_data.cn.md b/docs/_docs/2_3_querying_data.cn.md
index 81f2273..b2c4870 100644
--- a/docs/_docs/2_3_querying_data.cn.md
+++ b/docs/_docs/2_3_querying_data.cn.md
@@ -145,3 +145,33 @@ scala> sqlContext.sql("select count(*) from hudi_rt where datestr = '2016-10-02'
 
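 Presto is a popular query engine that provides interactive query performance. Hudi RO tables can be queried seamlessly in Presto.
 This requires the `hudi-presto-bundle` jar to be placed into `<presto_install>/plugin/hive-hadoop2/` across the installation.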
+
+## Impala (Not Officially Released)
+
+### Read optimized table
+
+Impala can query a Hudi read optimized table on HDFS as an [EXTERNAL TABLE](https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_tables.html#external_tables).  
+To create a Hudi read optimized table in Impala:
+```
+CREATE EXTERNAL TABLE database.table_name
+LIKE PARQUET '/path/to/load/xxx.parquet'
+STORED AS HUDIPARQUET
+LOCATION '/path/to/load';
+```
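+Impala can take advantage of a sensible file partition layout to improve query efficiency.
+To create a partitioned table, the folders must follow a naming convention like `year=2020/month=1`.
+Impala uses `=` to separate the partition name from the partition value.  
+To create a partitioned Hudi read optimized table in Impala: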
+```
+CREATE EXTERNAL TABLE database.table_name
+LIKE PARQUET '/path/to/load/xxx.parquet'
+PARTITIONED BY (year int, month int, day int)
+STORED AS HUDIPARQUET
+LOCATION '/path/to/load';
+ALTER TABLE database.table_name RECOVER PARTITIONS;
+```
+After Hudi successfully writes a new commit, refresh the Impala table to get the latest results.
+```
+REFRESH database.table_name
+```
+
diff --git a/docs/_docs/2_3_querying_data.md b/docs/_docs/2_3_querying_data.md
index 2d97e2b..0ee5e17 100644
--- a/docs/_docs/2_3_querying_data.md
+++ b/docs/_docs/2_3_querying_data.md
@@ -150,3 +150,32 @@ Additionally, `HoodieReadClient` offers the following functionality using Hudi's
 
 Presto is a popular query engine, providing interactive query performance. Presto currently supports only read optimized queries on Hudi tables. 
 This requires the `hudi-presto-bundle` jar to be placed into `<presto_install>/plugin/hive-hadoop2/`, across the installation.
+
+## Impala (Not Officially Released)
+
+### Read optimized table
+
+Impala is able to query a Hudi read optimized table as an [EXTERNAL TABLE](https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_tables.html#external_tables) on HDFS.  
+To create a Hudi read optimized table on Impala:
+```
+CREATE EXTERNAL TABLE database.table_name
+LIKE PARQUET '/path/to/load/xxx.parquet'
+STORED AS HUDIPARQUET
+LOCATION '/path/to/load';
+```
+Impala is able to take advantage of the physical partition structure to improve query performance.
+To create a partitioned table, the folders should follow a naming convention like `year=2020/month=1`.
+Impala uses `=` to separate the partition name from the partition value.  
+To create a partitioned Hudi read optimized table on Impala:
+```
+CREATE EXTERNAL TABLE database.table_name
+LIKE PARQUET '/path/to/load/xxx.parquet'
+PARTITIONED BY (year int, month int, day int)
+STORED AS HUDIPARQUET
+LOCATION '/path/to/load';
+ALTER TABLE database.table_name RECOVER PARTITIONS;
+```
+After Hudi makes a new commit, refresh the Impala table to get the latest results.
+```
+REFRESH database.table_name
+```
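
The `name=value` folder convention described in the guide can be illustrated with a small Python sketch. This is not code from Hudi or Impala; the helper names `partition_dir` and `parse_partition_dir` are hypothetical, and the sketch only assumes what the guide states: each folder level is a partition name and a partition value separated by `=`.

```python
from typing import Dict, List, Tuple


def partition_dir(columns: List[Tuple[str, object]]) -> str:
    """Build a relative partition path such as 'year=2020/month=1'."""
    return "/".join(f"{name}={value}" for name, value in columns)


def parse_partition_dir(path: str) -> Dict[str, str]:
    """Split each path segment on its first '=' into a name and a value."""
    parts: Dict[str, str] = {}
    for segment in path.strip("/").split("/"):
        name, sep, value = segment.partition("=")
        if not sep or not name:
            # A folder without '=' does not follow the convention.
            raise ValueError(f"not a name=value partition segment: {segment!r}")
        parts[name] = value  # values are kept as strings (an assumption of this sketch)
    return parts


if __name__ == "__main__":
    rel = partition_dir([("year", 2020), ("month", 1)])
    print(rel)
    print(parse_partition_dir(rel))
```

A writer that lays out folders this way produces paths that line up with the columns declared in the partition clause of the `CREATE EXTERNAL TABLE` statement; a folder named without `=` is rejected by the parser here, mirroring the guide's requirement.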
