This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
     new 49047de  [HUDI-611] Add Impala Guide to Doc (#1349)
49047de is described below

commit 49047deb43bc400ccc0fa6b2d02f2fe379d1c8a4
Author: YanJia-Gary-Li <yanjia.gary...@gmail.com>
AuthorDate: Sat Feb 22 18:19:25 2020 -0800

    [HUDI-611] Add Impala Guide to Doc (#1349)
---
 docs/_docs/2_3_querying_data.cn.md | 30 ++++++++++++++++++++++++++++++
 docs/_docs/2_3_querying_data.md    | 29 +++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/docs/_docs/2_3_querying_data.cn.md b/docs/_docs/2_3_querying_data.cn.md
index 81f2273..b2c4870 100644
--- a/docs/_docs/2_3_querying_data.cn.md
+++ b/docs/_docs/2_3_querying_data.cn.md
@@ -145,3 +145,33 @@ scala> sqlContext.sql("select count(*) from hudi_rt where datestr = '2016-10-02'
 Presto is a popular query engine that provides interactive query performance. Hudi RO tables can be queried seamlessly in Presto.
 This requires placing the `hudi-presto-bundle` jar into `<presto_install>/plugin/hive-hadoop2/` across the installation.
+
+## Impala (this feature has not been officially released)
+
+### Read optimized table
+
+Impala can query a Hudi read optimized table on HDFS as an [EXTERNAL TABLE](https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_tables.html#external_tables).
+To create a Hudi read optimized table on Impala:
+```
+CREATE EXTERNAL TABLE database.table_name
+LIKE PARQUET '/path/to/load/xxx.parquet'
+STORED AS HUDIPARQUET
+LOCATION '/path/to/load';
+```
+Impala can take advantage of a well-chosen file partition layout to improve query efficiency.
+To create a partitioned table, folders must be named following the pattern `year=2020/month=1`.
+Impala uses `=` to separate the partition name from the partition value.
+To create a partitioned Hudi read optimized table on Impala:
+```
+CREATE EXTERNAL TABLE database.table_name
+LIKE PARQUET '/path/to/load/xxx.parquet'
+PARTITIONED BY (year int, month int, day int)
+STORED AS HUDIPARQUET
+LOCATION '/path/to/load';
+ALTER TABLE database.table_name RECOVER PARTITIONS;
+```
+After Hudi successfully writes a new commit, refresh the Impala table to get the latest results.
+```
+REFRESH database.table_name
+```
+
diff --git a/docs/_docs/2_3_querying_data.md b/docs/_docs/2_3_querying_data.md
index 2d97e2b..0ee5e17 100644
--- a/docs/_docs/2_3_querying_data.md
+++ b/docs/_docs/2_3_querying_data.md
@@ -150,3 +150,32 @@ Additionally, `HoodieReadClient` offers the following functionality using Hudi's
 Presto is a popular query engine, providing interactive query performance. Presto currently supports only read optimized queries on Hudi tables.
 This requires the `hudi-presto-bundle` jar to be placed into `<presto_install>/plugin/hive-hadoop2/`, across the installation.
+
+## Impala (Not Officially Released)
+
+### Read optimized table
+
+Impala is able to query a Hudi read optimized table as an [EXTERNAL TABLE](https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/impala_tables.html#external_tables) on HDFS.
+To create a Hudi read optimized table on Impala:
+```
+CREATE EXTERNAL TABLE database.table_name
+LIKE PARQUET '/path/to/load/xxx.parquet'
+STORED AS HUDIPARQUET
+LOCATION '/path/to/load';
+```
+Impala is able to take advantage of the physical partition structure to improve query performance.
+To create a partitioned table, the folders should follow a naming convention like `year=2020/month=1`.
+Impala uses `=` to separate the partition name from the partition value.
+To create a partitioned Hudi read optimized table on Impala:
+```
+CREATE EXTERNAL TABLE database.table_name
+LIKE PARQUET '/path/to/load/xxx.parquet'
+PARTITIONED BY (year int, month int, day int)
+STORED AS HUDIPARQUET
+LOCATION '/path/to/load';
+ALTER TABLE database.table_name RECOVER PARTITIONS;
+```
+After Hudi makes a new commit, refresh the Impala table to get the latest results.
+```
+REFRESH database.table_name
+```
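
The guide added in this commit relies on folders named like `year=2020/month=1`, with `=` separating the partition name from its value. As a rough illustration only, the Python sketch below builds and parses such paths. The helper names `partition_path` and `parse_partitions` are hypothetical and are not part of Hudi or Impala; the base path `/path/to/load` is the placeholder used in the guide.

```python
from pathlib import PurePosixPath
from typing import Dict


def partition_path(base: str, **partitions: int) -> str:
    """Build a directory path with one `name=value` segment per partition column.

    Hypothetical helper for illustration; keyword order decides segment order.
    """
    segments = [f"{name}={value}" for name, value in partitions.items()]
    return str(PurePosixPath(base, *segments))


def parse_partitions(path: str) -> Dict[str, str]:
    """Recover partition names and values from a path by splitting each
    `name=value` segment at its first `=`. Values are returned as strings."""
    found: Dict[str, str] = {}
    for segment in PurePosixPath(path).parts:
        if "=" in segment:
            name, value = segment.split("=", 1)
            found[name] = value
    return found


if __name__ == "__main__":
    folder = partition_path("/path/to/load", year=2020, month=1)
    print(folder)
    print(parse_partitions(folder + "/xxx.parquet"))
```

Segments without `=` (the base path and the file name) are ignored by the parser, which mirrors how only `name=value` folders carry partition information in this layout.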