http://git-wip-us.apache.org/repos/asf/tajo/blob/955a7bf8/tajo-docs/src/main/sphinx/configuration/catalog_configuration.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/configuration/catalog_configuration.rst b/tajo-docs/src/main/sphinx/configuration/catalog_configuration.rst
index c629d9e..bb88720 100644
--- a/tajo-docs/src/main/sphinx/configuration/catalog_configuration.rst
+++ b/tajo-docs/src/main/sphinx/configuration/catalog_configuration.rst
@@ -7,21 +7,21 @@ If you want to customize the catalog service, copy ``$TAJO_HOME/conf/catalog-sit
 * tajo.catalog.master.addr - If you want to launch a Tajo cluster in distributed mode, you must specify this address. For more detail information, see [Default Ports](#DefaultPorts).
 * tajo.catalog.store.class - If you want to change the persistent storage of the catalog server, specify the class name. Its default value is tajo.catalog.store.DerbyStore. In the current version, Tajo provides three persistent storage classes as follows:
 
-+-----------------------------------+------------------------------------------------+
-| Driver Class                      | Descriptions                                   |
-+===================================+================================================+
-| tajo.catalog.store.DerbyStore     | this storage class uses Apache Derby.          |
-+-----------------------------------+------------------------------------------------+
-| tajo.catalog.store.MySQLStore     | this storage class uses MySQL.                 |
-+-----------------------------------+------------------------------------------------+
-| tajo.catalog.store.MariaDBStore   | this storage class uses MariaDB.               |
-+-----------------------------------+------------------------------------------------+
-| tajo.catalog.store.MemStore       | this is the in-memory storage. It is only used |
-|                                   | in unit tests to shorten the duration of unit  |
-|                                   | tests.                                         |
-+-----------------------------------+------------------------------------------------+
-| tajo.catalog.store.HCatalogStore  | this storage class uses HiveMetaStore.         |
-+-----------------------------------+------------------------------------------------+
++--------------------------------------+------------------------------------------------+
+| Driver Class                         | Descriptions                                   |
++======================================+================================================+
+| tajo.catalog.store.DerbyStore        | this storage class uses Apache Derby.          |
++--------------------------------------+------------------------------------------------+
+| tajo.catalog.store.MySQLStore        | this storage class uses MySQL.                 |
++--------------------------------------+------------------------------------------------+
+| tajo.catalog.store.MariaDBStore      | this storage class uses MariaDB.               |
++--------------------------------------+------------------------------------------------+
+| tajo.catalog.store.MemStore          | this is the in-memory storage. It is only used |
+|                                      | in unit tests to shorten the duration of unit  |
+|                                      | tests.                                         |
++--------------------------------------+------------------------------------------------+
+| tajo.catalog.store.HiveCatalogStore  | this storage class uses HiveMetaStore.         |
++--------------------------------------+------------------------------------------------+
 
 =========================
 Derby Configuration
@@ -148,21 +148,19 @@ Finally, you must add the following configurations to `conf/catalog-site.xml` :
   </property>
 
 ==================================
-HCatalogStore Configuration
+HiveCatalogStore Configuration
 ==================================
 
-Tajo support HCatalogStore to integrate with hive. If you want to use HCatalogStore, you just do as follows.
+Tajo support HiveCatalogStore to integrate with hive. If you want to use HiveCatalogStore, you just do as follows.
 
 First, you must compile source code and get a binary archive as follows:
 
 .. code-block:: sh
 
   $ git clone https://git-wip-us.apache.org/repos/asf/tajo.git tajo
-  $ mvn clean install -DskipTests -Pdist -Dtar -Phcatalog-0.1x.0
+  $ mvn clean install -DskipTests -Pdist -Dtar
   $ ls tajo-dist/target/tajo-x.y.z-SNAPSHOT.tar.gz
 
-Currently Tajo supports hive 0.12.0, hive 0.13.0, hive 0.13.1. If you enables HCatalogStore, you set the maven profile as ``-Phcatalog-0.12.0``.
-
 Second, you must set your hive home directory to HIVE_HOME variable in ``conf/tajo-env.sh`` with it as follows:
 
 .. code-block:: sh
@@ -182,5 +180,5 @@ Lastly, you should add the following config to ``conf/catalog-site.xml`` :
 
   <property>
     <name>tajo.catalog.store.class</name>
-    <value>org.apache.tajo.catalog.store.HCatalogStore</value>
+    <value>org.apache.tajo.catalog.store.HiveCatalogStore</value>
   </property>

http://git-wip-us.apache.org/repos/asf/tajo/blob/955a7bf8/tajo-docs/src/main/sphinx/hcatalog_integration.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/hcatalog_integration.rst b/tajo-docs/src/main/sphinx/hcatalog_integration.rst
deleted file mode 100644
index d81975d..0000000
--- a/tajo-docs/src/main/sphinx/hcatalog_integration.rst
+++ /dev/null
@@ -1,52 +0,0 @@
-*************************************
-HCatalog Integration
-*************************************
-
-Apache Tajo™ catalog supports HCatalogStore driver to integrate with Apache Hive™.
-This integration allows Tajo to access all tables used in Apache Hive.
-Depending on your purpose, you can execute either SQL queries or HiveQL queries on the
-same tables managed in Apache Hive.
-
-In order to use this feature, you need to build Tajo with a specified maven profile
-and then add some configs into ``conf/tajo-env.sh`` and ``conf/catalog-site.xml``.
-This section describes how to setup HCatalog integration.
-This instruction would take no more than ten minutes.
-
-First, you need to compile the source code with hcatalog profile.
-Currently, Tajo supports hcatalog-0.11.0 and hcatalog-0.12.0 profile.
-So, if you want to use Hive 0.11.0, you need to set ``-Phcatalog-0.11.0`` as the maven profile ::
-
-  $ mvn clean package -DskipTests -Pdist -Dtar -Phcatalog-0.11.0
-
-Or, if you want to use Hive 0.12.0, you need to set ``-Phcatalog-0.12.0`` as the maven profile ::
-
-  $ mvn clean package -DskipTests -Pdist -Dtar -Phcatalog-0.12.0
-
-Then, you need to set your Hive home directory to the environment variable ``HIVE_HOME`` in conf/tajo-env.sh as follows: ::
-
-  export HIVE_HOME=/path/to/your/hive/directory
-
-If you need to use jdbc to connect HiveMetaStore, you have to prepare MySQL jdbc driver.
-Next, you should set the path of MySQL JDBC driver jar file to the environment variable HIVE_JDBC_DRIVER_DIR in conf/tajo-env.sh as follows: ::
-
-  export HIVE_JDBC_DRIVER_DIR==/path/to/your/mysql_jdbc_driver/mysql-connector-java-x.x.x-bin.jar
-
-Finally, you should specify HCatalogStore as Tajo catalog driver class in ``conf/catalog-site.xml`` as follows: ::
-
-  <property>
-    <name>tajo.catalog.store.class</name>
-    <value>org.apache.tajo.catalog.store.HCatalogStore</value>
-  </property>
-
-.. note::
-
-  Hive stores a list of partitions for each table in its metastore. If new partitions are
-  directly added to HDFS, HiveMetastore will not able aware of these partitions unless the user
-  ``ALTER TABLE table_name ADD PARTITION`` commands on each of the newly added partitions or
-  ``MSCK REPAIR TABLE table_name`` command.
-
-  But current tajo doesn't provide ``ADD PARTITION`` command and hive doesn't provide an api for
-  responding to ``MSK REPAIR TABLE`` command. Thus, if you insert data to hive partitioned
-  table and you want to scan the updated partitions through Tajo, you must run following command on hive ::
-
-    $ MSCK REPAIR TABLE [table_name];

http://git-wip-us.apache.org/repos/asf/tajo/blob/955a7bf8/tajo-docs/src/main/sphinx/hive_integration.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/hive_integration.rst b/tajo-docs/src/main/sphinx/hive_integration.rst
new file mode 100644
index 0000000..4c1d8d4
--- /dev/null
+++ b/tajo-docs/src/main/sphinx/hive_integration.rst
@@ -0,0 +1,42 @@
+*************************************
+Hive Integration
+*************************************
+
+Apache Tajo™ catalog supports HiveCatalogStore to integrate with Apache Hive™.
+This integration allows Tajo to access all tables used in Apache Hive.
+Depending on your purpose, you can execute either SQL queries or HiveQL queries on the
+same tables managed in Apache Hive.
+
+In order to use this feature, you need to build Tajo with a specified maven profile
+and then add some configs into ``conf/tajo-env.sh`` and ``conf/catalog-site.xml``.
+This section describes how to setup HiveMetaStore integration.
+This instruction would take no more than five minutes.
+
+You need to set your Hive home directory to the environment variable ``HIVE_HOME`` in conf/tajo-env.sh as follows: ::
+
+  export HIVE_HOME=/path/to/your/hive/directory
+
+If you need to use jdbc to connect HiveMetaStore, you have to prepare MySQL jdbc driver.
+Next, you should set the path of MySQL JDBC driver jar file to the environment variable HIVE_JDBC_DRIVER_DIR in conf/tajo-env.sh as follows: ::
+
+  export HIVE_JDBC_DRIVER_DIR==/path/to/your/mysql_jdbc_driver/mysql-connector-java-x.x.x-bin.jar
+
+Finally, you should specify HiveCatalogStore as Tajo catalog driver class in ``conf/catalog-site.xml`` as follows: ::
+
+  <property>
+    <name>tajo.catalog.store.class</name>
+    <value>org.apache.tajo.catalog.store.HiveCatalogStore</value>
+  </property>
+
+.. note::
+
+  Hive stores a list of partitions for each table in its metastore. If new partitions are
+  directly added to HDFS, HiveMetastore will not able aware of these partitions unless the user
+  ``ALTER TABLE table_name ADD PARTITION`` commands on each of the newly added partitions or
+  ``MSCK REPAIR TABLE table_name`` command.
+
+  But current tajo doesn't provide ``ADD PARTITION`` command and hive doesn't provide an api for
+  responding to ``MSK REPAIR TABLE`` command. Thus, if you insert data to hive partitioned
+  table and you want to scan the updated partitions through Tajo, you must run following command on hive ::
+
+    $ MSCK REPAIR TABLE [table_name];

http://git-wip-us.apache.org/repos/asf/tajo/blob/955a7bf8/tajo-docs/src/main/sphinx/index.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/index.rst b/tajo-docs/src/main/sphinx/index.rst
index 0ab50b6..730bed4 100644
--- a/tajo-docs/src/main/sphinx/index.rst
+++ b/tajo-docs/src/main/sphinx/index.rst
@@ -39,7 +39,7 @@ Table of Contents:
    table_partitioning
    index_overview
    backup_and_restore
-   hcatalog_integration
+   hive_integration
    hbase_integration
    swift_integration
    jdbc_driver

http://git-wip-us.apache.org/repos/asf/tajo/blob/955a7bf8/tajo-plan/src/main/java/org/apache/tajo/plan/rewrite/rules/FilterPushDownRule.java
----------------------------------------------------------------------
diff --git a/tajo-plan/src/main/java/org/apache/tajo/plan/rewrite/rules/FilterPushDownRule.java b/tajo-plan/src/main/java/org/apache/tajo/plan/rewrite/rules/FilterPushDownRule.java
index a6a7c78..dc6b8ef 100644
--- a/tajo-plan/src/main/java/org/apache/tajo/plan/rewrite/rules/FilterPushDownRule.java
+++ b/tajo-plan/src/main/java/org/apache/tajo/plan/rewrite/rules/FilterPushDownRule.java
@@ -856,7 +856,7 @@ public class FilterPushDownRule extends BasicLogicalPlanVisitor<FilterPushDownCo
       }
 
       Column column = columns.iterator().next();
-      // If catalog runs with HCatalog, partition column is a qualified name
+      // If catalog runs with HiveCatalogStore, partition column is a qualified name
       // Else partition column is a simple name
       boolean isPartitionColumn = false;
       if (hasQualifiedName) {

http://git-wip-us.apache.org/repos/asf/tajo/blob/955a7bf8/tajo-project/pom.xml
----------------------------------------------------------------------
diff --git a/tajo-project/pom.xml b/tajo-project/pom.xml
index 65fbaa3..6fa995d 100644
--- a/tajo-project/pom.xml
+++ b/tajo-project/pom.xml
@@ -37,6 +37,7 @@
     <protobuf.version>2.5.0</protobuf.version>
     <tajo.version>0.11.0-SNAPSHOT</tajo.version>
     <hbase.version>0.98.7-hadoop2</hbase.version>
+    <hive.version>1.1.0</hive.version>
     <netty.version>4.0.25.Final</netty.version>
     <jersey.version>2.6</jersey.version>
     <tajo.root>${project.parent.relativePath}/..</tajo.root>
