[
https://issues.apache.org/jira/browse/TAJO-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729441#comment-14729441
]
ASF GitHub Bot commented on TAJO-1493:
--------------------------------------
Github user blrunner commented on the pull request:
https://github.com/apache/tajo/pull/624#issuecomment-137522218
I've finished tests successfully for this PR as following:
* Data: TPC-H benchmark set (scale factor = 1)
* CatalogStore: MySQLStore, HiveCatalogStore(with apache hive 1.1.0)
* CTAS for creating partition tables (except partsupp)
```
CREATE TABLE LINEITEM_P (
L_ORDERKEY bigint, L_PARTKEY bigint, L_SUPPKEY bigint, L_LINENUMBER bigint,
L_QUANTITY double, L_EXTENDEDPRICE double, L_DISCOUNT double, L_TAX double,
L_RETURNFLAG text, L_LINESTATUS text,L_COMMITDATE date, L_RECEIPTDATE date,
L_SHIPINSTRUCT text, L_SHIPMODE text, L_COMMENT text)
USING TEXT WITH ('text.delimiter'='|')
PARTITION BY COLUMN (L_SHIPDATE date) AS
SELECT L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY,
L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_COMMITDATE,
L_RECEIPTDATE, L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT, L_SHIPDATE FROM LINEITEM;
create table customer_p (C_CUSTKEY bigint, C_NAME text, C_ADDRESS text,
C_PHONE text, C_ACCTBAL double, C_MKTSEGMENT text, C_COMMENT text)
USING TEXT WITH ('text.delimiter'='|')
PARTITION BY COLUMN (C_NATIONKEY bigint) AS
SELECT C_CUSTKEY, C_NAME, C_ADDRESS, C_PHONE, C_ACCTBAL, C_MKTSEGMENT,
C_COMMENT, C_NATIONKEY FROM customer;
create table supplier_p (S_SUPPKEY bigint, S_NAME text, S_ADDRESS text,
S_PHONE text, S_ACCTBAL double, S_COMMENT text)
USING TEXT WITH ('text.delimiter'='|')
PARTITION BY COLUMN ( S_NATIONKEY bigint) AS
SELECT S_SUPPKEY, S_NAME, S_ADDRESS, S_PHONE, S_ACCTBAL, S_COMMENT,
S_NATIONKEY FROM supplier;
create table part_p (P_PARTKEY bigint, P_NAME text, P_MFGR text, P_BRAND
text, P_TYPE text, P_CONTAINER text, P_RETAILPRICE double, P_COMMENT text)
USING TEXT WITH ('text.delimiter'='|')
PARTITION BY COLUMN (P_SIZE integer) AS
SELECT P_PARTKEY, P_NAME, P_MFGR, P_BRAND, P_TYPE, P_CONTAINER,
P_RETAILPRICE, P_COMMENT, P_SIZE FROM part;
create table orders_p (O_ORDERKEY bigint, O_CUSTKEY bigint, O_ORDERSTATUS
text, O_TOTALPRICE double, O_ORDERPRIORITY text, O_CLERK text, O_SHIPPRIORITY
int, O_COMMENT text)
USING TEXT WITH ('text.delimiter'='|')
PARTITION BY COLUMN (O_ORDERDATE date) AS
SELECT O_ORDERKEY, O_CUSTKEY, O_ORDERSTATUS, O_TOTALPRICE, O_ORDERPRIORITY,
O_CLERK, O_SHIPPRIORITY, O_COMMENT, O_ORDERDATE FROM orders;
create table nation_p (N_NATIONKEY bigint, N_NAME text, N_COMMENT text)
USING TEXT WITH ('text.delimiter'='|')
PARTITION BY COLUMN (N_REGIONKEY bigint) AS
SELECT N_NATIONKEY, N_NAME, N_COMMENT, N_REGIONKEY FROM nation;
create table region_p ( R_NAME text, R_COMMENT text)
USING TEXT WITH ('text.delimiter'='|')
PARTITION BY COLUMN (R_REGIONKEY bigint) AS
SELECT R_NAME, R_COMMENT, R_REGIONKEY FROM region;
```
* Test Queries : TPC-H Q1 ~ Q22
After executing CTAS statements, I fount that list of table partitions had
been printed on hive shell. All test results mostly accorded with results of
queries for non-partitoned tables in MySQLStore and HiveCatalogStore. But some
records for double type record aggregation were slightly different in the
nearest whole number, To resolve above problem, we need to provide decimal
column type.
> Add a method to get partition directories with filter conditions.
> -----------------------------------------------------------------
>
> Key: TAJO-1493
> URL: https://issues.apache.org/jira/browse/TAJO-1493
> Project: Tajo
> Issue Type: Sub-task
> Components: Catalog
> Reporter: Jaehwa Jung
> Assignee: Jaehwa Jung
> Fix For: 0.11.0, 0.12.0
>
> Attachments: TAJO-1493.patch, TAJO-1493_2.patch
>
>
> Currently, PartitionedTableRewriter take a look into partition directories
> for rewriting filter conditions. It get all sub directories of table path
> because catalog doesn’t provide partition directories. But if there are lots
> of sub directories on HDFS, such as, more than 10,000 directories, it might
> be cause overload to NameNode. Thus, CatalogStore need to provide partition
> directories for specified filter conditions. I designed new method to
> CatalogStore as follows:
> * method name: getPartitionsWithConditionFilters
> * first parameter: database name
> * second parameter: table name
> * third parameter: where clause (included target column name and partition
> value)
> * return values:
> List<org.apache.tajo.catalog.proto.CatalogProtos.TablePartitionProto>
> * description: It scan right partition directories on CatalogStore with where
> caluse.
> For examples, users set parameters as following:
> ** first parameter: default
> ** second parameter: table1
> ** third parameter: COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3
> In the previous cases, this method will create select clause as follows.
> {code:xml}
> SELECT DISTINCT A.PATH
> FROM PARTITIONS A, (
> SELECT B.PARTITION_ID
> FROM PARTITION_KEYS B
> WHERE B.PARTITION_ID > 0
> AND (
> COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'
> )
> ) B
> WHERE A.PARTITION_ID > 0
> AND A.TID = ${table_id}
> AND A.PARTITION_ID = B.PARTITION_ID
> {code}
> At the first time, I considered to use EvalNode instead of where clause. But
> I can’t use it because of recursive related problems between tajo-catalog
> module and tajo-plan module. So, I’ll implement utility class to convert
> EvalNode to SQL.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)