[jira] [Commented] (TAJO-1493) Make partition pruning based on catalog informations

ASF GitHub Bot (JIRA) Thu, 24 Sep 2015 00:38:37 -0700

    [ https://issues.apache.org/jira/browse/TAJO-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905952#comment-14905952 ]


ASF GitHub Bot commented on TAJO-1493:
--------------------------------------

Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/772#issuecomment-142839704
  
    All unit tests passed on my laptop. I also tested this patch successfully 
with the following setup:
    
    * Data: TPC-H 1G
    * Query: Q1, Q3, Q4, Q5, Q8
    * Catalog: MySQLStore, HiveCatalogStore
    * Table schema:
    
    ```sql
    CREATE TABLE customer (c_custkey INT8, c_name TEXT, c_address TEXT,
      c_phone TEXT, c_acctbal FLOAT8, c_mktsegment TEXT, c_comment TEXT)
      USING TEXT PARTITION BY COLUMN(c_nationkey INT8);
    
    CREATE TABLE lineitem (l_orderkey INT8, l_partkey INT8, l_suppkey INT8,
      l_linenumber INT8, l_quantity FLOAT8, l_extendedprice FLOAT8,
      l_discount FLOAT8, l_tax FLOAT8, l_commitdate DATE, l_receiptdate DATE,
      l_shipinstruct TEXT, l_shipmode TEXT, l_comment TEXT)
      USING TEXT PARTITION BY COLUMN(l_shipdate DATE, l_returnflag TEXT, l_linestatus TEXT);
    
    CREATE TABLE nation (n_nationkey INT8, n_name TEXT, n_comment TEXT)
      USING TEXT PARTITION BY COLUMN(n_regionkey INT8);
    
    CREATE TABLE orders (o_orderkey INT8, o_custkey INT8, o_totalprice FLOAT8,
      o_clerk TEXT, o_shippriority INT4, o_comment TEXT)
      USING TEXT PARTITION BY COLUMN(o_orderdate DATE, o_orderstatus TEXT, o_orderpriority TEXT);
    
    CREATE TABLE part (p_partkey INT8, p_name TEXT, p_mfgr TEXT, p_brand TEXT,
      p_type TEXT, p_container TEXT, p_retailprice FLOAT8, p_comment TEXT)
      USING TEXT PARTITION BY COLUMN(p_size INT4);
    
    CREATE TABLE partsupp (ps_partkey INT8, ps_suppkey INT8, ps_availqty INT4,
      ps_supplycost FLOAT8, ps_comment TEXT)
      USING TEXT;
    
    CREATE TABLE region (r_name TEXT, r_comment TEXT)
      USING TEXT PARTITION BY COLUMN(r_regionkey INT8);
    
    CREATE TABLE supplier (s_suppkey INT8, s_name TEXT, s_address TEXT,
      s_phone TEXT, s_acctbal FLOAT8, s_comment TEXT)
      USING TEXT PARTITION BY COLUMN(s_nationkey INT8);
    ```



> Make partition pruning based on catalog informations
> ----------------------------------------------------
>
>                 Key: TAJO-1493
>                 URL: https://issues.apache.org/jira/browse/TAJO-1493
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: Catalog, Planner/Optimizer
>            Reporter: Jaehwa Jung
>            Assignee: Jaehwa Jung
>             Fix For: 0.11.0, 0.12.0
>
>         Attachments: TAJO-1493.patch, TAJO-1493_2.patch, TAJO-1493_3.patch, 
> TAJO-1493_4.patch, TAJO-1493_5.patch, TAJO-1493_6.patch
>
>
> Currently, PartitionedTableRewriter scans partition directories to rewrite 
> filter conditions. It lists all subdirectories of the table path because the 
> catalog doesn't provide partition directories. But if there are many 
> subdirectories on HDFS (for example, more than 10,000), this can overload the 
> NameNode. Thus, CatalogStore needs to provide the partition directories that 
> match given filter conditions. I designed a new CatalogStore method as 
> follows:
> * method name: getPartitionsWithConditionFilters
> * first parameter: database name
> * second parameter: table name
> * third parameter: WHERE clause (including the target column name and 
> partition value)
> * return value: 
> List<org.apache.tajo.catalog.proto.CatalogProtos.TablePartitionProto>
> * description: It scans the CatalogStore for the partition directories that 
> match the WHERE clause. 
>   For example, suppose a user sets the parameters as follows:
> ** first parameter: default
> ** second parameter: table1
> ** third parameter: COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'
> In this case, the method builds the following SELECT statement:
> {code:sql}
> SELECT DISTINCT A.PATH
> FROM PARTITIONS A, (
>   SELECT B.PARTITION_ID
>   FROM PARTITION_KEYS B
>   WHERE B.PARTITION_ID > 0 
>   AND (
>     COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'
>   )
> ) B
> WHERE A.PARTITION_ID > 0
> AND A.TID = ${table_id}
> AND A.PARTITION_ID = B.PARTITION_ID
> {code}
> At first, I considered passing an EvalNode instead of a WHERE clause, but I 
> can't, because of a circular dependency between the tajo-catalog and 
> tajo-plan modules. So I'll implement a utility class that converts an 
> EvalNode to SQL.
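The following is a minimal sketch of the two steps described in the issue: converting a filter tree into a WHERE-clause fragment over PARTITION_KEYS, then wrapping that fragment in the SELECT shape shown above. It is an illustration under stated assumptions, not Tajo's implementation. `PartitionFilterSketch`, `Predicate`, `ColumnEq`, `Or`, `toFilterSql` and `buildPartitionQuery` are all hypothetical stand-ins. They do not model the real `EvalNode` API, and the table id is hard-coded rather than looked up from the database and table name.

```java
// A sketch only: every class and method here is hypothetical, not Tajo's API.
public class PartitionFilterSketch {

    // Hypothetical stand-in for a filter tree (Tajo's real type is EvalNode).
    interface Predicate {}

    // partition column = value, e.g. col1 = '3'
    static final class ColumnEq implements Predicate {
        final String column;
        final String value;

        ColumnEq(String column, String value) {
            this.column = column;
            this.value = value;
        }
    }

    // Disjunction of two predicates.
    static final class Or implements Predicate {
        final Predicate left;
        final Predicate right;

        Or(Predicate left, Predicate right) {
            this.left = left;
            this.right = right;
        }
    }

    // Render a string as a SQL literal, doubling embedded single quotes.
    static String quote(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    // Convert a predicate into a condition over PARTITION_KEYS rows,
    // where each row holds one (COLUMN_NAME, PARTITION_VALUE) pair.
    static String toFilterSql(Predicate p) {
        if (p instanceof ColumnEq) {
            ColumnEq eq = (ColumnEq) p;
            return "(COLUMN_NAME = " + quote(eq.column)
                    + " AND PARTITION_VALUE = " + quote(eq.value) + ")";
        }
        if (p instanceof Or) {
            Or or = (Or) p;
            return toFilterSql(or.left) + " OR " + toFilterSql(or.right);
        }
        throw new IllegalArgumentException("unsupported predicate: " + p);
    }

    // Wrap the filter in the query shape described in the issue.
    static String buildPartitionQuery(long tableId, String filter) {
        return "SELECT DISTINCT A.PATH\n"
                + "FROM PARTITIONS A, (\n"
                + "  SELECT B.PARTITION_ID\n"
                + "  FROM PARTITION_KEYS B\n"
                + "  WHERE B.PARTITION_ID > 0\n"
                + "  AND (\n"
                + "    " + filter + "\n"
                + "  )\n"
                + ") B\n"
                + "WHERE A.PARTITION_ID > 0\n"
                + "AND A.TID = " + tableId + "\n"
                + "AND A.PARTITION_ID = B.PARTITION_ID";
    }

    public static void main(String[] args) {
        // Hypothetical table id; a real store would resolve it from db and table name.
        long tableId = 1L;
        String filter = toFilterSql(new ColumnEq("col1", "3"));
        System.out.println(buildPartitionQuery(tableId, filter));
    }
}
```

Running `main` prints the query from the description with `${table_id}` replaced by `1`. The only difference is an extra pair of parentheses around the filter.

Conjunctions across two different partition columns are deliberately left out. Each PARTITION_KEYS row carries a single column name, so a plain AND of two such conditions would match no row. Supporting that case would need a different query shape, for example grouping by PARTITION_ID.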



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
