[ https://issues.apache.org/jira/browse/TAJO-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651977#comment-14651977 ]

ASF GitHub Bot commented on TAJO-1493:
--------------------------------------

Github user blrunner commented on the pull request:

    https://github.com/apache/tajo/pull/624#issuecomment-127275276
  
    I found that this patch runs as expected with HiveCatalogStore and
    MySQLStore on my testing cluster. The response times for a simple query
    were as follows:
    
    * Table schema:
    ```
    create table partitioned_lineitem (L_SUPPKEY bigint, L_LINENUMBER bigint,
    L_QUANTITY double, L_EXTENDEDPRICE double, L_DISCOUNT double, L_TAX double,
    L_RETURNFLAG text, L_LINESTATUS text,
    L_SHIPDATE text, L_COMMITDATE text, L_RECEIPTDATE text, L_SHIPINSTRUCT text,
    L_SHIPMODE text, L_COMMENT text)
    partition by column (L_ORDERKEY bigint, L_PARTKEY bigint)
    ```
    * Number of partitions: 100,000
    * Select statement: `select * from partitioned_lineitem limit 10;`
    * Response time:
      - previous rewriter: 15-16 sec
      - improved rewriter: 12-13 sec
    
    Honestly, I didn't add unit tests that execute queries, because almost
    all current Tajo unit tests run on MemStore. Switching the physical
    operator unit tests to DerbyStore would take a lot of effort, which seems
    outside the scope of this patch. So I only added unit tests that verify
    the direct SQL. If you want to test this patch with the build commands,
    you can set the `-Dtajo.catalog.store.class` parameter as follows:
    ```
    mvn clean install -Pparallel-test -DLOG_LEVEL=WARN -Dmaven.fork.count=2 \
      -Dtajo.catalog.store.class=org.apache.tajo.catalog.store.DerbyStore
    ```


> Add a method to get partition directories with filter conditions.
> -----------------------------------------------------------------
>
>                 Key: TAJO-1493
>                 URL: https://issues.apache.org/jira/browse/TAJO-1493
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: Catalog
>            Reporter: Jaehwa Jung
>            Assignee: Jaehwa Jung
>
> Currently, PartitionedTableRewriter looks into partition directories to
> rewrite filter conditions. It lists all subdirectories of the table path
> because the catalog doesn't provide partition directories. But if there are
> many subdirectories on HDFS, for example more than 10,000, this can overload
> the NameNode. Thus, CatalogStore needs to provide the partition directories
> that match specified filter conditions. I designed a new CatalogStore method
> as follows:
> * method name: getPartitionsWithConditionFilters
> * first parameter: database name
> * second parameter: table name
> * third parameter: where clause (containing the target column name and
> partition value)
> * return value:
> List<org.apache.tajo.catalog.proto.CatalogProtos.TablePartitionProto>
> * description: It scans the CatalogStore for the partition directories that
> match the where clause.
>   For example, suppose a user sets the parameters as follows:
> ** first parameter: default
> ** second parameter: table1
> ** third parameter: COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'
> In this case, the method will generate the following SELECT statement:
> {code:sql}
> SELECT DISTINCT A.PATH
> FROM PARTITIONS A, (
>   SELECT B.PARTITION_ID
>   FROM PARTITION_KEYS B
>   WHERE B.PARTITION_ID > 0 
>   AND (
>     COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'
>   )
> ) B
> WHERE A.PARTITION_ID > 0
> AND A.TID = ${table_id}
> AND A.PARTITION_ID = B.PARTITION_ID
> {code}
> At first, I considered using EvalNode instead of a where clause, but I can't,
> because it would create a circular dependency between the tajo-catalog and
> tajo-plan modules. So I'll implement a utility class to convert an EvalNode
> to SQL.

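A minimal sketch of how a JDBC-based catalog store might assemble the query above from the where clause passed as the third parameter. The class `PartitionQuerySketch` and the method `buildPartitionPathQuery` are hypothetical, not Tajo's actual code. The sketch assumes the table id is bound later as a JDBC `?` parameter instead of the `${table_id}` placeholder, and that the where clause is generated SQL from trusted code, since it is concatenated verbatim.

```java
final class PartitionQuerySketch {

  // Hypothetical helper: wraps a filter on PARTITION_KEYS into the
  // partition-path query described in TAJO-1493.
  // The table id is left as a JDBC '?' parameter for the caller to bind.
  // WARNING: whereClause is concatenated verbatim, so it must be trusted,
  // generated SQL (never raw user input).
  static String buildPartitionPathQuery(String whereClause) {
    if (whereClause == null || whereClause.trim().isEmpty()) {
      throw new IllegalArgumentException("where clause must not be empty");
    }
    return "SELECT DISTINCT A.PATH\n"
        + "FROM PARTITIONS A, (\n"
        + "  SELECT B.PARTITION_ID\n"
        + "  FROM PARTITION_KEYS B\n"
        + "  WHERE B.PARTITION_ID > 0\n"
        + "  AND (\n"
        + "    " + whereClause + "\n"
        + "  )\n"
        + ") B\n"
        + "WHERE A.PARTITION_ID > 0\n"
        + "AND A.TID = ?\n"
        + "AND A.PARTITION_ID = B.PARTITION_ID";
  }

  public static void main(String[] args) {
    // Same filter as the example third parameter above.
    String sql = buildPartitionPathQuery(
        "COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'");
    System.out.println(sql);
  }
}
```

With `java.sql.PreparedStatement`, the caller would then bind the table id with `setInt(1, tableId)` (assuming an integer id) and read `PATH` from each row of the result set.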

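A utility that converts an expression tree to SQL could look roughly like the following sketch. It uses a tiny stand-in expression tree (`Expr`, `Column`, `Literal`, `Binary`, all hypothetical) rather than Tajo's actual `EvalNode` classes, and renders it into a predicate string of the same shape as the example where clause. Literals are quoted with embedded single quotes doubled, and nested binary expressions are parenthesized so operator precedence is preserved.

```java
import java.util.Objects;

final class ExprToSql {

  // Hypothetical, minimal expression tree standing in for Tajo's EvalNode.
  interface Expr {}

  static final class Column implements Expr {
    final String name;
    Column(String name) { this.name = Objects.requireNonNull(name); }
  }

  static final class Literal implements Expr {
    final String value;
    Literal(String value) { this.value = Objects.requireNonNull(value); }
  }

  static final class Binary implements Expr {
    final String op;   // e.g. "=", "<", "AND", "OR"
    final Expr left;
    final Expr right;
    Binary(String op, Expr left, Expr right) {
      this.op = Objects.requireNonNull(op);
      this.left = Objects.requireNonNull(left);
      this.right = Objects.requireNonNull(right);
    }
  }

  // Renders an expression as a SQL predicate string.
  static String toSql(Expr e) {
    if (e instanceof Column) {
      return ((Column) e).name;
    }
    if (e instanceof Literal) {
      // SQL string literal; embedded single quotes are doubled.
      return "'" + ((Literal) e).value.replace("'", "''") + "'";
    }
    if (e instanceof Binary) {
      Binary b = (Binary) e;
      return wrap(b.left) + " " + b.op + " " + wrap(b.right);
    }
    throw new IllegalArgumentException("unsupported node: " + e.getClass());
  }

  // Parenthesizes nested binary expressions so precedence is preserved.
  private static String wrap(Expr e) {
    return (e instanceof Binary) ? "(" + toSql(e) + ")" : toSql(e);
  }

  public static void main(String[] args) {
    Expr filter = new Binary("AND",
        new Binary("=", new Column("COLUMN_NAME"), new Literal("col1")),
        new Binary("=", new Column("PARTITION_VALUE"), new Literal("3")));
    // prints: (COLUMN_NAME = 'col1') AND (PARTITION_VALUE = '3')
    System.out.println(toSql(filter));
  }
}
```

Because the output is plain SQL text, the catalog side only ever receives a string, which is one way to keep tajo-catalog from depending on tajo-plan. Which module such a utility would live in is an assumption; the comment does not say.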

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
