[ 
https://issues.apache.org/jira/browse/TAJO-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900136#comment-14900136
 ] 

ASF GitHub Bot commented on TAJO-1493:
--------------------------------------

GitHub user blrunner opened a pull request:

    https://github.com/apache/tajo/pull/772

    TAJO-1493: Make partition pruning based on catalog informations.

    Reopened PR, which contains the following features:
    * Add columns for partition volume and file count to Catalog
    * Allow removed partition directories in partitioned table.
    * Implement PartitionNotFoundException handling
    * Implement UnsupportedException handling
    * Add unit test cases for abnormal partition directories
    
    *NOTE: old PR is https://github.com/apache/tajo/pull/624.*

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/blrunner/tajo TAJO-1493

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/772.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #772
    
----
commit 743122bade809e5a6fd6ccbb5025a87233e0ac40
Author: JaeHwa Jung <[email protected]>
Date:   2015-07-06T01:49:20Z

    TAJO-1493: Add a method to get partition directories with filter conditions.

commit 422dbb10b53d3f97f97d9723248e905e3cc76e78
Author: JaeHwa Jung <[email protected]>
Date:   2015-07-06T06:28:26Z

    Fix updateTableStats error

commit 515f0e364c31150e518d0a139e3bb4728fbf36ca
Author: JaeHwa Jung <[email protected]>
Date:   2015-07-10T02:49:27Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1493

commit 23297030c6d4d033f57cac39d2b710de590306bd
Author: JaeHwa Jung <[email protected]>
Date:   2015-07-14T07:18:18Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1493

commit 7cb8fa81302d070e3d1c5c0e87f3d39ba574c5ce
Author: JaeHwa Jung <[email protected]>
Date:   2015-07-17T10:28:56Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1493
    
    Conflicts:
        tajo-catalog/tajo-catalog-server/src/test/java/org/apache/tajo/catalog/TestCatalog.java

commit 2073d04dbab81449e0ee27222c9897d9a07ada7c
Author: JaeHwa Jung <[email protected]>
Date:   2015-07-24T02:54:47Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1493
    
    Conflicts:
        tajo-catalog/tajo-catalog-client/src/main/java/org/apache/tajo/catalog/AbstractCatalogClient.java
        tajo-catalog/tajo-catalog-client/src/main/proto/CatalogProtocol.proto
        tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/CatalogService.java
        tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/CatalogServer.java
        tajo-plan/src/main/java/org/apache/tajo/plan/rewrite/rules/PartitionedTableRewriter.java

commit 1c615b598d1acb9e186170b8eca5f9ea8c657321
Author: JaeHwa Jung <[email protected]>
Date:   2015-07-31T03:14:40Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1493
    
    Conflicts:
        tajo-catalog/tajo-catalog-client/src/main/proto/CatalogProtocol.proto
        tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/CatalogServer.java

commit a7f7035583bd16454d6596af03adce390f5d9ec7
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-02T16:19:35Z

    Implement getPartitionsByDirectSql and fix some bugs

commit ff19c4934b8032158a7a8d02756f2688c61b8043
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-02T16:33:09Z

    Remove unnecessary updates

commit d30a6eb66a964aaab0318abe0b342a80737a7d36
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-02T16:34:25Z

    Remove unnecessary codes

commit e492d68225e21023811aa308f2ac56a81218ff50
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-03T07:13:55Z

    Optimize direct sql and add more description.

commit f40e36d4aaa801aa0b660f30ba25ead92bb9a2d2
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-03T07:32:27Z

    Rename SQLFinderWithPartitionFilter to PartitionDirectSQLBuilder

commit 387ba94f5e5eea179bf4355413bee0302fd21706
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-03T09:35:04Z

    Implement HiveCatalogStore::getPartitionsByDirectSql

commit c470489646e288d1e7a94e17cc066a559bcc1e29
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-03T13:32:20Z

    Fix errors for addPartitions and getPartitionsByDirectSql

commit 560c202495651148076e4f44541de321a21fb021
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-03T15:08:29Z

    Add unit test cases for PartitionedTableRewriter

commit 15c5082ba2433ea487dae11ead9f2f94fb4aeafb
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-03T15:22:50Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1493

commit 28cbde04bb3a5144038430f333745cc32800c0f4
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-04T05:12:14Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1493
    
    Conflicts:
        tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/CatalogService.java
        tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/store/MemStore.java

commit b9eadd2f627dc55595192f218daba0b58db59f80
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-06T17:35:18Z

    Remove unnecessary codes

commit f1a0fab0cbd01090e6d26b4167ecba0a750d9303
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-06T18:01:46Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1493
    
    Conflicts:
        tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/CatalogServer.java

commit 7f53fc5de593beee8dbc5f80f5fcb20ce1dca00b
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-06T18:08:53Z

    Remove unused packages.

commit 4aa8ba14ecd1fc4eebec3b6158b2857d1e83d23f
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-20T08:38:47Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1493
    
    Conflicts:
        tajo-catalog/tajo-catalog-client/src/main/java/org/apache/tajo/catalog/AbstractCatalogClient.java
        tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/CatalogService.java
        tajo-catalog/tajo-catalog-drivers/tajo-hive/src/main/java/org/apache/tajo/catalog/store/HiveCatalogStore.java
        tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/store/AbstractDBStore.java
        tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/store/CatalogStore.java
        tajo-catalog/tajo-catalog-server/src/main/java/org/apache/tajo/catalog/store/MemStore.java
        tajo-common/src/main/java/org/apache/tajo/exception/ErrorMessages.java

commit 03ea651b45f8ae025243b20f6df86393db49cc15
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-24T03:11:35Z

    Add ScanQualConverter

commit 5a405373917e337a983bf6c4fbed9340840197d9
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-24T03:12:03Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1493

commit 43c8be02bbdb5c52e3cb292974bb15eab09be42e
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-24T09:12:45Z

    Implement CatalogStore::getPartitionsByAlgebra

commit 63a02091ce8506e24d269d9869e5f253efe7f409
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-24T09:19:16Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1493

commit 0db124cdb28faa57cab005a5c9d481202b9c1a8c
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-24T17:07:44Z

    Fix unit test errors and remove unnecessary codes

commit 7e04346dfaacd4dedfe84716baeb13e6e4e4cf5e
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-24T23:56:34Z

    Rename method name and unused codes

commit 9aeb2f792f5cd7c58202e475a8728cf30a6f6c9f
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-25T01:14:30Z

    Add UNDEFINED_PARTITIONS

commit 5000c3bc6c8aa29b526d25ce0423f79823515874
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-25T01:59:57Z

    Fix mismatched argument bug

commit 8982438882c3506a2c307bf650c8ed436346c33e
Author: JaeHwa Jung <[email protected]>
Date:   2015-08-31T01:50:24Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1493

----


> Make partition pruning based on catalog informations
> ----------------------------------------------------
>
>                 Key: TAJO-1493
>                 URL: https://issues.apache.org/jira/browse/TAJO-1493
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: Catalog, Planner/Optimizer
>            Reporter: Jaehwa Jung
>            Assignee: Jaehwa Jung
>             Fix For: 0.11.0, 0.12.0
>
>         Attachments: TAJO-1493.patch, TAJO-1493_2.patch, TAJO-1493_3.patch, TAJO-1493_4.patch
>
>
> Currently, PartitionedTableRewriter scans the partition directories in order
> to rewrite filter conditions. It fetches all subdirectories of the table path
> because the catalog doesn’t provide partition directories. But if there are
> many subdirectories on HDFS, for example more than 10,000, this might
> overload the NameNode. Thus, CatalogStore needs to provide the partition
> directories for specified filter conditions. I designed a new method for
> CatalogStore as follows:
> * method name: getPartitionsWithConditionFilters
> * first parameter: database name
> * second parameter: table name
> * third parameter: where clause (including the target column name and
> partition value)
> * return values: 
> List<org.apache.tajo.catalog.proto.CatalogProtos.TablePartitionProto>
> * description: It looks up the matching partition directories in CatalogStore
> using the where clause.
>   For example, suppose users set the parameters as follows:
> ** first parameter: default
> ** second parameter: table1
> ** third parameter: COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'
> In this case, the method will generate the following SELECT statement.
> {code:sql}
> SELECT DISTINCT A.PATH
> FROM PARTITIONS A, (
>   SELECT B.PARTITION_ID
>   FROM PARTITION_KEYS B
>   WHERE B.PARTITION_ID > 0 
>   AND (
>     COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'
>   )
> ) B
> WHERE A.PARTITION_ID > 0
> AND A.TID = ${table_id}
> AND A.PARTITION_ID = B.PARTITION_ID
> {code}
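
To illustrate how the statement above could be assembled, here is a minimal Java sketch that splices a caller-supplied where clause and a table id into the same template. The class name PartitionPathQuerySketch, the method buildSelect, and the table id value 1 are assumptions made for illustration; none of them is taken from the patch or from Tajo's CatalogStore API.

```java
import java.util.Objects;

/** Hypothetical helper for illustration only; not Tajo's actual implementation. */
final class PartitionPathQuerySketch {

  private PartitionPathQuerySketch() {}

  /**
   * Builds the SELECT that looks up partition paths for one table.
   *
   * @param tableId     numeric table id, substituted for ${table_id}
   * @param whereClause filter over PARTITION_KEYS columns, for example
   *                    COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'
   */
  static String buildSelect(int tableId, String whereClause) {
    Objects.requireNonNull(whereClause, "whereClause");
    // The filter is spliced in verbatim, mirroring the issue description.
    // Production code would need to validate it or bind parameters instead,
    // because raw string splicing is open to SQL injection.
    return String.join("\n",
        "SELECT DISTINCT A.PATH",
        "FROM PARTITIONS A, (",
        "  SELECT B.PARTITION_ID",
        "  FROM PARTITION_KEYS B",
        "  WHERE B.PARTITION_ID > 0",
        "  AND (",
        "    " + whereClause,
        "  )",
        ") B",
        "WHERE A.PARTITION_ID > 0",
        "AND A.TID = " + tableId,
        "AND A.PARTITION_ID = B.PARTITION_ID");
  }

  public static void main(String[] args) {
    // Table id 1 is a made-up value standing in for ${table_id}.
    System.out.println(
        buildSelect(1, "COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'"));
  }
}
```

Running main prints the statement for the example parameters, with the table id placeholder replaced by the made-up value.
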
> At first, I considered using EvalNode instead of a where clause. But I can’t
> use it because of circular dependency problems between the tajo-catalog
> module and the tajo-plan module. So, I’ll implement a utility class to
> convert EvalNode to SQL.
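
As a rough idea of what converting a filter expression tree to SQL text involves, the Java sketch below walks a tiny tree of equality and AND/OR nodes and renders it as a where clause. The types Expr, Eq, and Logical are hypothetical stand-ins; they are not Tajo's EvalNode classes, and the utility class mentioned above may work quite differently.

```java
/** Hypothetical expression-to-SQL sketch; these are not Tajo's EvalNode types. */
final class FilterToSqlSketch {

  private FilterToSqlSketch() {}

  /** A node of the filter tree that can render itself as SQL text. */
  interface Expr {
    String toSql();
  }

  /** Leaf node: column = 'literal'. */
  static final class Eq implements Expr {
    private final String column;
    private final String value;

    Eq(String column, String value) {
      this.column = column;
      this.value = value;
    }

    @Override
    public String toSql() {
      // Double any single quote so the literal cannot end the SQL string early.
      // The column name is emitted as-is; real code should validate it.
      return column + " = '" + value.replace("'", "''") + "'";
    }
  }

  /** Inner node: (left AND right) or (left OR right). */
  static final class Logical implements Expr {
    private final String op;
    private final Expr left;
    private final Expr right;

    Logical(String op, Expr left, Expr right) {
      this.op = op;
      this.left = left;
      this.right = right;
    }

    @Override
    public String toSql() {
      // Parenthesize every inner node so operator precedence stays explicit.
      return "(" + left.toSql() + " " + op + " " + right.toSql() + ")";
    }
  }

  static Expr eq(String column, String value) {
    return new Eq(column, value);
  }

  static Expr and(Expr left, Expr right) {
    return new Logical("AND", left, right);
  }

  static Expr or(Expr left, Expr right) {
    return new Logical("OR", left, right);
  }

  public static void main(String[] args) {
    Expr filter = and(eq("COLUMN_NAME", "col1"), eq("PARTITION_VALUE", "3"));
    System.out.println(filter.toSql());
    // prints: (COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3')
  }
}
```

Keeping the rendering in a standalone tree like this means the catalog side only needs the resulting string, which is one way to avoid a dependency on planner classes.
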



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
