[
https://issues.apache.org/jira/browse/TAJO-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jaehwa Jung updated TAJO-1493:
------------------------------
Summary: Add a method to get partition directories with filter conditions.
(was: Added a method to get partition directories with filter conditions.)
> Add a method to get partition directories with filter conditions.
> -----------------------------------------------------------------
>
> Key: TAJO-1493
> URL: https://issues.apache.org/jira/browse/TAJO-1493
> Project: Tajo
> Issue Type: Sub-task
> Components: catalog
> Reporter: Jaehwa Jung
> Assignee: Jaehwa Jung
>
> Currently, PartitionedTableRewriter take a look into partition directories
> for rewriting filter conditions. It get all sub directories of table path
> because catalog doesn’t provide partition directories. But if there are lots
> of sub directories on HDFS, such as, more than 10,000 directories, it might
> be cause overload to NameNode. Thus, CatalogStore need to provide partition
> directories for specified filter conditions. I designed new method to
> CatalogStore as follows:
> * method name: getPartitionsWithConditionFilters
> * first parameter: database name
> * second parameter: table name
> * third parameter: where clause (included target column name and partition
> value)
> * return values:
> List<org.apache.tajo.catalog.proto.CatalogProtos.TablePartitionProto>
> * description: It scan right partition directories on CatalogStore with where
> caluse. For examples, users set parameters as following:
> ** first parameter: default
> ** second parameter: table1
> ** third parameter: COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3
> In the previous cases, this method will create select clause as follows.
> {code:xml}
> SELECT DISTINCT A.PATH
> FROM PARTITIONS A, (
> SELECT B.PID
> FROM PARTITION_KEYS B
> WHERE B.PID > 0
> AND (
> COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'
> )
> ) B
> WHERE A.PID > 0
> AND A.TID = ${table_id}
> AND A.PID = B.PID
> {code}
> At the first time, I considered to use EvalNode instead of where clause. But
> I can’t use it because of recursive related problems between tajo-catalog
> module and tajo-plan module. So, I’ll implement utility class to convert
> EvalNode to SQL.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)