[jira] [Updated] (TAJO-1493) Add a method to get partition directories with filter conditions.

Jaehwa Jung (JIRA) Tue, 31 Mar 2015 20:08:26 -0700

    [ https://issues.apache.org/jira/browse/TAJO-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jaehwa Jung updated TAJO-1493:
------------------------------
    Description: 
Currently, PartitionedTableRewriter scans partition directories in order to
rewrite filter conditions. It lists all subdirectories of the table path
because the catalog does not provide partition directories. However, if there
are many subdirectories on HDFS (for example, more than 10,000), listing them
can overload the NameNode. Therefore, CatalogStore needs to provide the
partition directories that match specified filter conditions. I designed a new
CatalogStore method as follows:

* method name: getPartitionsWithConditionFilters
* first parameter: database name
* second parameter: table name
* third parameter: WHERE clause (containing the target column name and
partition value)
* return value:
List<org.apache.tajo.catalog.proto.CatalogProtos.TablePartitionProto>
* description: It scans the matching partition directories in CatalogStore
using the WHERE clause.
  For example, suppose a user passes the following parameters:
** first parameter: default
** second parameter: table1
** third parameter: COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'
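To make the expected behavior concrete, here is a minimal in-memory sketch, assuming the catalog's PARTITIONS and PARTITION_KEYS tables can be modeled as plain Java lists. The class PartitionFilterModel, its records, and the "db.table" to table-id map are hypothetical. The WHERE-clause string is replaced by a Java predicate over (COLUMN_NAME, PARTITION_VALUE), and the method returns directory paths rather than TablePartitionProto objects, so the sketch needs neither a database nor the generated protobuf classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical in-memory model of the catalog's PARTITIONS and PARTITION_KEYS
// tables, used only to illustrate what the proposed method should return.
public class PartitionFilterModel {
  // One PARTITIONS row: partition id, owning table id (TID), directory path.
  record Partition(int partitionId, int tableId, String path) {}

  // One PARTITION_KEYS row: a column/value pair belonging to a partition.
  record PartitionKey(int partitionId, String columnName, String partitionValue) {}

  private final Map<String, Integer> tableIds; // "db.table" -> TID (assumption)
  private final List<Partition> partitions;
  private final List<PartitionKey> partitionKeys;

  PartitionFilterModel(Map<String, Integer> tableIds, List<Partition> partitions,
      List<PartitionKey> partitionKeys) {
    this.tableIds = tableIds;
    this.partitions = partitions;
    this.partitionKeys = partitionKeys;
  }

  // Stand-in for getPartitionsWithConditionFilters: the WHERE-clause string is
  // replaced by a predicate over (COLUMN_NAME, PARTITION_VALUE).
  List<String> getPartitionsWithConditionFilters(String databaseName, String tableName,
      BiPredicate<String, String> keyFilter) {
    Integer tid = tableIds.get(databaseName + "." + tableName);
    if (tid == null) {
      return List.of(); // unknown table: no partitions
    }
    // Inner query: ids of partitions with at least one matching key row.
    Set<Integer> matchingIds = new LinkedHashSet<>();
    for (PartitionKey key : partitionKeys) {
      if (keyFilter.test(key.columnName(), key.partitionValue())) {
        matchingIds.add(key.partitionId());
      }
    }
    // Outer query: join on PARTITION_ID, restrict to this table, keep DISTINCT paths.
    Set<String> paths = new LinkedHashSet<>();
    for (Partition p : partitions) {
      if (p.tableId() == tid && matchingIds.contains(p.partitionId())) {
        paths.add(p.path());
      }
    }
    return new ArrayList<>(paths);
  }

  public static void main(String[] args) {
    PartitionFilterModel catalog = new PartitionFilterModel(
        Map.of("default.table1", 1),
        List.of(new Partition(1, 1, "/warehouse/table1/col1=3"),
            new Partition(2, 1, "/warehouse/table1/col1=4")),
        List.of(new PartitionKey(1, "col1", "3"),
            new PartitionKey(2, "col1", "4")));
    List<String> paths = catalog.getPartitionsWithConditionFilters("default", "table1",
        (col, val) -> col.equals("col1") && val.equals("3"));
    System.out.println(paths); // prints [/warehouse/table1/col1=3]
  }
}
```

With the example parameters, only the directory of the partition whose key is col1 = 3 comes back, so the caller does not have to list the table directory on HDFS.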

In this case, the method would generate the following SELECT statement:

{code:sql}
SELECT DISTINCT A.PATH
FROM PARTITIONS A, (
  SELECT B.PARTITION_ID
  FROM PARTITION_KEYS B
  WHERE B.PARTITION_ID > 0 
  AND (
    COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'
  )
) B
WHERE A.PARTITION_ID > 0
AND A.TID = ${table_id}
AND A.PARTITION_ID = B.PARTITION_ID
{code}
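A JDBC-backed CatalogStore could assemble the statement above roughly as follows. This is a sketch under assumptions: PartitionQueryBuilder and buildPartitionQuery are hypothetical names, and the ${table_id} placeholder is replaced by a JDBC "?" parameter.

```java
// Hypothetical helper showing how the partition lookup query could be built.
public class PartitionQueryBuilder {

  // Wraps a condition on PARTITION_KEYS (e.g. COLUMN_NAME/PARTITION_VALUE
  // predicates) into the full partition lookup query.
  static String buildPartitionQuery(String keyCondition) {
    return String.join("\n",
        "SELECT DISTINCT A.PATH",
        "FROM PARTITIONS A, (",
        "  SELECT B.PARTITION_ID",
        "  FROM PARTITION_KEYS B",
        "  WHERE B.PARTITION_ID > 0",
        "  AND (",
        "    " + keyCondition,
        "  )",
        ") B",
        "WHERE A.PARTITION_ID > 0",
        "AND A.TID = ?", // table id is bound later as a JDBC parameter
        "AND A.PARTITION_ID = B.PARTITION_ID");
  }

  public static void main(String[] args) {
    System.out.println(buildPartitionQuery("COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'"));
  }
}
```

The table id would then be bound with PreparedStatement.setInt(1, tableId) before executing the query. Because the key condition is spliced into the SQL verbatim, it should be generated by trusted code (such as the EvalNode-to-SQL utility mentioned below) rather than taken from user input.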

At first, I considered passing an EvalNode instead of a WHERE clause. However,
that is not possible because it would create a circular dependency between the
tajo-catalog and tajo-plan modules. So, I'll implement a utility class that
converts an EvalNode to SQL.
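A minimal sketch of such a utility, assuming a simplified expression tree: Expr, Column, Literal, Binary and EvalToSql are hypothetical stand-ins, not Tajo's actual EvalNode classes. It walks the tree recursively, parenthesizes every binary expression so the tree's grouping is preserved, and quotes literals with SQL-standard escaping (embedded single quotes are doubled). All literals are emitted as strings, which matches the example above, where PARTITION_VALUE is compared against '3'.

```java
// Hypothetical, simplified stand-ins for Tajo's EvalNode classes; the real
// hierarchy lives in tajo-plan and is not reproduced here.
public class EvalToSql {
  interface Expr {}

  // Reference to a column, emitted as-is.
  record Column(String name) implements Expr {}

  // Constant value, always emitted as a quoted SQL string literal.
  record Literal(String value) implements Expr {}

  // Binary operator such as "=", ">", "AND" or "OR", emitted verbatim.
  record Binary(String op, Expr left, Expr right) implements Expr {}

  // Recursively converts an expression tree into SQL text. Every binary
  // expression is parenthesized so the tree's grouping survives unchanged.
  static String toSql(Expr e) {
    if (e instanceof Column c) {
      return c.name();
    }
    if (e instanceof Literal l) {
      // SQL escapes a single quote inside a string literal by doubling it.
      return "'" + l.value().replace("'", "''") + "'";
    }
    if (e instanceof Binary b) {
      return "(" + toSql(b.left()) + " " + b.op() + " " + toSql(b.right()) + ")";
    }
    throw new IllegalArgumentException("Unsupported node: " + e);
  }

  public static void main(String[] args) {
    Expr condition = new Binary("AND",
        new Binary("=", new Column("COLUMN_NAME"), new Literal("col1")),
        new Binary("=", new Column("PARTITION_VALUE"), new Literal("3")));
    // prints ((COLUMN_NAME = 'col1') AND (PARTITION_VALUE = '3'))
    System.out.println(toSql(condition));
  }
}
```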

  was:
Currently, PartitionedTableRewriter scans partition directories in order to
rewrite filter conditions. It lists all subdirectories of the table path
because the catalog does not provide partition directories. However, if there
are many subdirectories on HDFS (for example, more than 10,000), listing them
can overload the NameNode. Therefore, CatalogStore needs to provide the
partition directories that match specified filter conditions. I designed a new
CatalogStore method as follows:

* method name: getPartitionsWithConditionFilters
* first parameter: database name
* second parameter: table name
* third parameter: WHERE clause (containing the target column name and
partition value)
* return value:
List<org.apache.tajo.catalog.proto.CatalogProtos.TablePartitionProto>
* description: It scans the matching partition directories in CatalogStore
using the WHERE clause.
  For example, suppose a user passes the following parameters:
** first parameter: default
** second parameter: table1
** third parameter: COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'

In this case, the method would generate the following SELECT statement:

{code:sql}
SELECT DISTINCT A.PATH
FROM PARTITIONS A, (
  SELECT B.PID
  FROM PARTITION_KEYS B
  WHERE B.PID > 0 
  AND (
    COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'
  )
) B
WHERE A.PID > 0
AND A.TID = ${table_id}
AND A.PID = B.PID
{code}

At first, I considered passing an EvalNode instead of a WHERE clause. However,
that is not possible because it would create a circular dependency between the
tajo-catalog and tajo-plan modules. So, I'll implement a utility class that
converts an EvalNode to SQL.


> Add a method to get partition directories with filter conditions.
> -----------------------------------------------------------------
>
>                 Key: TAJO-1493
>                 URL: https://issues.apache.org/jira/browse/TAJO-1493
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: catalog
>            Reporter: Jaehwa Jung
>            Assignee: Jaehwa Jung
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
