Jaehwa Jung created TAJO-1493:
---------------------------------
Summary: Added a method to get partition directories with filter
conditions.
Key: TAJO-1493
URL: https://issues.apache.org/jira/browse/TAJO-1493
Project: Tajo
Issue Type: Sub-task
Components: catalog
Reporter: Jaehwa Jung
Assignee: Jaehwa Jung
Currently, PartitionedTableRewriter looks into the partition directories to
rewrite filter conditions. It lists all subdirectories of the table path because
the catalog doesn’t provide partition directories. But if there are many
subdirectories on HDFS, for example more than 10,000, listing them may overload
the NameNode. Thus, CatalogStore needs to provide the partition directories that
match the specified filter conditions. I designed a new CatalogStore method as
follows:
* method name: getPartitionsWithConditionFilters
* first parameter: database name
* second parameter: table name
* third parameter: where clause (includes the target column name and partition
value)
* return value:
List<org.apache.tajo.catalog.proto.CatalogProtos.TablePartitionProto>
* description: It looks up the matching partition directories in CatalogStore
using the where clause. For example, users set the parameters as follows:
** first parameter: default
** second parameter: table1
** third parameter: COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'
For the example above, this method will build a select statement as follows:
{code:sql}
SELECT DISTINCT A.PATH
FROM PARTITIONS A, (
  SELECT B.PID
  FROM PARTITION_KEYS B
  WHERE B.PID > 0
  AND (
    COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'
  )
) B
WHERE A.PID > 0
AND A.TID = ${table_id}
AND A.PID = B.PID
{code}
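As a rough sketch of how the statement above could be assembled from the table id and the where clause: the class name PartitionPathQuery and its build method are hypothetical and are not existing Tajo code, and the sketch assumes the table id is a plain integer. It only builds the SQL text; it does not execute it.

```java
import java.util.Objects;

/** Hypothetical helper for illustration only; not part of Tajo. */
final class PartitionPathQuery {
  private PartitionPathQuery() {}

  /**
   * Builds the partition-path lookup statement for one table.
   *
   * @param tableId     value substituted for the ${table_id} placeholder (assumed numeric)
   * @param whereClause filter over PARTITION_KEYS, e.g.
   *                    COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'
   */
  static String build(int tableId, String whereClause) {
    Objects.requireNonNull(whereClause, "whereClause");
    String filter = whereClause.trim();
    if (filter.isEmpty()) {
      throw new IllegalArgumentException("whereClause must not be empty");
    }
    // The filter is concatenated as-is, so the caller must pass trusted,
    // already-escaped SQL; a real implementation would need to guard this.
    return "SELECT DISTINCT A.PATH"
        + " FROM PARTITIONS A, ("
        + " SELECT B.PID FROM PARTITION_KEYS B"
        + " WHERE B.PID > 0 AND (" + filter + ")"
        + " ) B"
        + " WHERE A.PID > 0"
        + " AND A.TID = " + tableId
        + " AND A.PID = B.PID";
  }
}
```

Calling PartitionPathQuery.build(7, "COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'") would yield the same statement as above on a single line, with 7 in place of ${table_id}.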
At first, I considered using EvalNode instead of a where clause. But I can’t use
it because of a circular dependency problem between the tajo-catalog module and
the tajo-plan module. So, I’ll implement a utility class to convert EvalNode to
SQL.
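To illustrate the general idea of turning an expression tree into a where clause, here is a minimal sketch. It does not use Tajo's EvalNode; the types FilterExpr, ColumnRef, StringLiteral, EqualsExpr, AndExpr and OrExpr are hypothetical stand-ins, and the actual utility class may look quite different.

```java
/** Hypothetical stand-in for a filter expression node; not Tajo's EvalNode. */
interface FilterExpr {
  String toSql();
}

/** A bare column reference such as COLUMN_NAME. */
final class ColumnRef implements FilterExpr {
  private final String name;
  ColumnRef(String name) { this.name = name; }
  @Override public String toSql() { return name; }
}

/** A string constant, rendered with single quotes doubled for SQL. */
final class StringLiteral implements FilterExpr {
  private final String value;
  StringLiteral(String value) { this.value = value; }
  @Override public String toSql() { return "'" + value.replace("'", "''") + "'"; }
}

/** left = right */
final class EqualsExpr implements FilterExpr {
  private final FilterExpr left;
  private final FilterExpr right;
  EqualsExpr(FilterExpr left, FilterExpr right) { this.left = left; this.right = right; }
  @Override public String toSql() { return left.toSql() + " = " + right.toSql(); }
}

/** left AND right */
final class AndExpr implements FilterExpr {
  private final FilterExpr left;
  private final FilterExpr right;
  AndExpr(FilterExpr left, FilterExpr right) { this.left = left; this.right = right; }
  @Override public String toSql() { return left.toSql() + " AND " + right.toSql(); }
}

/** (left OR right), always parenthesized so it nests safely under AND. */
final class OrExpr implements FilterExpr {
  private final FilterExpr left;
  private final FilterExpr right;
  OrExpr(FilterExpr left, FilterExpr right) { this.left = left; this.right = right; }
  @Override public String toSql() { return "(" + left.toSql() + " OR " + right.toSql() + ")"; }
}
```

With these stand-ins, an AndExpr of two EqualsExpr nodes over COLUMN_NAME / 'col1' and PARTITION_VALUE / '3' renders the where clause used in the example above.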