[
https://issues.apache.org/jira/browse/TAJO-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902730#comment-14902730
]
ASF GitHub Bot commented on TAJO-1493:
--------------------------------------
Github user jihoonson commented on a diff in the pull request:
https://github.com/apache/tajo/pull/772#discussion_r40093853
--- Diff: tajo-catalog/tajo-catalog-drivers/tajo-hive/src/main/java/org/apache/tajo/catalog/store/HiveCatalogStore.java ---
@@ -845,13 +856,174 @@ public boolean existPartitionMethod(String databaseName, String tableName) throw
   }
 
   @Override
-  public List<CatalogProtos.PartitionDescProto> getPartitions(String databaseName,
-                                                              String tableName) {
-    throw new UnsupportedOperationException();
+  public List<CatalogProtos.PartitionDescProto> getPartitionsOfTable(String databaseName, String tableName)
+    throws UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException {
+    PartitionsByFilterProto.Builder request = PartitionsByFilterProto.newBuilder();
+    request.setDatabaseName(databaseName);
+    request.setTableName(tableName);
+    request.setFilter("");
+
+    return getPartitionsByFilter(request.build());
+  }
+
+  @Override
+  public List<PartitionDescProto> getPartitionsByAlgebra(PartitionsByAlgebraProto request) throws
+    UndefinedDatabaseException, UndefinedTableException, UndefinedPartitionMethodException, UnsupportedException {
+
+    List<PartitionDescProto> list = null;
+
+    try {
+      String databaseName = request.getDatabaseName();
+      String tableName = request.getTableName();
+
+      if (!existDatabase(databaseName)) {
+        throw new UndefinedDatabaseException(tableName);
+      }
+
+      if (!existTable(databaseName, tableName)) {
+        throw new UndefinedTableException(tableName);
+      }
+
+      if (!existPartitionMethod(databaseName, tableName)) {
+        throw new UndefinedPartitionMethodException(tableName);
+      }
+
+      TableDescProto tableDesc = getTable(databaseName, tableName);
+      String filter = getFilter(databaseName, tableName, tableDesc.getPartition().getExpressionSchema().getFieldsList()
+        , request.getAlgebra());
+      list = getPartitionsByFilterFromHiveMetaStore(databaseName, tableName, filter);
+    } catch (UnsupportedException ue) {
+      throw ue;
+    } catch (Exception se) {
+      throw new TajoInternalError(se);
+    }
+
+    return list;
+  }
+
+  private String getFilter(String databaseName, String tableName, List<ColumnProto> partitionColumns
+    , String json) throws TajoException {
+
+    Expr[] exprs = null;
+
+    if (json != null && !json.isEmpty()) {
+      Expr algebra = JsonHelper.fromJson(json, Expr.class);
+      exprs = AlgebraicUtil.toConjunctiveNormalFormArray(algebra);
+    }
+
+    PartitionFilterAlgebraVisitor visitor = new PartitionFilterAlgebraVisitor();
+    visitor.setIsHiveCatalog(true);
+
+    Expr[] filters = AlgebraicUtil.getRearrangedCNFExpressions(databaseName + "." + tableName, partitionColumns, exprs);
+
+    StringBuffer sb = new StringBuffer();
+
+    // Write join clause from second column to last column.
+    Column target;
+
+    int addedFilter = 0;
+    String result;
+    for (int i = 0; i < partitionColumns.size(); i++) {
+      target = new Column(partitionColumns.get(i));
+
+      if (!(filters[i] instanceof IsNullPredicate)) {
+        visitor.setColumn(target);
+        visitor.visit(null, new Stack<Expr>(), filters[i]);
+        result = visitor.getResult();
+
+        // If visitor build filter successfully, add filter to be used for executing hive api.
+        if (result.length() > 0) {
+          if (addedFilter > 0) {
--- End diff ---
```addedFilter``` can be replaced with ```sb.length()```.
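The suggestion above can be sketched like this (a minimal, hypothetical illustration, not the actual HiveCatalogStore code: `FilterBuilderSketch` and `buildFilter` are made-up names, and the per-column visitor results are stubbed as plain strings):

```java
import java.util.Arrays;
import java.util.List;

public class FilterBuilderSketch {
  // Joins the non-empty per-column filter fragments with AND. Instead of
  // tracking a separate addedFilter counter, sb.length() > 0 already tells
  // us whether an earlier fragment was appended.
  static String buildFilter(List<String> columnFilters) {
    StringBuilder sb = new StringBuilder();
    for (String result : columnFilters) {
      if (result != null && result.length() > 0) {
        if (sb.length() > 0) {      // replaces `if (addedFilter > 0)`
          sb.append(" AND ");
        }
        sb.append(result);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Empty fragments (e.g. a column the visitor could not translate) are skipped.
    System.out.println(buildFilter(Arrays.asList("col1 = 'a'", "", "col2 = '3'")));
    // prints: col1 = 'a' AND col2 = '3'
  }
}
```

This removes one piece of mutable state from the loop without changing the produced filter string.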
> Make partition pruning based on catalog informations
> ----------------------------------------------------
>
> Key: TAJO-1493
> URL: https://issues.apache.org/jira/browse/TAJO-1493
> Project: Tajo
> Issue Type: Sub-task
> Components: Catalog, Planner/Optimizer
> Reporter: Jaehwa Jung
> Assignee: Jaehwa Jung
> Fix For: 0.11.0, 0.12.0
>
> Attachments: TAJO-1493.patch, TAJO-1493_2.patch, TAJO-1493_3.patch,
> TAJO-1493_4.patch, TAJO-1493_5.patch
>
>
> Currently, PartitionedTableRewriter looks into partition directories to
> rewrite filter conditions. It gets all sub-directories of the table path
> because the catalog doesn't provide partition directories. But if there are
> lots of sub-directories on HDFS, such as more than 10,000 directories, it
> might overload the NameNode. Thus, CatalogStore needs to provide partition
> directories for the specified filter conditions. I designed a new method for
> CatalogStore as follows:
> * method name: getPartitionsWithConditionFilters
> * first parameter: database name
> * second parameter: table name
> * third parameter: where clause (includes the target column name and partition
> value)
> * return value:
> List<org.apache.tajo.catalog.proto.CatalogProtos.TablePartitionProto>
> * description: It scans the right partition directories on CatalogStore with
> the where clause.
> For example, users set the parameters as follows:
> ** first parameter: default
> ** second parameter: table1
> ** third parameter: COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'
> In the previous case, this method will create a select clause as follows.
> {code:sql}
> SELECT DISTINCT A.PATH
> FROM PARTITIONS A, (
> SELECT B.PARTITION_ID
> FROM PARTITION_KEYS B
> WHERE B.PARTITION_ID > 0
> AND (
> COLUMN_NAME = 'col1' AND PARTITION_VALUE = '3'
> )
> ) B
> WHERE A.PARTITION_ID > 0
> AND A.TID = ${table_id}
> AND A.PARTITION_ID = B.PARTITION_ID
> {code}
> At first, I considered using EvalNode instead of a where clause. But I
> can't use it because of circular-dependency problems between the tajo-catalog
> module and the tajo-plan module. So, I'll implement a utility class to convert
> EvalNode to SQL.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)