[jira] [Commented] (FLINK-5859) support partition pruning on Table API & SQL

Fabian Hueske (JIRA) Fri, 24 Feb 2017 02:53:53 -0800

    [ https://issues.apache.org/jira/browse/FLINK-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882446#comment-15882446 ]


Fabian Hueske commented on FLINK-5859:
--------------------------------------

Hi [~godfreyhe] and [~ykt836],

I think we can also get a nice API if we handle both cases as regular filter 
push down.
If we implement {{PartitionableTableSource}} as follows:

{code}
abstract class PartitionableTableSource extends FilterableTableSource {

  // This needs to be implemented!
  def getAllPartitions: Array[String]

  // This needs to be implemented
  // Interface can also be easier and not use Expression
  def applyPartitionPruning(partitionsToPrune: Array[Expression]): Unit

  // Default implementation. Must be overridden to apply filters in addition to
  // partition pruning.
  //   If overridden, it will be called when partitions have already been pruned.
  //   -> If it needs to scan metadata, it knows which partitions to skip.
  def applyPredicate(predicate: Array[Expression]): Array[Expression] = {
    // by default returns all predicates
    predicate
  }

  // Default implementation. Will be called by PushDownFilterRule
  override def setPredicate(predicates: Array[Expression]): Array[Expression] = {

    // identify which partitions exist
    val partitions = getAllPartitions
    // go over predicate expressions and identify how partition pruning can be
    // applied (??? is a placeholder for that test)
    val (partitionsToPrune, remaining): (Array[Expression], Array[Expression]) =
      predicates.partition(???)
    // set partitions to prune
    applyPartitionPruning(partitionsToPrune)
    
    // apply remaining predicates
    val remainingAfterFilter = applyPredicate(remaining)
    
    remainingAfterFilter
  }

}
{code}

This approach is fully integrated with the {{FilterableTableSource}} and does 
not require any additional logic in the optimizer (no rules, etc.).
If only partition pruning should be done, only {{getAllPartitions}} and 
{{applyPartitionPruning}} need to be implemented. If the table source should 
also apply filters, it needs to override {{applyPredicate()}}.
It also reduces the scan of metadata because partitions are pruned before the 
metadata for filters needs to be checked.
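
To make the flow concrete, here is a minimal, self-contained sketch of how a source might look under this proposal. It is only an illustration, not actual Flink code: {{Expression}}, {{EqualTo}}, {{GreaterThan}} and {{FilterableTableSource}} are reduced stand-ins, {{partitionField}} and {{DailyLogSource}} are hypothetical, and it uses the simpler variant in which {{applyPartitionPruning}} takes partition names instead of {{Expression}}. It also assumes that only equality predicates on a single partition field are used for pruning.

{code}
// Stand-ins for the real Table API types, reduced to what the sketch needs.
sealed trait Expression
case class EqualTo(field: String, value: String) extends Expression
case class GreaterThan(field: String, value: Int) extends Expression

trait FilterableTableSource {
  // Returns the predicates the source could NOT handle itself.
  def setPredicate(predicates: Array[Expression]): Array[Expression]
}

abstract class PartitionableTableSource extends FilterableTableSource {
  // Assumption: partitions are identified by the value of one field.
  def partitionField: String
  def getAllPartitions: Array[String]
  def applyPartitionPruning(partitionsToPrune: Array[String]): Unit

  // By default no filter is applied and all predicates are returned.
  def applyPredicate(predicates: Array[Expression]): Array[Expression] = predicates

  override def setPredicate(predicates: Array[Expression]): Array[Expression] = {
    // Equality predicates on the partition field are answered by pruning.
    val (onPartition, remaining) = predicates.partition {
      case EqualTo(field, _) => field == partitionField
      case _ => false
    }
    val values = onPartition.collect { case EqualTo(_, v) => v }
    // Predicates are conjunctive: a partition survives only if it matches all.
    val toPrune = getAllPartitions.filterNot(p => values.forall(_ == p))
    applyPartitionPruning(toPrune)
    // Hand the rest to the (possibly overridden) filter logic.
    applyPredicate(remaining)
  }
}

// Hypothetical source that only does partition pruning.
class DailyLogSource extends PartitionableTableSource {
  var pruned: Array[String] = Array.empty
  val partitionField = "dt"
  def getAllPartitions: Array[String] =
    Array("2017-02-22", "2017-02-23", "2017-02-24")
  def applyPartitionPruning(partitionsToPrune: Array[String]): Unit =
    pruned = partitionsToPrune
}
{code}

With this sketch, calling {{setPredicate}} with {{dt = '2017-02-24'}} and {{level > 3}} would prune the partitions {{2017-02-22}} and {{2017-02-23}} and hand only the {{level > 3}} predicate back to the optimizer.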

Is there another benefit of applying PartitionPruning earlier?

> support partition pruning on Table API & SQL
> --------------------------------------------
>
>                 Key: FLINK-5859
>                 URL: https://issues.apache.org/jira/browse/FLINK-5859
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: godfrey he
>            Assignee: godfrey he
>
> Many data sources, e.g. HDFS and Druid, are partitionable storage, and many 
> queries only need to read a small subset of the total data. We can use 
> partition information to prune or skip over files irrelevant to the user’s 
> queries. Both query optimization time and execution time can be reduced 
> considerably, especially for a large partitioned table.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
