We need a new interface for predicate pushdown. The current interface only
supports partition pushdown, which consumes the filter condition into the
loader. AFAIK, ORC predicate pushdown only supports simple types. I
discussed this briefly with Aniket before, and we are open to choices on the
interface design. There is no Jira yet; we do need to create one.
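A rough Java sketch of the idea, for discussion only. Every name here (PredicatePushdown, getPredicateFields, setPushdownPredicate, ColumnarLoader, LessThan, runQuery) is hypothetical and is not an existing Pig API. It assumes a single long column "x" and a simple "x < value" predicate, since ORC pushdown only covers simple types. The loader uses per-stripe minimums to skip stripes. The plan still applies the FILTER to every row, because a stripe that survives the check can contain rows that do not match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are an existing Pig API (requires Java 16+ for records).
public class PredicatePushdownSketch {

    /** A simple "column < value" predicate on a long column (simple types only, as with ORC). */
    public record LessThan(String column, long value) {}

    /** Hypothetical optional interface a columnar LoadFunc could implement. */
    public interface PredicatePushdown {
        /** Columns whose stripe/row-group statistics the loader can check. */
        List<String> getPredicateFields();

        /** Hands the loader a copy of the predicate; the plan keeps its own FILTER. */
        void setPushdownPredicate(LessThan predicate);
    }

    /** Toy columnar loader: one long column "x", split into stripes with min statistics. */
    public static final class ColumnarLoader implements PredicatePushdown {
        private final List<long[]> stripes;
        private final List<Long> stripeMins = new ArrayList<>(); // stands in for stored stripe stats
        private LessThan pushed;                                  // null until a predicate is pushed
        private int stripesRead;

        public ColumnarLoader(List<long[]> stripes) {
            this.stripes = stripes;
            for (long[] stripe : stripes) {
                long min = Long.MAX_VALUE;
                for (long v : stripe) min = Math.min(min, v);
                stripeMins.add(min);
            }
        }

        @Override public List<String> getPredicateFields() { return List.of("x"); }

        @Override public void setPushdownPredicate(LessThan predicate) { this.pushed = predicate; }

        public int stripesRead() { return stripesRead; }

        /** Returns all rows of stripes that MIGHT match; individual rows are not filtered here. */
        public List<Long> load() {
            List<Long> rows = new ArrayList<>();
            for (int i = 0; i < stripes.size(); i++) {
                // If even the stripe minimum fails "x < value", no row in it can match: skip it.
                if (pushed != null && stripeMins.get(i) >= pushed.value()) continue;
                stripesRead++;
                for (long v : stripes.get(i)) rows.add(v);
            }
            return rows;
        }
    }

    /** Pushes the predicate down when supported, but always re-applies it to every row. */
    public static List<Long> runQuery(ColumnarLoader loader, LessThan filter) {
        if (loader.getPredicateFields().contains(filter.column())) {
            loader.setPushdownPredicate(filter);
        }
        List<Long> result = new ArrayList<>();
        for (long v : loader.load()) {
            if (v < filter.value()) result.add(v); // the FILTER is NOT removed from the plan
        }
        return result;
    }

    public static void main(String[] args) {
        ColumnarLoader loader = new ColumnarLoader(
                List.of(new long[] {1, 5, 9}, new long[] {20, 25, 30}, new long[] {2, 40}));
        List<Long> result = runQuery(loader, new LessThan("x", 10));
        System.out.println(result + " from " + loader.stripesRead() + " of 3 stripes");
    }
}
```

Running main prints "[1, 5, 9, 2] from 2 of 3 stripes". The middle stripe is skipped via its statistics, but the value 40 from the last stripe would leak into the result if the FILTER were removed the way partition pushdown removes it.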

Thanks,
Daniel

On Thu, Apr 24, 2014 at 3:21 PM, Rohini Palaniswamy wrote:
> Hi,
>    Both Parquet and ORC support predicate pushdown. I was looking at
> whether we can make use of the existing PartitionFilterOptimizer by having
> loaders report the columns that support predicate pushdown as partition
> columns. Dmitriy was talking about the PartitionFilterOptimizer pushing
> down the filter conditions to the LoadFunc but not removing them from the
> actual filter condition. But even the new FilterExtractor (and the old
> PColFilterExtractor) that Aniket wrote removes the pushed-down filter
> condition. In a way that makes sense for HCat: when you filter out a lot
> of partitions, you don't want every record to be filtered again on the
> partition condition, wasting CPU. But for columnar file formats, the
> pushed-down predicates are only used to select or skip row groups/stripes,
> not to answer the actual query. So we need a new optimizer for pushing
> down predicates to file formats that does not remove the filter condition,
> and a new Load interface.
>
>  There are no JIRAs filed for this yet; I will file one soon. Has anyone
> already given this some thought, or does anyone have an API design in mind?
> We are planning to work on this with the main focus on ORCFile, but we want
> to ensure that we address all the Parquet cases as well. Julien/Aniket,
> could you help with any questions on the Parquet front?
>
> ORCFile pushes down filter predicates using indexes/column sorting,
> dictionary sorting or bloom filters according to
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC. I
> don't think it can push down filters for complex data structures like lists
> or maps. Daniel, can you confirm?
>
> Julien,
>    Can you explain how predicate pushdown works with Parquet? Does it
> support map columns? I could not find much documentation on it.
>
> Regards,
> Rohini
