Re: Option to disable rewrites of IN predicates

Anton Okolnychyi Wed, 06 Mar 2019 04:08:45 -0800

For some reason, I thought there was a blocker there. Since Iceberg does not use 
org.apache.parquet.filter2.predicate.FilterApi in its Parquet reader, it makes 
sense to fix this, of course.
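
For reference, the threshold idea from the quoted message below could be 
sketched roughly as follows. This is only an illustration in Python, not 
Iceberg or Spark code: Eq, Or, and rewrite_in are hypothetical names, and the 
sketch assumes the engine still evaluates the original predicate on the rows 
it reads, so skipping the rewrite only loses pruning and does not change 
results.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass(frozen=True)
class Eq:
    """Hypothetical node for `column = value`."""
    column: str
    value: object


@dataclass(frozen=True)
class Or:
    """Hypothetical node for `left OR right`."""
    left: "Expr"
    right: "Expr"


Expr = Union[Eq, Or]


def rewrite_in(column: str, values: Sequence[object], threshold: int) -> Optional[Expr]:
    """Rewrite `column IN (values)` as a chain of OR/EQ nodes.

    Returns None when the list is empty or has more distinct values than
    `threshold`, meaning "do not push this predicate down".
    """
    # Drop duplicate values while preserving their original order.
    distinct = list(dict.fromkeys(values))
    if not distinct or len(distinct) > threshold:
        return None

    # Build a left-deep chain: ((c = v1) OR (c = v2)) OR (c = v3) ...
    expr: Expr = Eq(column, distinct[0])
    for value in distinct[1:]:
        expr = Or(expr, Eq(column, value))
    return expr
```

With a threshold of 2, rewrite_in("id", [1, 2, 3], 2) returns None, so the IN 
predicate is simply not pushed down and the row filter stays smaller.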


> On 5 Mar 2019, at 18:38, Ryan Blue <rb...@netflix.com.INVALID> wrote:
> 
> Would it make sense to add support for IN expressions instead? I'd rather get 
> that done than build work-arounds.
> 
> On Tue, Mar 5, 2019 at 10:33 AM Anton Okolnychyi 
> <aokolnyc...@apple.com.invalid> wrote:
> Hey,
> 
> The Iceberg Spark data source rewrites IN predicates as a mix of OR/EQ. I am 
> wondering if it makes sense to introduce a threshold for when this rewrite 
> happens until [1] is resolved. We could have something similar to 
> “spark.sql.parquet.pushdown.inFilterThreshold” in Spark.
> 
> We have experienced a performance degradation on a few queries. One of the 
> queries had 5 predicates and 2 of them were IN. In this specific case, IN 
> predicates didn’t help to filter out files and just made the overall row 
> filter more complicated.
> 
> Thanks,
> Anton
> 
> 
> [1] - https://github.com/apache/incubator-iceberg/issues/39
> 
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix
