+1 to implementing the IN feature instead. We are also looking at IN / NOT-IN cases where the inclusion/exclusion set is very large.
-Gautam

Sent from my iPhone

> On Mar 6, 2019, at 5:38 PM, Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:
>
> For some reason, I thought there was a blocker there. As Iceberg is not using
> org.apache.parquet.filter2.predicate.FilterApi in its Parquet reader then
> makes sense to fix, of course.
>
>> On 5 Mar 2019, at 18:38, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>>
>> Would it make sense to add support for IN expressions instead? I'd rather
>> get that done than build work-arounds.
>>
>> On Tue, Mar 5, 2019 at 10:33 AM Anton Okolnychyi
>> <aokolnyc...@apple.com.invalid> wrote:
>>> Hey,
>>>
>>> Iceberg Spark data source rewrites IN predicates as a mix of OR/EQ. I am
>>> wondering if it makes sense to introduce a threshold when this rewrite
>>> happens until [1] is resolved. We can have something similar to
>>> “spark.sql.parquet.pushdown.inFilterThreshold” in Spark.
>>>
>>> We have experienced a performance degradation on a few queries. One of the
>>> queries had 5 predicates and 2 of them were IN. In this specific case, IN
>>> predicates didn’t help to filter out files and just made the overall row
>>> filter more complicated.
>>>
>>> Thanks,
>>> Anton
>>>
>>>
>>> [1] - https://github.com/apache/incubator-iceberg/issues/39
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
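
For illustration only, the threshold idea from Anton's original message could look roughly like the sketch below. It is not Iceberg or Spark code: the `Eq` and `Or` classes, the `rewrite_in` function, and the threshold value of 10 are all hypothetical names and numbers chosen for the example. The sketch rewrites an IN predicate into a chain of OR/EQ terms only when the value set is small enough, and otherwise returns nothing so the predicate is not pushed down and is left for the engine to evaluate.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Optional, Sequence, Union


@dataclass(frozen=True)
class Eq:
    """Hypothetical equality predicate: column == value."""
    column: str
    value: Any


@dataclass(frozen=True)
class Or:
    """Hypothetical disjunction of two predicates."""
    left: "Expr"
    right: "Expr"


Expr = Union[Eq, Or]

# Illustrative threshold, assumed for this sketch; not a real Iceberg setting.
IN_FILTER_THRESHOLD = 10


def rewrite_in(column: str, values: Sequence[Any],
               threshold: int = IN_FILTER_THRESHOLD) -> Optional[Expr]:
    """Rewrite `column IN (values)` as OR/EQ when the set is small enough.

    Returns None when the predicate should not be pushed down, in which
    case the caller is assumed to evaluate it after the scan instead.
    """
    # Drop duplicate values while keeping their original order.
    distinct = list(dict.fromkeys(values))

    # Skip pushdown for empty sets and for sets above the threshold.
    if not distinct or len(distinct) > threshold:
        return None

    # Build Eq terms and fold them into a left-nested chain of Or nodes.
    terms = [Eq(column, v) for v in distinct]
    return reduce(Or, terms)
```

With this shape, a three-value IN becomes two nested `Or` nodes over three `Eq` terms, while an eleven-value IN with the threshold at 10 is skipped entirely. Native IN support, as Ryan suggests, would make the rewrite unnecessary.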