+1 to implementing the IN feature instead. We are also interested in IN / NOT-IN 
cases where the inclusion/exclusion set is very large. 


-Gautam

> On Mar 6, 2019, at 5:38 PM, Anton Okolnychyi <aokolnyc...@apple.com.invalid> 
> wrote:
> 
> For some reason, I thought there was a blocker there. Since Iceberg does not use 
> org.apache.parquet.filter2.predicate.FilterApi in its Parquet reader, it 
> makes sense to fix this, of course.
> 
>> On 5 Mar 2019, at 18:38, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>> 
>> Would it make sense to add support for IN expressions instead? I'd rather 
>> get that done than build work-arounds.
>> 
>> On Tue, Mar 5, 2019 at 10:33 AM Anton Okolnychyi 
>> <aokolnyc...@apple.com.invalid> wrote:
>>> Hey,
>>> 
>>> The Iceberg Spark data source rewrites IN predicates as a mix of OR/EQ. I am 
>>> wondering if it makes sense to introduce a threshold for when this rewrite 
>>> happens until [1] is resolved. We could have something similar to 
>>> “spark.sql.parquet.pushdown.inFilterThreshold” in Spark.
>>> 
>>> We have experienced a performance degradation on a few queries. One of the 
>>> queries had 5 predicates and 2 of them were IN. In this specific case, IN 
>>> predicates didn’t help to filter out files and just made the overall row 
>>> filter more complicated.
>>> 
>>> Thanks,
>>> Anton
>>> 
>>> 
>>> [1] - https://github.com/apache/incubator-iceberg/issues/39
>>> 
>> 
>> 
>> -- 
>> Ryan Blue
>> Software Engineer
>> Netflix
> 
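
For readers of this thread, the threshold idea from the original message can be sketched roughly as below. This is a minimal illustration, not Iceberg or Spark code: the function name rewrite_in, the tuple-based expression encoding, and the default value of 10 are all assumptions made up for the example. It rewrites an IN predicate into a chain of OR/EQ terms only when the value set is small enough, and otherwise signals that the predicate should not be pushed down.

```python
from functools import reduce

# Hypothetical default, in the spirit of Spark's
# spark.sql.parquet.pushdown.inFilterThreshold; the value is illustrative only.
IN_FILTER_THRESHOLD = 10


def rewrite_in(column, values, threshold=IN_FILTER_THRESHOLD):
    """Rewrite `column IN (values)` as nested OR/EQ tuples.

    Returns None when the predicate should be skipped for pushdown,
    i.e. when the set is empty or has more distinct values than `threshold`.
    """
    # Drop duplicates while preserving the original order.
    distinct = list(dict.fromkeys(values))
    if not distinct or len(distinct) > threshold:
        return None
    # One equality term per value, e.g. ("eq", "id", 1).
    terms = [("eq", column, v) for v in distinct]
    # Fold the terms left to right into ("or", left, right) nodes.
    return reduce(lambda left, right: ("or", left, right), terms)


if __name__ == "__main__":
    print(rewrite_in("id", [1, 2, 3]))
    print(rewrite_in("id", range(100)))
```

With a guard like this, a large IN list never inflates the row filter. Skipping the pushdown is safe only if the engine still evaluates the original IN predicate on the rows that are returned.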
