As a follow-up for this, I posted a patch to fix the problem here: https://issues.apache.org/jira/browse/CALCITE-824
On Tue, Jul 28, 2015 at 7:13 PM, Josh Wills <[email protected]> wrote: > Hey devs, > > Found a fun issue for you that I couldn't find anywhere else after some > googling. > > Let's say I have a BIGINT column named "q" in a table, and I want to do > some filtering on a list of values from that column based on an IN clause. > If all of the values in my IN clause happen to be INTs, then my query will > work properly (i.e., it will return any rows from the table whose "q" value > contains one of the values from my list) if the IN clause contains less > than 20 values, but as soon as the IN clause contains more than 20 values, > the query will always return zero rows. > > I did some explains and can see that the planner changes the strategy for > the query execution when the number of values in an IN clause exceeds 20-- > less than 20 generates a big (q = 1 OR q = 2 OR ...) statement, whereas >= > 20 switches to a SemiJoin between the original table and an > EnumerableValues object that contains the list of values from the IN > clause. I think the rub here is that if the values in the IN clause look > like INTs or at least one of them isn't explicitly cast to a BIGINT, then > the EnumerableValues will also have type int, and the semi-join will fail > b/c of the type difference (that last part is me hypothesizing, you all > would obviously know better what the problem is.) > > I can get around the issue by either casting one of the values in the IN > clause to be a BIGINT, or by explicitly including a value in the list that > can't be stored as an INT, but it seemed like there should be a cleaner way > to handle this case so that the query behavior doesn't change dramatically > when one extra value gets added to the IN list. > > Thanks! > Josh > > > -- > Director of Data Science > Cloudera <http://www.cloudera.com> > Twitter: @josh_wills <http://twitter.com/josh_wills> > -- Director of Data Science Cloudera <http://www.cloudera.com> Twitter: @josh_wills <http://twitter.com/josh_wills>
