And a test case too. Awesome! I will review.
> On Jul 30, 2015, at 7:53 AM, Josh Wills <[email protected]> wrote: > > As a follow-up for this, I posted a patch to fix the problem here: > https://issues.apache.org/jira/browse/CALCITE-824 > > On Tue, Jul 28, 2015 at 7:13 PM, Josh Wills <[email protected]> wrote: > >> Hey devs, >> >> Found a fun issue for you that I couldn't find anywhere else after some >> googling. >> >> Let's say I have a BIGINT column named "q" in a table, and I want to do >> some filtering on a list of values from that column based on an IN clause. >> If all of the values in my IN clause happen to be INTs, then my query will >> work properly (i.e., it will return any rows from the table whose "q" value >> contains one of the values from my list) if the IN clause contains less >> than 20 values, but as soon as the IN clause contains more than 20 values, >> the query will always return zero rows. >> >> I did some explains and can see that the planner changes the strategy for >> the query execution when the number of values in an IN clause exceeds 20-- >> less than 20 generates a big (q = 1 OR q = 2 OR ...) statement, whereas >= >> 20 switches to a SemiJoin between the original table and an >> EnumerableValues object that contains the list of values from the IN >> clause. I think the rub here is that if the values in the IN clause look >> like INTs or at least one of them isn't explicitly cast to a BIGINT, then >> the EnumerableValues will also have type int, and the semi-join will fail >> b/c of the type difference (that last part is me hypothesizing, you all >> would obviously know better what the problem is.) >> >> I can get around the issue by either casting one of the values in the IN >> clause to be a BIGINT, or by explicitly including a value in the list that >> can't be stored as an INT, but it seemed like there should be a cleaner way >> to handle this case so that the query behavior doesn't change dramatically >> when one extra value gets added to the IN list. >> >> Thanks! >> Josh >> >> >> -- >> Director of Data Science >> Cloudera <http://www.cloudera.com> >> Twitter: @josh_wills <http://twitter.com/josh_wills> >> > > > > -- > Director of Data Science > Cloudera <http://www.cloudera.com> > Twitter: @josh_wills <http://twitter.com/josh_wills>
