On Wed, Aug 4, 2010 at 1:15 PM, Ning Zhang <[email protected]> wrote: > Currently an expression tree (series of ORs in this case) is not collapsed to > one operator or any other optimizations. It would be great to have this > optimization rule to convert an OR operator tree to one IN operator. Would > you be able to file a JIRA and contribute a patch? > > On Aug 4, 2010, at 7:46 AM, Mark Tozzi wrote: > >> I haven't looked at the code, but I assume the query parser would sort >> the 'in' terms and then do a binary search lookup into them for each >> row, while the 'or' terms don't have that kind of obvious relationship >> and are probably tested in sequence. This would give the in O(log N) >> performance compared to a chain of or's having O(N) performance, per >> row queried. For large N, that could add up. That being said, I'm >> just speculating here. The query parser may be smart enough to >> optimize the related or's in the same way, or it may not optimize that >> at all. If I get a chance, I'll try to dig around and see what it's >> doing, as I have also had a lot of large 'in' queries and could use >> every drop of performance I can get. >> >> --Mark >> >> On Wed, Aug 4, 2010 at 9:47 AM, Edward Capriolo <[email protected]> >> wrote: >>> On Wed, Aug 4, 2010 at 6:10 AM, lei liu <[email protected]> wrote: >>>> Because my company reuire we use 0.4.1 version, the version don't support >>>> IN >>>> clause. I want to use the OR clause(example:where id=1 or id=2 or id=3) to >>>> implement the IN clause(example: id in(1,2,3) ). I know it will be slower >>>> especially when the list after "in" is very long. Could anybody can tell >>>> me >>>> why is slow when use OR clause to implement In clause? >>>> >>>> >>>> Thanks, >>>> >>>> >>>> LiuLei >>>> >>> >>> I can not imagine the performance difference between 'or' or 'in' >>> would be that great but I never benchmarked it. The big looming >>> problems is that if you string enough 'or' together (say 8000) the >>> query parser which uses java beans serialization will OOM. >>> >>> Edward >>> > >
For reference I did this as a test case.... SELECT * FROM src where key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR ...(100 more of these) No OOM but I gave up after the test case did not go anywhere for about 2 minutes. Edward
