I haven't looked at the code, but I assume the query parser would sort the 'in' terms and then do a binary search lookup into them for each row, while the 'or' terms don't have that kind of obvious relationship and are probably tested in sequence. This would give the in O(log N) performance compared to a chain of or's having O(N) performance, per row queried. For large N, that could add up. That being said, I'm just speculating here. The query parser may be smart enough to optimize the related or's in the same way, or it may not optimize that at all. If I get a chance, I'll try to dig around and see what it's doing, as I have also had a lot of large 'in' queries and could use every drop of performance I can get.
--Mark On Wed, Aug 4, 2010 at 9:47 AM, Edward Capriolo <[email protected]> wrote: > On Wed, Aug 4, 2010 at 6:10 AM, lei liu <[email protected]> wrote: >> Because my company reuire we use 0.4.1 version, the version don't support IN >> clause. I want to use the OR clause(example:where id=1 or id=2 or id=3) to >> implement the IN clause(example: id in(1,2,3) ). I know it will be slower >> especially when the list after "in" is very long. Could anybody can tell me >> why is slow when use OR clause to implement In clause? >> >> >> Thanks, >> >> >> LiuLei >> > > I can not imagine the performance difference between 'or' or 'in' > would be that great but I never benchmarked it. The big looming > problems is that if you string enough 'or' together (say 8000) the > query parser which uses java beans serialization will OOM. > > Edward >
