I haven't looked at the code, but I assume the query parser would sort
the 'in' terms and then do a binary search lookup into them for each
row, while the 'or' terms don't have that kind of obvious relationship
and are probably tested in sequence.  This would give the in O(log N)
performance compared to a chain of or's having O(N) performance, per
row queried.  For large N, that could add up.  That being said, I'm
just speculating here.  The query parser may be smart enough to
optimize the related or's in the same way, or it may not optimize that
at all.  If I get a chance, I'll try to dig around and see what it's
doing, as I have also had a lot of large 'in' queries and could use
every drop of performance I can get.

--Mark

On Wed, Aug 4, 2010 at 9:47 AM, Edward Capriolo <[email protected]> wrote:
> On Wed, Aug 4, 2010 at 6:10 AM, lei liu <[email protected]> wrote:
>> Because my company reuire we use 0.4.1 version, the version don't support IN
>> clause. I want to  use the OR clause(example:where id=1 or id=2 or id=3) to
>> implement the IN clause(example: id in(1,2,3) ).  I know it will be slower
>> especially when the list after "in" is very long.  Could anybody can tell me
>> why is slow when use OR clause to implement In clause?
>>
>>
>> Thanks,
>>
>>
>> LiuLei
>>
>
> I can not imagine the performance difference between 'or' or 'in'
> would be that great but I never benchmarked it. The big looming
> problems is that if you string enough 'or' together (say 8000) the
> query parser which uses java beans serialization will OOM.
>
> Edward
>

Reply via email to