On Wed, Aug 4, 2010 at 1:15 PM, Ning Zhang <[email protected]> wrote:
> Currently an expression tree (series of ORs in this case) is not collapsed to 
> one operator or any other optimizations. It would be great to have this 
> optimization rule to convert an OR operator tree to one IN operator. Would 
> you be able to file a JIRA and contribute a patch?
>
> On Aug 4, 2010, at 7:46 AM, Mark Tozzi wrote:
>
>> I haven't looked at the code, but I assume the query parser would sort
>> the 'in' terms and then do a binary search lookup into them for each
>> row, while the 'or' terms don't have that kind of obvious relationship
>> and are probably tested in sequence.  This would give the in O(log N)
>> performance compared to a chain of or's having O(N) performance, per
>> row queried.  For large N, that could add up.  That being said, I'm
>> just speculating here.  The query parser may be smart enough to
>> optimize the related or's in the same way, or it may not optimize that
>> at all.  If I get a chance, I'll try to dig around and see what it's
>> doing, as I have also had a lot of large 'in' queries and could use
>> every drop of performance I can get.
>>
>> --Mark
>>
>> On Wed, Aug 4, 2010 at 9:47 AM, Edward Capriolo <[email protected]> 
>> wrote:
>>> On Wed, Aug 4, 2010 at 6:10 AM, lei liu <[email protected]> wrote:
>>>> Because my company reuire we use 0.4.1 version, the version don't support 
>>>> IN
>>>> clause. I want to  use the OR clause(example:where id=1 or id=2 or id=3) to
>>>> implement the IN clause(example: id in(1,2,3) ).  I know it will be slower
>>>> especially when the list after "in" is very long.  Could anybody can tell 
>>>> me
>>>> why is slow when use OR clause to implement In clause?
>>>>
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> LiuLei
>>>>
>>>
>>> I can not imagine the performance difference between 'or' or 'in'
>>> would be that great but I never benchmarked it. The big looming
>>> problems is that if you string enough 'or' together (say 8000) the
>>> query parser which uses java beans serialization will OOM.
>>>
>>> Edward
>>>
>
>

For reference I did this as a test case....
SELECT * FROM src where
key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
OR key=0 OR key=0 OR key=0 OR
key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
OR key=0 OR key=0 OR key=0 OR
...(100 more of these)

No OOM but I gave up after the test case did not go anywhere for about
2 minutes.

Edward

Reply via email to