Re: IN Query Performance

Emily Fri, 02 Sep 2011 15:01:49 -0700

Thanks for the example Peter.  It very interesting.

However, after playing around with this example and thinking about it
more I'm not sure it's incorrect behavior in all cases.  In your
example (and my query) the function returns a set of results that is
always the same regardless of what row is being returned in the outer
query.  However, if the function relied on a value from the outer
query, something like:
SELECT a.id, a.value FROM outer a WHERE a.id IN (SELECT inner_id FROM
test_function(a.value) )
then I think the database will need to evaluate the function for each
row returned as the results.


I tried making the test function you provided a DETERMINISTIC
function.  I'm not sure this is the correct use of deterministic, but
by doing so the function is called only once (not once for each row
returned) and some performance improvements are had:

DETERMINISTIC function:
Rows   QueryTime
1366    0.82 sec
2389    2.67 sec
5635    13.6 sec

NON-DETERMINISTIC function:
Rows   QueryTime
1366    1.6 sec
2389    5 sec
5635    29 sec

However it is nowhere near the performance you get if you do this as
two queries:
CREATE TABLE RESULTS_TEMP AS SELECT ID FROM PUBLIC.IN_LIST_TEST(5635)
SELECT * FROM IN_LIST_TABLE WHERE ID IN (SELECT ID FROM RESULTS_TEMP)

These two statements take ~0.15 seconds.

Maybe I'll just have to settle with re-writing my queries and avoiding
the use of IN.

Thanks,
Emily

On Sep 1, 7:34 pm, Peter Yuill <[email protected]> wrote:
> Hi Emily,
>
>
>
>
>
>
>
>
>
> > The explain for the first query is:
>
> > SELECT
> >      ID,
> >      "geom" AS "geom"
> > FROM PUBLIC.TESTDATA
> >      /* PUBLIC.PRIMARY_KEY_3: ID IN(SELECT
> >          CAST(HATBOX_JOIN_ID AS INTEGER)
> >      FROM PUBLIC.HATBOX_MBR_INTERSECTS_ENV('PUBLIC', 'TESTDATA',
> > -139.17094658129298, -113.94051433313055, 44.17730198523503,
> > 64.12298153276885)
> >          /++ function ++/)
> >       */
> > WHERE ID IN(
> >      SELECT
> >          CAST(HATBOX_JOIN_ID AS INTEGER)
> >      FROM PUBLIC.HATBOX_MBR_INTERSECTS_ENV('PUBLIC', 'TESTDATA',
> > -139.17094658129298, -113.94051433313055, 44.17730198523503,
> > 64.12298153276885)
> >          /* function */)
>
>  From your explain plan it looks to me that the plan is as good as it can be. 
> I discovered that my initial test did not select enough rows into the IN 
> list. Here is a short table of IN list size and query time on my machine:
>
> Rows  QueryTime
> 5635   35s
> 2389   7s
> 1366   2s
>
> There is no disk caching here either, it is all cpu. This says to me that 
> there is a problem with the algorithm used for IN parsing. A brief look into 
> the code found that a function in an IN list is called for every row returned 
> and the resultset is fully traversed for every row returned. This looks like 
> a bug to me, and I have attached a test program to demonstrate the issue.
>
> Regards,
> Peter
>
>  InListTest.java
> 28KViewDownload

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/h2-database?hl=en.

Re: IN Query Performance

Reply via email to