Thanks for the example Peter. It very interesting. However, after playing around with this example and thinking about it more I'm not sure it's incorrect behavior in all cases. In your example (and my query) the function returns a set of results that is always the same regardless of what row is being returned in the outer query. However, if the function relied on a value from the outer query, something like: SELECT a.id, a.value FROM outer a WHERE a.id IN (SELECT inner_id FROM test_function(a.value) ) then I think the database will need to evaluate the function for each row returned as the results.
I tried making the test function you provided a DETERMINISTIC function. I'm not sure this is the correct use of deterministic, but by doing so the function is called only once (not once for each row returned) and some performance improvements are had: DETERMINISTIC function: Rows QueryTime 1366 0.82 sec 2389 2.67 sec 5635 13.6 sec NON-DETERMINISTIC function: Rows QueryTime 1366 1.6 sec 2389 5 sec 5635 29 sec However it is nowhere near the performance you get if you do this as two queries: CREATE TABLE RESULTS_TEMP AS SELECT ID FROM PUBLIC.IN_LIST_TEST(5635) SELECT * FROM IN_LIST_TABLE WHERE ID IN (SELECT ID FROM RESULTS_TEMP) These two statements take ~0.15 seconds. Maybe I'll just have to settle with re-writing my queries and avoiding the use of IN. Thanks, Emily On Sep 1, 7:34 pm, Peter Yuill <[email protected]> wrote: > Hi Emily, > > > > > > > > > > > The explain for the first query is: > > > SELECT > > ID, > > "geom" AS "geom" > > FROM PUBLIC.TESTDATA > > /* PUBLIC.PRIMARY_KEY_3: ID IN(SELECT > > CAST(HATBOX_JOIN_ID AS INTEGER) > > FROM PUBLIC.HATBOX_MBR_INTERSECTS_ENV('PUBLIC', 'TESTDATA', > > -139.17094658129298, -113.94051433313055, 44.17730198523503, > > 64.12298153276885) > > /++ function ++/) > > */ > > WHERE ID IN( > > SELECT > > CAST(HATBOX_JOIN_ID AS INTEGER) > > FROM PUBLIC.HATBOX_MBR_INTERSECTS_ENV('PUBLIC', 'TESTDATA', > > -139.17094658129298, -113.94051433313055, 44.17730198523503, > > 64.12298153276885) > > /* function */) > > From your explain plan it looks to me that the plan is as good as it can be. > I discovered that my initial test did not select enough rows into the IN > list. Here is a short table of IN list size and query time on my machine: > > Rows QueryTime > 5635 35s > 2389 7s > 1366 2s > > There is no disk caching here either, it is all cpu. This says to me that > there is a problem with the algorithm used for IN parsing. A brief look into > the code found that a function in an IN list is called for every row returned > and the resultset is fully traversed for every row returned. This looks like > a bug to me, and I have attached a test program to demonstrate the issue. > > Regards, > Peter > > InListTest.java > 28KViewDownload -- You received this message because you are subscribed to the Google Groups "H2 Database" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/h2-database?hl=en.
