[ http://issues.apache.org/jira/browse/DERBY-781?page=all ]

A B updated DERBY-781:
----------------------

    Derby Info: [Patch Available, Release Note Needed]  (was: [Patch Available])

Possible RELEASE NOTE for this fix is as follows, based on suggestions from 
Bryan in the above-referenced thread:

<begin_release_note>

DERBY-781: Materialize subqueries in select list where possible to avoid 
creating invariant resultsets many times.

The Derby optimizer has been enhanced so that it now considers the cost of 
performing a hash join with subqueries when it is safe to do so.  If the cost 
of the hash join is better than a nested loop join, Derby will choose to do the 
hash join and will thereby materialize the subquery.

WHAT CHANGED

When optimizing a query that has one or more non-flattenable subqueries in the 
FROM clause, Derby will now check to see if it is possible to perform a hash 
join with that subquery as the inner table.  Prior to Derby 10.2, the optimizer 
would never consider a hash join with a subquery; it only did nested loop joins.

SYMPTOM

Execution performance of queries containing non-flattenable subqueries may 
change.  The expectation is that the new (10.2) query plans will show improved 
performance over the old ones.

Another potential symptom is that the compilation time for such queries may 
increase.  If this happens, the increase should only occur at compilation time; 
execution time should either improve or, at the very least, remain the same as 
in earlier versions of Derby.

CAUSE

If the optimizer chooses to do a hash join with a subquery, Derby only has to 
execute the subquery a single time per statement, after which Derby can just 
perform the desired join against the materialized result set.  Depending on how 
many rows are in the outer table of the join, this once-per-statement execution 
of the subquery can lead to major performance improvements over the 
once-per-outer-row execution employed by earlier versions of Derby.

As for the extra compilation time, this is due to the simple fact that the 
optimizer is now doing more work--i.e. in addition to considering nested loop 
joins with subqueries, it is now _also_ considering hash joins with those 
subqueries, and that means that it could potentially take longer for the 
optimizer to finish its work.  Note again that, if it occurs, the increased 
time should only occur at compilation time; execution time should either 
improve or, at the very least, remain the same as in earlier versions of Derby. 

SOLUTION 

This was an intentional change to improve the execution plans chosen by the 
optimizer for queries having large and/or complex subqueries.  The expectation 
is that the new behavior--and the subsequent query plans--will lead to improved 
performance over the old ones, so no further solution is required.

WORKAROUND

There is no way to disable/workaround this new behavior since the symptom as 
described above is a good one for Derby.

That said, any user who notices a negative performance change after moving to 
Derby 10.2, and who believes that the difference in performance is related to 
this optimizer enhancement, is encouraged to visit the following "performance 
diagnosis" page and to follow up with his/her findings on the Derby mailing 
lists:

        http://wiki.apache.org/db-derby/PerformanceDiagnosisTips

<end_release_note>

> Materialize union subqueries in select list where possible to avoid creating 
> invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-781
>                 URL: http://issues.apache.org/jira/browse/DERBY-781
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.1.1.0, 10.2.0.0
>         Environment: generic
>            Reporter: Satheesh Bandaram
>         Assigned To: A B
>         Attachments: d781_v1.patch, d781_v1.stat, d781_v2.patch, 
> DERBY-781_v1.html
>
>
> Derby's handling of union subqueries in from list can be improved by 
> materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in 
> (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. 
> (assuming V2 is choosen as the the inner table) This can be very costly if 
> the underlying selects can take long time and also may perform union many 
> times.
> Enhance materialization logic in setOperatorNode.java. It currently returns 
> FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
>               throws StandardException
> {
>       // RESOLVE - just say no to materialization right now - should be a 
> cost based decision
>       return false;
>       /* Actual materialization, if appropriate, will be placed by our parent 
> PRN.
>        * This is because PRN might have a join condition to apply.  
> (Materialization
>        * can only occur before that.
>        */
>       //return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to