[ 
http://issues.apache.org/jira/browse/DERBY-781?page=comments#action_12361709 ] 

Jeff Lichtman commented on DERBY-781:
-------------------------------------

I've been thinking about this enhancement request ever since it was reported. 
Something didn't seem quite right to me, but it wasn't until now that I was 
able to put my finger on it.

The report identifies a real problem (performance with a union as the inner 
table of a join) and proposes a solution that would work (materialization). I 
think, though, that the proposed solution focuses in the wrong place. The 
materialization should happen as a result of a join strategy, not as part of 
the logic associated with unions.

There are cases where materializing a union would cause a query to run slower. 
Materialization requires the creation of a temporary conglomerate and the 
inserting of rows into the conglomerate, so it should be done only if the 
savings are greater than the costs. Since materialization can make things 
either faster or slower depending on circumstances, the decision as to whether 
to materialize should be done in the optimizer.

Also, there are other types of  result sets that could benefit from 
materialization - for example, INTERSECT, joins, aggregates, etc. Any of these 
could end up on the right side of a join through the use of  table subqueries 
(i.e. SELECT statements in the FROM list of the outer query). I don't think we 
want to re-implement the materialization logic in all of these cases. I suppose 
the logic could be pushed into a parent class, but I think even that would be 
putting it in the wrong place.

If you think about it, we already have a join strategy that materializes the 
inner result  set, i.e. hash join.  I would expect the optimizer to at least 
consider this strategy for the example given in this enhancement request. We 
should check whether the optimizer is making the correct decision about hash 
join in this case before implementing materialization logic specific to unions.


> Materialize union subqueries in select list where possible to avoid creating 
> invariant resultsets many times.
> -------------------------------------------------------------------------------------------------------------
>
>          Key: DERBY-781
>          URL: http://issues.apache.org/jira/browse/DERBY-781
>      Project: Derby
>         Type: Improvement
>   Components: SQL
>     Versions: 10.1.1.0, 10.2.0.0
>  Environment: generic
>     Reporter: Satheesh Bandaram

>
> Derby's handling of union subqueries in from list can be improved by 
> materializing invariant resultsets once, rather than creating them many times.
> For example:
> create view V1 as select i, j from T1 union select i,j from T2;
> create view V2 as select a,b from T3 union select a,b from T4;
> insert into T1 values (1,1), (2,2), (3,3), (4,4), (5,5);
> For a query like select * from V1, V2 where V1.j = V2.b and V1.i in 
> (1,2,3,4,5), it is possible the resultset for V2 is created 5 times. 
> (assuming V2 is choosen as the the inner table) This can be very costly if 
> the underlying selects can take long time and also may perform union many 
> times.
> Enhance materialization logic in setOperatorNode.java. It currently returns 
> FALSE always.
> public boolean performMaterialization(JBitSet outerTables)
>               throws StandardException
> {
>       // RESOLVE - just say no to materialization right now - should be a 
> cost based decision
>       return false;
>       /* Actual materialization, if appropriate, will be placed by our parent 
> PRN.
>        * This is because PRN might have a join condition to apply.  
> (Materialization
>        * can only occur before that.
>        */
>       //return true;
> } 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

Reply via email to