Subquery materialization can cause stack overflow
-------------------------------------------------
Key: DERBY-634
URL: http://issues.apache.org/jira/browse/DERBY-634
Project: Derby
Type: Bug
Components: SQL
Versions: 10.1.1.1
Reporter: Jeff Lichtman
A performance optimization in subquery processing can cause a stack overflow.
The optimization materializes a subquery ResultSet in memory where it thinks
the rows will fit in memory. The materialization is done as a set of nested
unions of constant rows (UnionResultSets and RowResultSets). If there are a lot
of rows this can cause a stack overflow when fetching a row.
The obvious fix is to make it use an iterative technique rather than a
recursive one for storing and returning the rows. See the method
BaseActivation.materializeResultSetIfPossible() in the language execution code.
There are some other issues with this performance optimization that should be
looked at:
1) The optimization can backfire, making the query run much slower. For
example, in the query:
select * from one_row_table where column1 not in
(select column2 from million_row_table)
reading million_row_table into memory is an expensive operation. If there is an
index on million_row_table.column2, the query should return a result very
quickly despite the large size of million_row_table by doing a single probe
into million_row_table via the index.
Since in-memory materialization can be an expensive operation, the decision
about whether to do it should be made based on query optimizer cost estimates.
See SubqueryNode.generateExpression().
2) It may not be wise to cache partial query results in memory at all. Although
this can help performance in some cases, it also chews up memory. This is
different from a limited-size cache with a backing store (like what the store
uses for page caching). The language has no way to limit the total amount of
memory used in this type of processing. Note that hash joins originally used
in-memory hash tables with no backing store, and that a backing store was added
later.
3) The implementation of this optimization has some problems. The decision to
materialize the subquery results in memory is made during code generation - all
such decisions should be made during the compilation phase. It's not clear to
me why materializeResultSetIfPossible() is in BaseActivation - I would expect
the of materialization to be done by a type of ResultSet, not by a method in
BaseActivation. Also, this method calls getMaxMemoryPerTable() in the
OptimizerFactory - nothing in the execution code should refer to anything in
the compilation code (perhaps getMaxMemoryPerTable() should be moved somewhere
else).
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira