[ http://issues.apache.org/jira/browse/DERBY-634?page=all ]
Andrew McIntyre resolved DERBY-634.
-----------------------------------
Fix Version/s: 10.2.1.0
(was: 10.3.0.0)
Resolution: Fixed
Merged to 10.2 with revision 432715.
> Subquery materialization can cause stack overflow
> -------------------------------------------------
>
> Key: DERBY-634
> URL: http://issues.apache.org/jira/browse/DERBY-634
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.1.1.1
> Reporter: Jeff Lichtman
> Assigned To: Satheesh Bandaram
> Fix For: 10.2.1.0
>
> Attachments: DERBY-634.diff
>
>
> A performance optimization in subquery processing can cause a stack overflow.
> The optimization materializes a subquery ResultSet in memory where it thinks
> the rows will fit in memory. The materialization is done as a set of nested
> unions of constant rows (UnionResultSets and RowResultSets). If there are a
> lot of rows this can cause a stack overflow when fetching a row.
> The obvious fix is to make it use an iterative technique rather than a
> recursive one for storing and returning the rows. See the method
> BaseActivation.materializeResultSetIfPossible() in the language execution
> code.
> There are some other issues with this performance optimization that should be
> looked at:
> 1) The optimization can backfire, making the query run much slower. For
> example, in the query:
> select * from one_row_table where column1 not in
> (select column2 from million_row_table)
> reading million_row_table into memory is an expensive operation. If there is
> an index on million_row_table.column2, the query should return a result very
> quickly despite the large size of million_row_table by doing a single probe
> into million_row_table via the index.
> Since in-memory materialization can be an expensive operation, the decision
> about whether to do it should be made based on query optimizer cost
> estimates. See SubqueryNode.generateExpression().
> 2) It may not be wise to cache partial query results in memory at all.
> Although this can help performance in some cases, it also chews up memory.
> This is different from a limited-size cache with a backing store (like what
> the store uses for page caching). The language has no way to limit the total
> amount of memory used in this type of processing. Note that hash joins
> originally used in-memory hash tables with no backing store, and that a
> backing store was added later.
> 3) The implementation of this optimization has some problems. The decision to
> materialize the subquery results in memory is made during code generation -
> all such decisions should be made during the compilation phase. It's not
> clear to me why materializeResultSetIfPossible() is in BaseActivation - I
> would expect the of materialization to be done by a type of ResultSet, not by
> a method in BaseActivation. Also, this method calls getMaxMemoryPerTable() in
> the OptimizerFactory - nothing in the execution code should refer to anything
> in the compilation code (perhaps getMaxMemoryPerTable() should be moved
> somewhere else).
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira