Mike Matrigali wrote:

Any idea whether many more queries will now spill to disk? I am worried that many more will, and that those queries will run slower. It seems like there should be a better connection between the optimizer's estimate used to cost an in-memory hash and what is actually done in memory at execution time. As I understand it now, the optimizer costs a hash join only as in-memory; if its estimates say the table won't fit into memory, it chooses a different plan. I assume that other plan is more efficient than a hash plan which may have to go to disk for every probe (even if the pages are cached, going to latched pages on every probe is a lot more expensive than an in-memory hash).


My guess is that no one did the work to set max_inmemory_rowcnt because the backing store support wasn't implemented yet.

I have not read the code yet. Is that 1% of all memory, or 1% of free memory?

It is 1% of Runtime.getRuntime().totalMemory().
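
For illustration, here is a minimal sketch of how such a default budget could be derived. The class and method names, and the per-row size estimate, are assumptions for this sketch, not the actual Derby code:

    // Sketch: a default in-memory budget of 1% of the JVM's total memory,
    // and a rough row-count cap derived from it.  The per-row estimate is
    // an assumed value, not taken from Derby.
    public final class HashTableMemoryBudget {
        private static final long ESTIMATED_BYTES_PER_ROW = 100;

        // Bytes the in-memory hash table may use: 1% of total memory.
        public static long defaultMaxInMemoryBytes() {
            return Runtime.getRuntime().totalMemory() / 100;
        }

        // Approximate row-count cap derived from the byte budget.
        public static long defaultMaxInMemoryRowCount() {
            return defaultMaxInMemoryBytes() / ESTIMATED_BYTES_PER_ROW;
        }
    }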

Changing the costing is difficult. The FromBaseTable.estimateCost method uses the JoinStrategy.multiplyBaseCostByOuterRows method to compute the cost of the join as either the cost of constructing the inner table, or the number of rows in the outer table times the cost of constructing the inner table. HashJoinStrategy.multiplyBaseCostByOuterRows returns false, so the cost of a hash join is estimated as the cost of constructing the hash table, independent of the number of rows in the outer table. This is a reasonable approximation if the hash table is in memory, the outer table is not enormous, and the hash function does a good job.
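
To make the two costing paths concrete, here is a simplified sketch of the decision described above. The types and names are illustrative stand-ins, not Derby's actual classes:

    // Simplified stand-ins for the costing decision described above.
    interface SimpleJoinStrategy {
        boolean multiplyBaseCostByOuterRows();
    }

    class SimpleJoinCoster {
        // Strategies that return true (e.g. nested loop) charge the inner
        // construction cost once per outer row; the hash strategy returns
        // false, so only the one-time hash table build cost is charged and
        // probes are treated as free.
        static double joinCost(SimpleJoinStrategy strategy,
                               double innerBuildCost,
                               double outerRowCount) {
            if (strategy.multiplyBaseCostByOuterRows()) {
                return outerRowCount * innerBuildCost;
            }
            return innerBuildCost;
        }
    }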

If some of the hash table is on disk then neither join cost estimate is very good. The cost of a hash join is then approximately hashtableBuildCost + outerTableRowCount*hashProbeCost, where hashProbeCost is the expected cost of one probe into the hash table. That probe cost will be a function of the fraction of inner table rows on disk and of the number of rows on disk.
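
A small sketch of that spill-aware estimate. The way hashProbeCost is modeled here, as a blend of an in-memory probe cost and a disk probe cost weighted by the spilled fraction, is an assumption for illustration:

    // Spill-aware hash join cost: build cost plus one expected probe cost
    // per outer row.  The probe cost blends in-memory and on-disk probes
    // by the fraction of inner rows that spilled (an assumed model).
    class SpillAwareHashJoinCost {
        static double estimate(double hashtableBuildCost,
                               double outerTableRowCount,
                               double inMemoryProbeCost,
                               double diskProbeCost,
                               double fractionOfInnerRowsOnDisk) {
            double hashProbeCost =
                    (1.0 - fractionOfInnerRowsOnDisk) * inMemoryProbeCost
                    + fractionOfInnerRowsOnDisk * diskProbeCost;
            return hashtableBuildCost + outerTableRowCount * hashProbeCost;
        }
    }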

The JoinStrategy interface has a method, estimateCost, that looks like it could be made to do the trick, but for some reason FromBaseTable.estimateCost does not use it.

The estimate of the inner table row count is used to estimate the size of the hash table. If it is large, we do not use a hash join at all. If we try to include the disk access cost in our hash join cost estimate, we have to use the inner table row count estimate to estimate the fraction of rows on disk. But if the inner table passes the maximum size test, that fraction is zero when the row count estimate is correct, or very small if we allow for some statistical variation. It doesn't make sense to include the disk hash table probe cost in the join cost estimate unless we also ditch the maximum size restriction on hash joins. That might be a good idea, but I would like to fix BackingStoreHashtable so that neither hash joins, nor DISTINCT, nor scroll-insensitive cursors blow up, before tackling join costing.

I think that the right thing to do is for HashJoinStrategy to pass the correct maximum in-memory row count to BackingStoreHashtable so that the two are following the same playbook with respect to spilling. I will work on this.
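
As an illustration of that idea, here is a hypothetical sketch; the class names and constructor below are simplified stand-ins, not the real HashJoinStrategy or BackingStoreHashtable signatures:

    // Hypothetical sketch: the join passes the same in-memory row limit the
    // optimizer costed with to the backing-store hash table, so the costing
    // and the spilling behavior agree.
    class SimpleBackingStoreHashtable {
        SimpleBackingStoreHashtable(long maxInMemoryRowCount) {
            // Spill to disk once more than maxInMemoryRowCount rows arrive.
        }
    }

    class SimpleHashJoin {
        private final long maxInMemoryRowCount;

        SimpleHashJoin(long maxInMemoryRowCount) {
            this.maxInMemoryRowCount = maxInMemoryRowCount;
        }

        SimpleBackingStoreHashtable buildInnerHashTable() {
            // Use the optimizer's limit rather than letting the hash table
            // fall back to its own default (e.g. 1% of total memory).
            return new SimpleBackingStoreHashtable(maxInMemoryRowCount);
        }
    }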

Jack
