Re: Optimizer question

Thomas Nielsen Fri, 07 Dec 2007 16:14:03 -0800

Bryan Pendleton wrote:

[snip]


With the new restriction approach that you are proposing for DERBY-2998,
am I correct that you are suggesting we might invent a new ResultSet
node implementation, which computes the ROW_NUMBER() value for its rows?

Almost.

In my patch I'm suggesting to introduce a new ResultSet, calledOLAPResultSet, that is capable of evaluating restrictions and passingrows to the above ResultSet (in getNextRowCore()). This OLAPResultSet isbased on how index result sets are handled, and will look at each rowand evaluate it against the supplied restriction GeneratedMethod codesnippet.As a comparison BulkTableScanResultSet for instance, will pass a chunkof rows up not caring about restrictions at all.

My intention was that OLAPResultSet could be expanded in the future withother OLAP specifics that need to live "somewhere" - for examplehandling windowing and/or other ranking functions.

The actual row number value is (as of today) computed using a generatedmethod call into BaseActivation (similar to autoincrement). This may change.

If I am understanding that correctly, are you envisioning that a WHERE
clause which refers to the ROW_NUMBER() result will be implemented by
generating comparison code into the "restriction" method of the generated
activation code?

Correct.

The restricton starts its life as a whereClause in the SelectNode. Thengoes on to become a Predicate passed up and down the querytree duringoptimization.It ends up as a GeneratedMethod restriction snippet after bytecodegeneration. This GeneratedMethod is run/executed by the ResultSet (orrather it's Activation, if I understand this correctly) for any givenrow or set of rows depending on the specific ResultSet implementation.

It seems to me that the (potential) optimization win by performing the
ROW_NUMBER() restriction at the right level of the execution tree is
quite high (in fact, it's the whole point of DERBY-581), so it makes
sense to me that it would be worth the optimizer's effort to consider
alternate locations for it, and thus you seem to be heading down the
right path with your research.

Correct.

A WHERE restriction on a ROW_NUMBER() column should be pushed down tothe subquery containing the function. The ROW_NUMBER() function is knownto be ever increasing, and we can utilize that when limiting the numberof rows we pass up the chain (like we can with an index...).

Worst case scenario is what my patch attached to DERBY-2998 is as oftoday - no optimization cause a full table scan in the inner select -lets say 1 million rows - and the restriction is evaluated in the outerselect - lets say limiting to 5 rows. So in this specific case we read999 995 rows unneccessary.The optimized version would have pushed the restriction into the innerselect, and stopped pushing rows out to the outer select after the 5throw of the 1 million. Which would be a lot faster, obviously. The largerthe table the larger the penalty.


I guess I'm mostly just trying to educate myself here, but hopefully
some of these ideas and questions will help you clarify your studies and
help us all understand how this processing works.

I think we are all learing from this - at least I am :)

Cheers,
Thomas

Re: Optimizer question

Reply via email to