DaveBirdsall commented on a change in pull request #1828: [TRAFODION-3296] Fix 
handling of [first n] + ORDER BY subqueries
URL: https://github.com/apache/trafodion/pull/1828#discussion_r274554946
 
 

 ##########
 File path: core/sql/optimizer/BindRelExpr.cpp
 ##########
 @@ -6807,6 +6846,100 @@ RelExpr *RelRoot::bindNode(BindWA *bindWA)
   return boundExpr;
 } // RelRoot::bindNode()
 
+
+RelExpr * RelRoot::rewriteFirstNOrderBySubquery(BindWA *bindWA)
+{
+  // We have a subquery of the form:
+  //
+  // select [first n] <x> from <t> order by <y>
+  //
+  // Unfortunately, there are chicken-and-egg problems that prevent us
+  // from pushing the ORDER BY down to the firstN node. (We have to wait
+  // until normalizeNode time to insure we get the right ValueIds, but
+  // the RelRoot containing the ORDER BY has already been deleted by
+  // the time we get there.) To circumvent that, we rewrite the subquery
+  // in this form:
+  //
+  // select <x>
+  // from (select <x>, row_number() over(order by <y>) rn from <t>)
+  // where rn <= n
+  //
+  // Aside: For row subqueries in the top-most select list, the 
+  // chicken-and-egg problem doesn't exist because the top-most RelRoot
+  // does survive until normalizeNode time.
+  //
+  // To perform this transformation, we take the input tree (on the left
+  // below) and rewrite it to the output tree (on the right below). Since
+  // this transformation is being done before binding, we move around
+  // ItemExpr trees instead of ValueIds. Binding happens at the end of
+  // the method, after the rewrite.
+  //
+  //   
+  //                                       new RelRoot (newRelRoot), with copy 
of
+  //                                        original select list
+  //                                          | 
+  //                                       new RenameTable (renameTable), 
+  //                                         with the WHERE rn <= n
+  //                                         clause attached
+  //                                          |
+  //     RelRoot of subquery (this)        original RelRoot (this),
+  //       with original select             with ROW_NUMBER ORDER BY aggregate 
added
+  //       list (originalSelectList)         to select list, ORDER BY removed 
+  //        |                                from RelRoot 
+  //        |                                 |
+  //        |                              new RelSequence (sequenceNode)
+  //        |                                 |
+  //     Subquery tree (query)             Subquery tree (query)            
+
+
+  // point to subquery tree
+  RelExpr * query = child(0)->castToRelExpr();
+
+  // retain a pointer to the present select list to put in our new RelRoot
+  ItemExpr * originalSelectList = getCompExprTree();
+
+  // create "row_number() over(order by <y>) as rn" parse tree, removing the
+  // order by tree from this RelRoot in the process
+  Aggregate * rowNumberOverOrderBy = 
+    new (bindWA->wHeap()) Aggregate(ITM_COUNT, 
+                                    new (bindWA->wHeap()) SystemLiteral(1),
+                                    FALSE /*i.e. not distinct*/,
+                                    ITM_COUNT_STAR__ORIGINALLY,
+                                    '!');
+  rowNumberOverOrderBy->setOLAPInfo(NULL, removeOrderByTree(), -INT_MAX, 0);
+  NAString rowNumberColumnName = "_sys_RN_" + bindWA->fabricateUniqueName();
+  ColRefName * colRefName = new (bindWA->wHeap()) 
ColRefName(rowNumberColumnName, bindWA->wHeap());
+  RenameCol * rename = new (bindWA->wHeap())RenameCol(rowNumberOverOrderBy, 
colRefName);
+
+  // add it to select list of the current RelRoot
+  compExprTree_ = new (bindWA->wHeap()) ItemList(compExprTree_,rename);
+
+  // put a RelSequence node on top of the query node, and make it the child of 
this
+  RelSequence * sequenceNode = new (bindWA->wHeap()) RelSequence(query,NULL);
+  sequenceNode->setHasOlapFunctions(TRUE);
+  this->child(0) = sequenceNode;
 
 Review comment:
   Hmmmm... The presence of an ORDER BY implies that all rows must be read and 
sorted. There is no escaping that. Even in firstn plans (where [first n] + 
ORDER BY is specified at the top level query) we do that. There may be an 
optimization possibility of doing Top N sort rather than a full sort. But in 
any case, all rows must be read.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to