comnetwork commented on a change in pull request #444: PHOENIX-5148
URL: https://github.com/apache/phoenix/pull/444#discussion_r265831803
##########
File path:
phoenix-core/src/it/java/org/apache/phoenix/end2end/SortMergeJoinMoreIT.java
##########
@@ -423,7 +424,7 @@ public void testBug2894() throws Exception {
" ) L\n" +
" ON L.BUCKET = E.BUCKET AND L.TIMESTAMP =
E.TIMESTAMP\n" +
" ) C\n" +
- " GROUP BY C.BUCKET, C.TIMESTAMP";
+ " GROUP BY C.BUCKET, C.TIMESTAMP ORDER BY C.BUCKET,
C.TIMESTAMP";
Review comment:
I added the order by here mainly for:
1.Make the query result stable,because the original sql is only have group
by clause, and the group by is not `orderPreserving` before this patch, so must
add an additional `SORTED BY [E.TIMESTAMP, E.BUCKET]`, but after this patch,
the group by is `orderPreserving`, so the query result is not same as without
this patch.
2.To test if the GroupBy and OrderBy could both be optimized by this patch,
from the explain we can see the
`GROUP BY C.BUCKET, C.TIMESTAMP` is `orderPreserving` and
`ORDER BY C.BUCKET, C.TIMESTAMP` is compiled out, which are meeting the
expectation.
If I do not add the order by, the explain assertions and query result
assertions are same as the order by is added.
Another problem I want to explain is why the order of query result is
reversed after this patch:
the query result before this patch is :
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993520000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993515000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993510000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993505000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993490000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993485000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993480000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993475000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993470000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993430000000000L, rs.getLong(2));
the query result after this patch is :
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993430000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993470000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993475000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993480000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993485000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993490000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993505000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993510000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993515000000000L, rs.getLong(2));
assertTrue(rs.next());
assertEquals("5SEC", rs.getString(1));
assertEquals(1462993520000000000L, rs.getLong(2));
We can see that the query result order is reversed after the patch, that is
because without this patch , `CLIENT SORTED BY [E.TIMESTAMP, E.BUCKET]`
(`E.TIMESTAMP` is moved first because is fixed width and not null) is added in
order to complete the aggregate, and we have to note is that the `SortOrder`
of `E.TIMESTAMP` is `DESC` in table `eventCountTableName`'s DDL, meanwhile
when we add `CLIENT SORTED BY [E.TIMESTAMP, E.BUCKET]` in
`ClientAggregatePlan.iterator`, the `isAscending` parameter in following line
159 is true, so after` OrderedResultIterator` in line 167, the result tuple is
sort by the "DESC of E.TIMESTAMP", that is to say , `order by E.TIMESTAMP
DESC,E.BUCKET ASC`
With this patch, the unnecessary sort for completing GroupBy is avoid,
after the SortMergeJoin, the ordering is `order by E.BUCKET ASC, E.TIMESTAMP
ASC`, so the order of query result is reversed after this patch when the
`E.BUCKET` is same.
```
157 List<OrderByExpression> keyExpressionOrderBy =
Lists.newArrayListWithExpectedSize(keyExpressions.size());
158 for (Expression keyExpression : keyExpressions) {
159 keyExpressionOrderBy.add(new
OrderByExpression(keyExpression, false, true));
160 }
161
162 if (useHashAgg) {
163 // Pass in orderBy to apply any sort that has been
optimized away
164 aggResultIterator = new
ClientHashAggregatingResultIterator(context, iterator, serverAggregators,
keyExpressions, orderBy);
165 } else {
166 iterator =
167 new OrderedResultIterator(iterator,
keyExpressionOrderBy,
168 spoolingEnabled, thresholdBytes,
null, null,
169 projector.getEstimatedRowByteSize());
170 aggResultIterator = new
ClientGroupedAggregatingResultIterator(LookAheadResultIterator.wrap(iterator),
serverAggregators, keyExpressions);
171 }
172 }
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services