GWphua commented on code in PR #17192:
URL: https://github.com/apache/druid/pull/17192#discussion_r1784180872


##########
sql/src/test/resources/drill/window/queries/nestedAggs/woutPrtnBy_6.e:
##########
@@ -3,4 +3,4 @@ b       a
 c      a
 d      a
 e      a
-null   a
+null   null

Review Comment:
   I would like to justify my changes to the expected test result for test 
cases `woutPrtnBy_6` and `woutPrtnBy_7`.
   
   Taking `woutPrtnBy_6` for example, the query is as follows:
   
   ```
   SELECT c2, MIN(MIN(c2)) OVER( ORDER BY c2 ) min_c2 
   FROM "tblWnulls.parquet" 
   GROUP BY c2
   ```
   
   the original expected and actual results are as follows:
   
   ```
   2024-10-02T08:47:04,748 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_6] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - -- Expected results --
   new Object[]{"a", "a"},
   new Object[]{"b", "a"},
   new Object[]{"c", "a"},
   new Object[]{"d", "a"},
   new Object[]{"e", "a"},
   new Object[]{null, "a"}
   ----
   2024-10-02T08:47:04,748 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_6] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - -- Actual results --
   new Object[]{"a", "a"},
   new Object[]{"b", "a"},
   new Object[]{"c", "a"},
   new Object[]{"d", "a"},
   new Object[]{"e", "a"},
   new Object[]{null, null}
   ----
   ```
   
   Turns out that either one of `OVER(ORDER BY c2)` or `GROUP BY` will a 
partial-result of the following order: `[null, a, b, c, d, e]`. As such, our 
`MinStringAggregator#compare` will be iterating strings as follows:
   
   ```
   compare(null, null) --> null
   compare(null, 'a') --> 'a'
   compare('a', 'b') --> 'a'
   compare('a', 'c') --> 'a'
   compare('a', 'd') --> 'a'
   compare('a', 'e') --> 'a'
   ```
   
   As we can see, due to the order in which `compare()` runs, we are not able 
to do away with `null`.  This order of things also happened for `woutPrtnBy_7`, 
which uses MAX instead of MIN. 
   
   While I continued to work and experimented with things, I realised that this 
order is not only limited to my new StringMin implementation. A tweak to 
`woutPrtnBy_2` shows the following results: 
   
   Original Query:
   ```
   SELECT c2, MIN(MAX(c1)) OVER( ORDER BY c2 ) min_mx_c1 FROM 
"tblWnulls.parquet" GROUP BY c2
   ```
   
   Changed Query:
   ```
   SELECT c1, MIN(MAX(c1)) OVER( ORDER BY c1 ) min_mx_c1 FROM 
"tblWnulls.parquet" GROUP BY c1
   ```
   
   Results:
   ```
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #0: [null, null]
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #1: [-1, -1]
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #2: [0, -1]
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #3: [1, -1]
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #4: [2, -1]
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #5: [4, -1]
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #6: [5, -1]
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #7: [6, -1]
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #8: [8, -1]
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #9: [9, -1]
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #10: [10, -1]
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #11: [11, -1]
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #12: [12, -1]
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #13: [13, -1]
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #14: [14, -1]
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #15: [15, -1]
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #16: [17, -1]
   2024-10-02T08:43:55,868 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #17: [19, -1]
   2024-10-02T08:43:55,869 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #18: [11111, -1]
   2024-10-02T08:43:55,869 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #19: [65536, -1]
   2024-10-02T08:43:55,869 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #20: [1000000, -1]
   2024-10-02T08:43:55,869 INFO [drillWindowQuery-nestedAggs/woutPrtnBy_2] 
org.apache.druid.sql.calcite.BaseCalciteQueryTest - row #21: [2147483647, -1]
   
   java.lang.RuntimeException: Can't parse input!
   ```
   
   Although the test ran into a runtime exception, we can see that the results 
that are logged shows the presence of [null, null]. Hence, this means that the 
partial query generated in this case also does not follow a null last 
comparator logic.
   
   Seeing that the behaviour is the same across different ColumnTypes, I think 
that fixing this bug (if do we see it as one) will require another PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to