okumin commented on code in PR #4511:
URL: https://github.com/apache/hive/pull/4511#discussion_r1271477976


##########
ql/src/test/results/clientpositive/llap/limit_join_transpose.q.out:
##########
@@ -1099,7 +1116,7 @@ limit 1 offset 1
 POSTHOOK: type: QUERY
 POSTHOOK: Input: default@src
 #### A masked pattern was here ####
-86     val_86  86      val_86
+238    val_238 238     val_238

Review Comment:
   This is OK because the test query picks up any single row.
   
   ```
   select *
   from src src1 left outer join src src2
   on src1.key = src2.key
   limit 1 offset 1
   ```



##########
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##########
@@ -7015,6 +7022,12 @@ public static String checkBucketing(Configuration conf) {
       return isAllowed(conf, ConfVars.HIVE_STRICT_CHECKS_BUCKETING) ? null : 
NO_BUCKETING_MSG;
     }
 
+    public static void checkOffsetWithoutOrderBy(Configuration conf) throws 
SemanticException {
+      if (!isAllowed(conf, ConfVars.HIVE_STRICT_CHECKS_OFFSET_NO_ORDERBY)) {
+        throw new SemanticException(NO_OFFSET_WITHOUT_ORDERBY_MSG);

Review Comment:
   This directly throws an Exception following @aturoczy's suggestion. I'm 
thinking to update other usages in another ticket if we agree this manner is 
better.
   https://github.com/apache/hive/pull/4467#discussion_r1256310558



##########
common/src/java/org/apache/hadoop/hive/conf/HiveConf.java:
##########
@@ -1819,6 +1820,10 @@ public static enum ConfVars {
     HIVE_STRICT_CHECKS_BUCKETING("hive.strict.checks.bucketing", true,
         "Enabling strict bucketing checks disallows the following:\n" +
         "  Load into bucketed tables."),
+    
HIVE_STRICT_CHECKS_OFFSET_NO_ORDERBY("hive.strict.checks.offset.no.orderby", 
false,
+        "Enabling strict offset checks disallows the following:\n" +
+        "  OFFSET without ORDER BY.\n" +
+        "OFFSET is mostly meaningless when a result set doesn't have a total 
order."),

Review Comment:
   Let me explain why I want to add this parameter.
   
   I have received a number of inquiries from different users so far. They 
always said like "My `LIMIT 100 OFFSET 10000` returned empty rows even though 
the table has more than 100,000 records".
   Hive actually has a bug as I reported. However, listening to the users 
carefully, 99.9% of them mistakenly tried to use the feature in order to 
paginate their tables as below(Note this is a pseudo-code, and they actually 
use workflow engines or something). This is not deterministic.
   
   ```
   int offset = 0;
   while (true) {
     List<String> rows = submit("SELECT * FROM tbl LIMIT 1000000 OFFSET " + i);
     if (rows.isEmpty) {
       break;
     }
     doSomething(rows);
     i += 1000000;
   }
   ```
   
   I guess OFFSET w/o ORDER BY doesn't have so many valid use cases while users 
can very easily make the mistake. So, I'd like to have an option to reject this 
usage by default.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to