[
https://issues.apache.org/jira/browse/HIVE-27133?focusedWorklogId=851145&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-851145
]
ASF GitHub Bot logged work on HIVE-27133:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 15/Mar/23 13:26
Start Date: 15/Mar/23 13:26
Worklog Time Spent: 10m
Work Description: SourabhBadhya commented on code in PR #4110:
URL: https://github.com/apache/hive/pull/4110#discussion_r1137062590
##########
ql/src/test/queries/clientpositive/limit_max_int.q:
##########
@@ -0,0 +1,6 @@
+--! qt:dataset:src
+select key from src limit 214748364700;
+select key from src where key = '238' limit 214748364700;
+select * from src where key = '238' limit 214748364700;
+select src.key, count(src.value) from src group by src.key limit 214748364700;
+select * from ( select key from src limit 3) sq1 limit 214748364700;
Review Comment:
nit: Please add a newline at the end of the qfile.
##########
common/src/java/org/apache/hive/common/util/HiveStringUtils.java:
##########
@@ -1174,4 +1175,25 @@ private static boolean isComment(String line) {
return lineTrimmed.startsWith("#") || lineTrimmed.startsWith("--");
}
+ /**
+ * Returns the integer value of a string. If the string value exceeds max int, returns Integer.MAX_VALUE;
+ * else if the string value is less than min int, returns Integer.MIN_VALUE.
+ *
+ *
+ * @param value value of the input string
+ * @return integer
+ */
+ public static int convertStringToBoundedInt(String value) {
+ try {
+ BigInteger bigIntValue = new BigInteger(value);
+ if (bigIntValue.compareTo(BigInteger.valueOf(Integer.MAX_VALUE)) > 0) {
+ return Integer.MAX_VALUE;
Review Comment:
@vamshikolanu
I agree with @jfsii. Converting a large number to Integer.MAX_VALUE is
misleading to the user.
Consider the following query -
`INSERT INTO TABLE destinationTable SELECT * FROM sourceTable LIMIT <some_large_number>;`
The insert will write records based on the output of the SELECT operator. In
this case, since the limit has been converted to Integer.MAX_VALUE, the number
of records written will be equal to Integer.MAX_VALUE, which might not be what
the user wants.
Perhaps throwing a meaningful exception is better. In the long term, adding
support for large integers in LIMIT clauses would be even better.
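The alternative the reviewers are suggesting can be sketched as follows. This is a hypothetical helper (the name `parseLimitStrict` and the exception type are assumptions, not Hive code): instead of silently clamping an oversized LIMIT value to Integer.MAX_VALUE, it rejects the value with a clear message.

```java
import java.math.BigInteger;

public class LimitParseSketch {

    // Hypothetical helper illustrating the reviewers' suggestion: fail with a
    // meaningful message when the LIMIT value does not fit in an int, rather
    // than silently clamping it to Integer.MAX_VALUE.
    static int parseLimitStrict(String value) {
        BigInteger v = new BigInteger(value);
        if (v.compareTo(BigInteger.valueOf(Integer.MAX_VALUE)) > 0
                || v.compareTo(BigInteger.valueOf(Integer.MIN_VALUE)) < 0) {
            throw new IllegalArgumentException(
                "LIMIT value " + value + " is out of the supported integer range");
        }
        // Safe: the bounds check above guarantees the value fits in an int.
        return v.intValueExact();
    }

    public static void main(String[] args) {
        System.out.println(parseLimitStrict("2147483647")); // prints 2147483647
        try {
            parseLimitStrict("9223372036854775807");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this approach the query from the issue description would fail at compile time with an explicit error about the LIMIT range, instead of either a raw NumberFormatException or a silent cap on the row count.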
Issue Time Tracking
-------------------
Worklog Id: (was: 851145)
Time Spent: 1h (was: 50m)
> Round off limit value greater than int_max to int_max;
> ------------------------------------------------------
>
> Key: HIVE-27133
> URL: https://issues.apache.org/jira/browse/HIVE-27133
> Project: Hive
> Issue Type: Task
> Reporter: vamshi kolanu
> Assignee: vamshi kolanu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Currently, when the limit has a bigint value, it fails with the following
> error. As part of this task, we will round off any value greater than
> int_max to int_max.
> select string_col from alltypes order by 1 limit 9223372036854775807
>
> java.lang.NumberFormatException: For input string: "9223372036854775807"
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> at java.lang.Integer.parseInt(Integer.java:583)
> at java.lang.Integer.<init>(Integer.java:867)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1803)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1911)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.doPhase1(SemanticAnalyzer.java:1911)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:12616)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12718)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:450)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:299)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:650)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1503)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1450)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1445)
> at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
> at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:200)
> at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:265)
> at org.apache.hive.service.cli.operation.Operation.run(Operation.java:274)
> at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:565)
> at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:551)
> at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
> at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:567)
> at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
> at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)