[ 
https://issues.apache.org/jira/browse/HIVE-24545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250238#comment-17250238
 ] 

László Bodor edited comment on HIVE-24545 at 12/16/20, 10:40 AM:
-----------------------------------------------------------------

This part was touched in HIVE-23117, but I think the problem still persists: 
it seems we downcast to int even when the Thrift response returns a 
long:
{code}
  @Override
  public int getUpdateCount() throws SQLException {
    checkConnection("getUpdateCount");
    /**
     * Poll on the operation status, till the operation is complete. We want to ensure that since a
     * client might end up using executeAsync and then call this to check if the query run is
     * finished.
     */
    long numModifiedRows = -1L;
    TGetOperationStatusResp resp = waitForOperationToComplete();
    if (resp != null) {
      numModifiedRows = resp.getNumModifiedRows();
    }
    if (numModifiedRows == -1L || numModifiedRows > Integer.MAX_VALUE) {
      LOG.warn("Invalid number of updated rows: {}", numModifiedRows);
      return -1;
    }
    return (int) numModifiedRows;
  }
{code}

It seems java.sql.Statement forces us to implement this method with an int return type:
{code}
    int getUpdateCount() throws SQLException;
{code}

I'm wondering if we can switch to Statement.getLargeUpdateCount(), which returns a long:
https://docs.oracle.com/en/java/javase/11/docs/api/java.sql/java/sql/Statement.html#getLargeUpdateCount()
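
A minimal sketch of what that switch could look like, assuming the row count is already available as a long. LargeCountSketch and its numModifiedRows field are hypothetical stand-ins for HiveStatement and the value read from the Thrift response; only the two method names come from java.sql.Statement. The interface methods declare SQLException, which is left out here to keep the sketch short:

```java
/** Hypothetical stand-in for HiveStatement; not the actual Hive code. */
final class LargeCountSketch {
    // Assumption: stands in for resp.getNumModifiedRows() from the Thrift response.
    private final long numModifiedRows;

    LargeCountSketch(long numModifiedRows) {
        this.numModifiedRows = numModifiedRows;
    }

    /** Mirrors java.sql.Statement#getLargeUpdateCount(): keeps the full long range, -1 if unknown. */
    public long getLargeUpdateCount() {
        return numModifiedRows < 0 ? -1L : numModifiedRows;
    }

    /** Mirrors java.sql.Statement#getUpdateCount(): delegates, returns -1 when the count does not fit in an int. */
    public int getUpdateCount() {
        long count = getLargeUpdateCount();
        if (count > Integer.MAX_VALUE) {
            return -1; // not representable as int; callers should use getLargeUpdateCount()
        }
        return (int) count;
    }
}
```

With the row count from this issue, new LargeCountSketch(12287871907L).getLargeUpdateCount() returns 12287871907, while getUpdateCount() still returns -1 because the value is larger than Integer.MAX_VALUE (2147483647).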



> jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
> --------------------------------------------------------------------
>
>                 Key: HIVE-24545
>                 URL: https://issues.apache.org/jira/browse/HIVE-24545
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Minor
>
> I found this while running an INSERT OVERWRITE (IOW) on TPC-DS 10TB:
> {code}
> ----------------------------------------------------------------------------------------------
>         VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 ..........      llap     SUCCEEDED   4210       4210        0        0       0     362
> Reducer 2 ......      llap     SUCCEEDED    101        101        0        0       0       2
> Reducer 3 ......      llap     SUCCEEDED   1009       1009        0        0       0       1
> ----------------------------------------------------------------------------------------------
> VERTICES: 03/03  [==========================>>] 100%  ELAPSED TIME: 12613.62 s
> ----------------------------------------------------------------------------------------------
> 20/12/16 01:37:36 [main]: WARN jdbc.HiveStatement: Number of rows is greater than Integer.MAX_VALUE
> {code}
> My scenario was:
> {code}
> set hive.exec.max.dynamic.partitions=2000;
> drop table if exists test_sales_2;
> create table test_sales_2 like tpcds_bin_partitioned_acid_orc_10000.store_sales;
> insert overwrite table test_sales_2 select * from tpcds_bin_partitioned_acid_orc_10000.store_sales where ss_sold_date_sk > 2451868;
> {code}
> The number of affected rows:
> {code}
> select count(*) from tpcds_bin_partitioned_acid_orc_10000.store_sales where ss_sold_date_sk > 2451868;
> +--------------+
> |     _c0      |
> +--------------+
> | 12287871907  |
> +--------------+
> {code}
> I guess we should switch to long, since 12287871907 exceeds Integer.MAX_VALUE (2147483647).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
