[jira] [Commented] (TRAFODION-2455) Initial Update Stats on 22B row 2.5TB OE table gets 0 rowcount from estimator, fails with timeouts by doing select count (*)

ASF GitHub Bot (JIRA) Thu, 26 Jan 2017 12:59:44 -0800

    [ https://issues.apache.org/jira/browse/TRAFODION-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840444#comment-15840444 ]


ASF GitHub Bot commented on TRAFODION-2455:
-------------------------------------------

Github user DaveBirdsall commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/929#discussion_r98089783
  
    --- Diff: core/sql/executor/HBaseClient_JNI.cpp ---
    @@ -1303,12 +1304,26 @@ HBC_RetCode HBaseClient_JNI::grant(const Text& user, const Text& tblName, const
     HBC_RetCode HBaseClient_JNI::estimateRowCount(const char* tblName,
                                                   Int32 partialRowSize,
                                                   Int32 numCols,
    -                                              Int64& rowCount)
    +                                              Int32 retryLimitMilliSeconds,
    +                                              Int64& rowCount,
    +                                              Int32& breadCrumb)
     {
    +  // Note: Please use HBC_ERROR_ROWCOUNT_EST_EXCEPTION only for
    +  // those error returns that call getExceptionDetails(). This
    +  // tells the caller that Java exception information is available.
    +
       QRLogger::log(CAT_SQL_HBASE, LL_DEBUG, "HBaseClient_JNI::estimateRowCount(%s) called.", tblName);
    -  if (initJNIEnv() != JOI_OK)
    -     return HBC_ERROR_INIT_PARAM;
    +  breadCrumb = 1;
    +  if (jenv_ == NULL)
    +     if (initJVM() != JOI_OK)
    +         return HBC_ERROR_INIT_PARAM;
     
    +  breadCrumb = 2;
    +  if (jenv_->PushLocalFrame(jniHandleCapacity_) != 0) {
    +     getExceptionDetails();
    +     return HBC_ERROR_ROWCOUNT_EST_EXCEPTION;
    +  }
    +  breadCrumb = 3;
    --- End diff --
    
    My mistake... I seem to have accidentally resurrected some old code. Will fix.
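
    For readers unfamiliar with the breadCrumb out-parameter in the diff, here is a minimal standalone sketch of the idea: the function records a step number before each stage, so a caller that gets an error back can tell how far the call progressed. Everything in it is hypothetical (the RetCode enum, the estimateRowCountSketch name, the boolean flags that simulate failures, the placeholder row count); it is not the Trafodion implementation and does not touch JNI.

```cpp
#include <cstdint>

// Hypothetical return codes standing in for the real HBC_RetCode values.
enum class RetCode { Ok, InitFailed, FrameFailed };

// Hypothetical stand-in for estimateRowCount(). The two bool parameters
// simulate whether each setup stage succeeds; the real code would call
// into the JVM instead.
RetCode estimateRowCountSketch(bool initOk, bool frameOk,
                               std::int64_t& rowCount,
                               std::int32_t& breadCrumb)
{
  rowCount = 0;

  breadCrumb = 1;            // about to initialize
  if (!initOk)
    return RetCode::InitFailed;

  breadCrumb = 2;            // about to reserve a local frame
  if (!frameOk)
    return RetCode::FrameFailed;

  breadCrumb = 3;            // setup done; the estimate would be computed here
  rowCount = 22000000000LL;  // placeholder value, for illustration only
  return RetCode::Ok;
}
```

    On a failure return, the caller can log breadCrumb alongside the return code to see which stage was reached last.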


> Initial Update Stats on 22B row 2.5TB OE table gets 0 rowcount from 
> estimator, fails with timeouts by doing select count (*)
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TRAFODION-2455
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2455
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>    Affects Versions: 2.1-incubating
>         Environment: A cluster large enough to host a 22 billion row table
>            Reporter: David Wayne Birdsall
>            Assignee: David Wayne Birdsall
>
> When loading a scale factor 73728 Order Entry database, if UPDATE STATISTICS 
> is done soon after the load on one particular table (the largest table, 
> having 22 billion rows), we get the following failure:
> SQLEXCEPTION on Statement, Error Code = -9200
>    update statistics for table trafodion.javabench.oe_orderline_73728 on 
> every column, (OL_W_ID, OL_I_ID), (OL_D_ID, OL_W_ID), (OL_D_ID, OL_I_ID) 
> sample
> *** ERROR[9200] UPDATE STATISTICS for table 
> TRAFODION.JAVABENCH.OE_ORDERLINE_73728 encountered an error (8448) from 
> statement getRow(). [2017-01-09 02:07:22]
> *** ERROR[8448] Unable to access Hbase interface. Call to 
> ExpHbaseInterface::coProcAggr returned error HBASE_ACCESS_ERROR(-706). Cause: 
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
> attempts=3, exceptions:
> Mon Jan 09 01:47:21 PST 2017, 
> RpcRetryingCaller{globalStartTime=1483954641419, pause=100, retries=3}, 
> java.io.IOException: Call to nap015.esgyn.local/10.1.10.20:60020 failed on 
> local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call 
> id=73, waitTime=600001, operationTimeout=600000 expired.
> Mon Jan 09 01:57:21 PST 2017, 
> RpcRetryingCaller{globalStartTime=1483954641419, pause=100, retries=3}, 
> java.io.IOException: Call to nap015.esgyn.local/10.1.10.20:60020 failed on 
> local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call 
> id=185, waitTime=600001, operationTimeout=600000 expired.
> Mon Jan 09 02:07:22 PST 2017, 
> RpcRetryingCaller{globalStartTime=1483954641419, pause=100, retries=3}, 
> java.io.IOException: Call to nap015.esgyn.local/10.1.10.20:60020 failed on 
> local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call 
> id=310, waitTime=600001, operationTimeout=600000 expired.
> A subsequent update statistics command succeeds, but these failures take a 
> half hour or more.
> Enabling logging for update stats shows that getrowcount returns 0, so update 
> stats assumes the table is small enough to do a select count (*). The plan 
> for this select count (*) (perhaps suffering from the same issue that causes 
> getrowcount to return a non-estimate) chooses the HBase aggregate 
> coprocessor. The table in question has 22 billion rows, so the 
> coprocessor isn't a good choice, and the query times out. But the real issue 
> is, why can't the table get a rowcount estimate?
> Rerunning UPDATE STATS on this table a few hours later succeeds.
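
The "half hour or more" in the description follows from the numbers in the log above: attempts=3 and operationTimeout=600000 ms give 3 x 10 minutes, and the three timestamps (01:47:21, 01:57:21, 02:07:22) are ten minutes apart. The sketch below works through that arithmetic and models the decision the description reports, where a zero estimate is read as "small table". The function names and the threshold parameter are hypothetical and are not taken from the Trafodion source.

```cpp
#include <cstdint>

// Worst-case time spent before the retries are exhausted, ignoring the
// short pause between attempts (pause=100 ms in the log).
constexpr std::int64_t worstCaseWaitMs(std::int64_t attempts,
                                       std::int64_t operationTimeoutMs)
{
  return attempts * operationTimeoutMs;
}

// 3 attempts x 600000 ms = 1800000 ms = 30 minutes.
static_assert(worstCaseWaitMs(3, 600000) == 1800000,
              "three 10-minute timeouts add up to 30 minutes");

// Hypothetical model of the behavior described in the issue: an estimate
// below some threshold leads to an exact count via select count(*).
// An estimate of 0 always falls below the threshold, so a failed
// estimate is indistinguishable from a genuinely small table.
bool wouldRunSelectCount(std::int64_t estimatedRows,
                         std::int64_t smallTableThreshold)
{
  return estimatedRows < smallTableThreshold;
}
```

With any positive threshold, wouldRunSelectCount(0, threshold) is true, which matches the reported path into the coprocessor-based count on the 22 billion row table.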



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
