[jira] [Commented] (TRAFODION-2455) Initial Update Stats on 22B row 2.5TB OE table gets 0 rowcount from estimator, fails with timeouts by doing select count (*)

ASF GitHub Bot (JIRA) Wed, 25 Jan 2017 17:55:31 -0800

    [ 
https://issues.apache.org/jira/browse/TRAFODION-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839018#comment-15839018
 ]


ASF GitHub Bot commented on TRAFODION-2455:
-------------------------------------------

Github user selvaganesang commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/929#discussion_r97918864
  
    --- Diff: core/sql/executor/HBaseClient_JNI.cpp ---
    @@ -1303,12 +1304,26 @@ HBC_RetCode HBaseClient_JNI::grant(const Text& user, const Text& tblName, const
     HBC_RetCode HBaseClient_JNI::estimateRowCount(const char* tblName,
                                                   Int32 partialRowSize,
                                                   Int32 numCols,
    -                                              Int64& rowCount)
    +                                              Int32 retryLimitMilliSeconds,
    +                                              Int64& rowCount,
    +                                              Int32& breadCrumb)
     {
    +  // Note: Please use HBC_ERROR_ROWCOUNT_EST_EXCEPTION only for
    +  // those error returns that call getExceptionDetails(). This
    +  // tells the caller that Java exception information is available.
    +
       QRLogger::log(CAT_SQL_HBASE, LL_DEBUG, "HBaseClient_JNI::estimateRowCount(%s) called.", tblName);
    -  if (initJNIEnv() != JOI_OK)
    -     return HBC_ERROR_INIT_PARAM;
    +  breadCrumb = 1;
    +  if (jenv_ == NULL)
    +     if (initJVM() != JOI_OK)
    +         return HBC_ERROR_INIT_PARAM;
     
    +  breadCrumb = 2;
    +  if (jenv_->PushLocalFrame(jniHandleCapacity_) != 0) {
    +     getExceptionDetails();
    +     return HBC_ERROR_ROWCOUNT_EST_EXCEPTION;
    +  }
    +  breadCrumb = 3;
    --- End diff --
    
    I understand that you want to provide a breadcrumb for the errors
    reported. However, please consider leaving this initialization code as it
    was. I cleaned up this code earlier so that initialization of the JNI layer
    is encapsulated in one method, initJNIEnv(), and any future change can be
    made in a single routine.
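
    One way to reconcile the two goals is sketched below. It is only an
    illustration, not the code in the pull request: MockHBaseClient, the
    enum values, and the simplified estimateRowCount() signature are
    stand-ins invented for this example. The breadcrumb is set before and
    after the single initJNIEnv() call, so initialization stays in one
    routine, at the cost of no longer telling a JVM start-up failure apart
    from a PushLocalFrame() failure.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the real Trafodion return codes; the numeric values are
// illustrative assumptions, only the names come from the diff.
enum JOI_RetCode { JOI_OK = 0, JOI_ERROR_INIT = 1 };
enum HBC_RetCode {
  HBC_OK = 0,
  HBC_ERROR_INIT_PARAM,
  HBC_ERROR_ROWCOUNT_EST_EXCEPTION
};

// Hypothetical mock of the JNI wrapper class. It has no real JVM behind it;
// initSucceeds only controls whether the mocked initJNIEnv() reports success.
class MockHBaseClient {
public:
  explicit MockHBaseClient(bool initSucceeds) : initSucceeds_(initSucceeds) {}

  HBC_RetCode estimateRowCount(const char* tblName,
                               int64_t& rowCount,
                               int32_t& breadCrumb) {
    assert(tblName != nullptr);

    breadCrumb = 1;  // about to initialize the JNI layer
    if (initJNIEnv() != JOI_OK)
      return HBC_ERROR_INIT_PARAM;

    breadCrumb = 3;  // JNI layer ready; the Java call would follow here
    rowCount = 0;    // placeholder: the mock performs no estimation
    return HBC_OK;
  }

private:
  // All JNI setup (JVM start-up, local frame) would live in this one routine.
  JOI_RetCode initJNIEnv() { return initSucceeds_ ? JOI_OK : JOI_ERROR_INIT; }

  bool initSucceeds_;
};
```

    With this shape, a caller that sees HBC_ERROR_INIT_PARAM together with
    breadCrumb == 1 knows the failure happened somewhere inside
    initJNIEnv(), and finer-grained detail would have to be reported from
    within that routine.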


> Initial Update Stats on 22B row 2.5TB OE table gets 0 rowcount from 
> estimator, fails with timeouts by doing select count (*)
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TRAFODION-2455
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2455
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>    Affects Versions: 2.1-incubating
>         Environment: A cluster large enough to host a 22 billion row table
>            Reporter: David Wayne Birdsall
>            Assignee: David Wayne Birdsall
>
> When loading a scale factor 73728 Order Entry database, if UPDATE STATISTICS 
> is done soon after the load on one particular table (the largest table, 
> having 22 billion rows), we get the following failure:
> SQLEXCEPTION on Statement, Error Code = -9200
>    update statistics for table trafodion.javabench.oe_orderline_73728 on 
> every column, (OL_W_ID, OL_I_ID), (OL_D_ID, OL_W_ID), (OL_D_ID, OL_I_ID) 
> sample
> *** ERROR[9200] UPDATE STATISTICS for table 
> TRAFODION.JAVABENCH.OE_ORDERLINE_73728 encountered an error (8448) from 
> statement getRow(). [2017-01-09 02:07:22]
> *** ERROR[8448] Unable to access Hbase interface. Call to 
> ExpHbaseInterface::coProcAggr returned error HBASE_ACCESS_ERROR(-706). Cause: 
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
> attempts=3, exceptions:
> Mon Jan 09 01:47:21 PST 2017, 
> RpcRetryingCaller{globalStartTime=1483954641419, pause=100, retries=3}, 
> java.io.IOException: Call to nap015.esgyn.local/10.1.10.20:60020 failed on 
> local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call 
> id=73, waitTime=600001, operationTimeout=600000 expired.
> Mon Jan 09 01:57:21 PST 2017, 
> RpcRetryingCaller{globalStartTime=1483954641419, pause=100, retries=3}, 
> java.io.IOException: Call to nap015.esgyn.local/10.1.10.20:60020 failed on 
> local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call 
> id=185, waitTime=600001, operationTimeout=600000 expired.
> Mon Jan 09 02:07:22 PST 2017, 
> RpcRetryingCaller{globalStartTime=1483954641419, pause=100, retries=3}, 
> java.io.IOException: Call to nap015.esgyn.local/10.1.10.20:60020 failed on 
> local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call 
> id=310, waitTime=600001, operationTimeout=600000 expired.
> A subsequent update statistics command succeeds, but these failures take a 
> half hour or more.
> Enabling logging for update stats shows that getrowcount returns 0, so update 
> stats assumes the table is small enough to do a select count (*). The plan 
> for this select count (*) (perhaps suffering from the same issue that causes 
> getrowcount to return a non-estimate) chooses the HBase aggregate 
> coprocessor. The table in question has 22 billion rows, so the 
> coprocessor isn't a good choice, and the query times out. But the real issue 
> is: why can't the table get a rowcount estimate?
> Rerunning UPDATE STATS on this table a few hours later succeeds.
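
The "half hour or more" in the description follows from the numbers in the exception text: attempts=3 and operationTimeout=600000 (milliseconds). A minimal sketch of that arithmetic is below. The function name worstCaseWait is made up for this example, and it assumes every attempt runs to its full timeout and ignores the pause=100 between retries.

```cpp
#include <cassert>
#include <chrono>

// Values copied from the exception text in the issue description.
const int kAttempts = 3;
const std::chrono::milliseconds kOperationTimeout(600000);

// Lower bound on the elapsed time when every attempt runs until its
// per-call timeout expires (hypothetical helper, not Trafodion or HBase code).
inline std::chrono::milliseconds worstCaseWait(
    int attempts, std::chrono::milliseconds perCallTimeout) {
  assert(attempts >= 0);
  return perCallTimeout * attempts;
}
```

Under those assumptions, worstCaseWait(kAttempts, kOperationTimeout) is 1,800,000 ms, i.e. 30 minutes, which is consistent with the three failures logged ten minutes apart (01:47:21, 01:57:21, 02:07:22).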



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
