[ 
https://issues.apache.org/jira/browse/TRAFODION-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668833#comment-15668833
 ] 

ASF GitHub Bot commented on TRAFODION-2341:
-------------------------------------------

GitHub user DaveBirdsall opened a pull request:

    https://github.com/apache/incubator-trafodion/pull/836

    [TRAFODION-2341] Redesign UPDATE STATISTICS retry logic

    This set of changes revamps the retry logic used by UPDATE STATISTICS.
    
    The problem was that very often in Trafodion when a DDL or DML statement 
fails, the transaction is aborted. (In predecessor products, the statement 
itself often could be rolled back without aborting the transaction containing 
it.) The retry loop in HSFuncExecQuery did not take this into account: after a 
failure the enclosing transaction was already gone, so most retries would 
ultimately fail with a confusing and uninformative error 8605 
(“Committing a transaction which has not started.”).
    
    Therefore, the retry logic has been redesigned. The logic to begin and 
commit transactions has been pushed down into the retry loop. The retry loop 
has been moved to a new function, HSFuncExecTransactionalQueryWithRetry, while 
the old function, HSFuncExecQuery, is now limited to non-retryable queries.
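
A minimal sketch of what "pushing begin and commit down into the retry loop" can look like. This is not the code in the pull request: `TxnOps` and `execTransactionalWithRetry` are hypothetical names, error codes are plain integers, and the delay between attempts that a real retry loop would likely have is omitted.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for transaction control; not Trafodion APIs.
struct TxnOps {
    std::function<int()> begin;      // returns 0 on success
    std::function<int()> commit;     // returns 0 on success
    std::function<void()> rollback;  // best-effort cleanup
};

// Run `stmt` in its own transaction, retrying the whole
// begin / execute / commit sequence up to `maxAttempts` times.
// Returns 0 on success, otherwise the error code of the last failed attempt.
int execTransactionalWithRetry(const TxnOps& txn,
                               const std::function<int()>& stmt,
                               int maxAttempts)
{
    assert(maxAttempts > 0);
    int rc = 0;
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        rc = txn.begin();
        if (rc != 0)
            continue;              // could not start a transaction; try again
        rc = stmt();
        if (rc == 0)
            rc = txn.commit();     // commit only if the statement succeeded
        if (rc == 0)
            return 0;
        txn.rollback();            // harmless if the failure already aborted it
    }
    return rc;
}
```

Because every attempt starts its own transaction, a retry never tries to commit a transaction that a previous failure has already aborted.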
    
    While investigating and debugging this problem, I found several places in 
the code where a retry was being attempted but was not appropriate. In general, 
there were two kinds of issues. One was that retries were being done inside 
transactions having multiple statements. This is not appropriate because work 
done by earlier statements would be silently undone by retrying the current 
statement in a new transaction. The other was that some statements should not be 
retried. For example, an UPSERT USING LOAD with a SAMPLE clause is 
non-transactional, and therefore its effects are not necessarily rolled back by 
a transaction abort. The SAMPLE clause makes the set of data processed 
non-deterministic. So, retrying such a statement, e.g., while populating a 
sample table, risks generating more sample data than expected.
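
The first issue can be illustrated with a toy model. `ToyStore` below is entirely hypothetical; it only mimics the behavior described above, where a failing statement aborts the whole transaction, and shows how retrying just the failing statement in a new transaction loses the earlier statement's work.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy store in which a failed statement aborts the enclosing transaction.
// Hypothetical, for illustration only.
class ToyStore {
public:
    void begin() { pending_.clear(); active_ = true; }

    // Returns false, and aborts the transaction, when `fail` is set.
    bool insert(const std::string& row, bool fail = false) {
        if (fail) { pending_.clear(); active_ = false; return false; }
        pending_.push_back(row);
        return true;
    }

    // Fails when no transaction is active (the situation behind error 8605).
    bool commit() {
        if (!active_) return false;
        committed_.insert(committed_.end(), pending_.begin(), pending_.end());
        pending_.clear();
        active_ = false;
        return true;
    }

    const std::vector<std::string>& committed() const { return committed_; }

private:
    std::vector<std::string> committed_, pending_;
    bool active_ = false;
};

// Naive retry of only the failing statement, in a fresh transaction.
std::vector<std::string> naiveRetryOfSecondStatement() {
    ToyStore db;
    db.begin();
    db.insert("row from statement 1");
    if (!db.insert("row from statement 2", /*fail=*/true)) {
        db.begin();                         // new transaction for the retry
        db.insert("row from statement 2");  // the retry succeeds
    }
    db.commit();
    return db.committed();
}
```

After the call, only the row from statement 2 is committed; the row from statement 1 disappeared with the aborted transaction and nothing reported its loss.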
    
    Some changes were required in the use of the HSTranController object. This 
object is very elegant: It starts a transaction in its constructor and commits 
or rolls it back in its destructor. If transaction behavior matched up nicely 
with lexical scope, it would be perfect. Unfortunately, as described above for 
retries, it does not. So in places where retries are attempted, I had to remove 
use of this object.
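
For readers unfamiliar with the pattern, here is a sketch of a scope-bound transaction guard in the spirit of that description. It is not the actual HSTranController: `ScopedTransaction`, `markFailed`, and the string log that stands in for the database are all assumptions made for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical RAII guard: begins a transaction on construction and
// commits or rolls back on destruction. Actions are recorded in a log
// instead of being sent to a database.
class ScopedTransaction {
public:
    explicit ScopedTransaction(std::vector<std::string>& log) : log_(log) {
        log_.push_back("BEGIN WORK");
    }
    ~ScopedTransaction() {
        log_.push_back(failed_ ? "ROLLBACK WORK" : "COMMIT WORK");
    }
    void markFailed() { failed_ = true; }

    ScopedTransaction(const ScopedTransaction&) = delete;
    ScopedTransaction& operator=(const ScopedTransaction&) = delete;

private:
    std::vector<std::string>& log_;
    bool failed_ = false;
};

std::vector<std::string> runScoped(bool statementFails) {
    std::vector<std::string> log;
    {
        ScopedTransaction txn(log);
        log.push_back("statement");
        if (statementFails) txn.markFailed();
    }   // the destructor commits or rolls back here
    assert(log.size() == 3);
    return log;
}
```

The guard ties exactly one transaction to one lexical scope. A retry loop needs a fresh transaction per attempt inside the same scope, which is why the two do not fit together.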
    
    An awkward change was required in the use of the HSErrorCatcher object. 
This is another elegant object: It is used to move any CLI errors into the 
UPDATE STATISTICS ComDiagsArea at the end of a lexical scope. Unfortunately, if 
one makes a call from an HSErrorCatcher object scope to another method that 
also has an HSErrorCatcher object, any errors reported in the latter get 
reported twice. In the past, the usual practice has been to avoid doing this by 
carefully choosing the scopes for HSErrorCatcher objects. However, with the 
current changes, there is a recursion which makes this unavoidable. The 
HSFuncExecTransactionalQueryWithRetry function needs an HSErrorCatcher object 
to capture any ultimate errors after all retries have failed. However, it 
indirectly uses HSFuncExecQuery to do “BEGIN WORK” and “COMMIT WORK” (as it 
needs to manage transactions), and that function also has an HSErrorCatcher 
object. To keep transient transaction commit conflict errors (8616) from 
being reported in the retry loop, we needed a mechanism to deactivate the 
latter HSErrorCatcher object. So, a flag has been added to its constructor that 
optionally tells it to turn itself off. Not very elegant, I know.
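
The double reporting and the effect of the new flag can be sketched as follows. This is a simplified model, not the real HSErrorCatcher: the class name `ErrorCatcher`, the `active` parameter, and the two integer vectors standing in for the CLI errors and the ComDiagsArea are assumptions for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: at scope exit, an active catcher copies every CLI
// error into the diagnostics area.
class ErrorCatcher {
public:
    ErrorCatcher(const std::vector<int>& cliErrors, std::vector<int>& diags,
                 bool active = true)
        : cliErrors_(cliErrors), diags_(diags), active_(active) {}

    ~ErrorCatcher() {
        if (active_)
            diags_.insert(diags_.end(), cliErrors_.begin(), cliErrors_.end());
    }

private:
    const std::vector<int>& cliErrors_;
    std::vector<int>& diags_;
    bool active_;
};

// Inner function with its own catcher, like a helper that runs one statement.
void inner(std::vector<int>& cliErrors, std::vector<int>& diags,
           bool innerActive) {
    ErrorCatcher catcher(cliErrors, diags, innerActive);
    cliErrors.push_back(8616);   // simulated commit conflict
}

// Outer function that also has a catcher and calls the inner one.
std::vector<int> outer(bool innerActive) {
    std::vector<int> cliErrors, diags;
    {
        ErrorCatcher catcher(cliErrors, diags);
        inner(cliErrors, diags, innerActive);
    }
    return diags;
}
```

With the inner catcher active, `outer(true)` leaves the same error in the diagnostics twice; with it switched off, `outer(false)` reports the error once, from the outer scope only.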
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/DaveBirdsall/incubator-trafodion Trafodion2341

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-trafodion/pull/836.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #836
    
----
commit a39e85b742ba1ee3a412267ea4655dd49d22601d
Author: Dave Birdsall <[email protected]>
Date:   2016-11-16T00:14:42Z

    [TRAFODION-2341] Redesign UPDATE STATISTICS retry logic

----


> UPDATE STATISTICS sometimes fails with error 8605
> -------------------------------------------------
>
>                 Key: TRAFODION-2341
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2341
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>    Affects Versions: 2.1-incubating
>         Environment: All
>            Reporter: David Wayne Birdsall
>            Assignee: David Wayne Birdsall
>
> Update statistics sometimes fails with error 8605, Committing a transaction 
> which has not started, as in this example:
> SQL>update statistics for table mytable_47 on existing columns incremental 
> where b>=0;
> *** ERROR[9200] UPDATE STATISTICS for table 
> TRAFODION.UPDATESTATS_INCREMENTAL_NEW.MYTABLE_47 encountered an error (8605) 
> from statement Process_Query. [2016-10-13 12:20:16]
> *** ERROR[8605] Committing a transaction which has not started. [2016-10-13 
> 12:20:16]
> These failures are sporadic and non-deterministic. They can be reproduced by 
> running two streams of SQL statements that create tables, load them (e.g. 
> using UPSERT USING LOAD), run update statistics persistent, perform inserts, 
> and run update statistics incremental. In the example I am working with, the 
> two streams deal with separate sets of tables but in the same schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
