[
https://issues.apache.org/jira/browse/TRAFODION-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668833#comment-15668833
]
ASF GitHub Bot commented on TRAFODION-2341:
-------------------------------------------
GitHub user DaveBirdsall opened a pull request:
https://github.com/apache/incubator-trafodion/pull/836
[TRAFODION-2341] Redesign UPDATE STATISTICS retry logic
This set of changes revamps the retry logic used by UPDATE STATISTICS.
The problem was that in Trafodion, a failing DDL or DML statement very often
aborts the enclosing transaction. (In predecessor products, the statement
itself could often be rolled back without aborting the transaction containing
it.) The retry loop in HSFuncExecQuery did not take this into account, so most
retries would ultimately fail with a confusing and uninformative error 8605
(“Committing a transaction which has not started.”).
Therefore, the retry logic has been redesigned. The logic to begin and
commit transactions has been pushed down into the retry loop. The retry loop
has been moved to a new function, HSFuncExecTransactionalQueryWithRetry, while
the old function, HSFuncExecQuery, is now limited to non-retryable queries.
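To illustrate the difference, here is a minimal, self-contained sketch. It is not the Trafodion code: FakeSession, kTransientFailure and both function names are hypothetical stand-ins, and the fake session simply assumes that any failed statement aborts the open transaction. Only the error number 8605 is taken from the description above.

```cpp
#include <string>
#include <vector>

// Hypothetical error codes for this sketch; only 8605 comes from the text above.
const int kErrCommitNotStarted = 8605;
const int kTransientFailure = 1;

// Hypothetical stand-in for a SQL session (not the Trafodion CLI).
struct FakeSession {
    bool inTransaction = false;
    int failuresLeft = 0;            // how many statements fail before one succeeds
    std::vector<std::string> log;    // every statement that was submitted

    // Returns 0 on success, otherwise an error code.
    int exec(const std::string& stmt) {
        log.push_back(stmt);
        if (stmt == "BEGIN WORK") { inTransaction = true; return 0; }
        if (stmt == "COMMIT WORK") {
            if (!inTransaction) return kErrCommitNotStarted;
            inTransaction = false;
            return 0;
        }
        if (failuresLeft > 0) {
            --failuresLeft;
            inTransaction = false;   // assumption: a failed statement aborts the transaction
            return kTransientFailure;
        }
        return 0;
    }
};

// Old shape: one transaction wrapped around the whole retry loop.
int execWithRetryInsideOneTransaction(FakeSession& s, const std::string& stmt,
                                      int maxAttempts) {
    int rc = s.exec("BEGIN WORK");
    if (rc != 0) return rc;
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        rc = s.exec(stmt);
        if (rc == 0) break;
    }
    if (rc != 0) return rc;
    return s.exec("COMMIT WORK");    // fails if an earlier attempt aborted the transaction
}

// New shape: begin and commit are pushed down into the loop, one transaction per attempt.
int execTransactionalWithRetry(FakeSession& s, const std::string& stmt,
                               int maxAttempts) {
    int rc = 0;
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        rc = s.exec("BEGIN WORK");
        if (rc != 0) return rc;
        rc = s.exec(stmt);
        if (rc == 0) return s.exec("COMMIT WORK");
        // The failure aborted this attempt's transaction; the next pass starts a new one.
    }
    return rc;
}
```

With one transient failure queued in the fake session, the first function ends in 8605 at commit time, while the second commits cleanly on its second attempt.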
While investigating and debugging this problem, I found several places in
the code where a retry was being attempted but was not appropriate. In general,
there were two kinds of issues. One was that retries were being done inside
transactions having multiple statements. This is not appropriate because the
abort undoes the work done by earlier statements, and retrying only the current
statement in a new transaction would silently lose that work. The other was
that some statements should not be retried at all. For example, an UPSERT
USING LOAD with a SAMPLE clause is
non-transactional, and therefore its effects are not necessarily rolled back by
a transaction abort. The SAMPLE clause makes the set of data processed
non-deterministic. So, retrying such a statement, e.g., while populating a
sample table, risks generating more sample data than expected.
Some changes were required in the use of the HSTranController object. This
object is very elegant: It starts a transaction in its constructor and commits
or rolls it back in its destructor. If transaction behavior matched up nicely
with lexical scope, it would be perfect. Unfortunately, as described above for
retries, it does not. So in places where retries are attempted, I had to remove
use of this object.
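For readers unfamiliar with the pattern, the following sketch shows a scope-bound transaction guard of the kind described. ScopedTransaction, markFailed and runInScope are hypothetical names invented for this illustration; the real HSTranController interface is not reproduced here, and the guard only records statements in a log instead of talking to a database.

```cpp
#include <string>
#include <vector>

// Hypothetical scope-bound guard: begins in the constructor, and commits or
// rolls back in the destructor depending on whether a failure was flagged.
class ScopedTransaction {
public:
    explicit ScopedTransaction(std::vector<std::string>& log) : log_(log) {
        log_.push_back("BEGIN WORK");
    }
    ~ScopedTransaction() {
        log_.push_back(failed_ ? "ROLLBACK WORK" : "COMMIT WORK");
    }
    ScopedTransaction(const ScopedTransaction&) = delete;
    ScopedTransaction& operator=(const ScopedTransaction&) = delete;

    void markFailed() { failed_ = true; }

private:
    std::vector<std::string>& log_;
    bool failed_ = false;
};

// Runs one statement inside the guard's lexical scope and returns what was issued.
std::vector<std::string> runInScope(bool statementFails) {
    std::vector<std::string> log;
    {
        ScopedTransaction txn(log);
        log.push_back("statement");
        if (statementFails) txn.markFailed();
    }   // the transaction ends here, wherever the scope ends
    return log;
}
```

The guard ties exactly one transaction to one scope. A retry loop that needs a fresh transaction per attempt does not fit that shape unless the guard is created inside the loop body, which is why explicit begin and commit calls are easier to reason about there.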
An awkward change was required in the use of the HSErrorCatcher object.
This is another elegant object: It is used to move any CLI errors into the
UPDATE STATISTICS ComDiagsArea at the end of a lexical scope. Unfortunately, if
one makes a call from an HSErrorCatcher object scope to another method that
also has an HSErrorCatcher object, any errors reported in the latter get
reported twice. In the past, the usual practice has been to avoid doing this by
carefully choosing the scopes for HSErrorCatcher objects. However, with the
current changes, there is a recursion which makes this unavoidable. The
HSFuncExecTransactionalQueryWithRetry function needs an HSErrorCatcher object
to capture any ultimate errors after all retries have failed. However, it
indirectly uses HSFuncExecQuery to do “BEGIN WORK” and “COMMIT WORK” (as it
needs to manage transactions), and that function also has an HSErrorCatcher
object. To avoid having transient transaction commit conflict errors (8616)
reported from within the retry loop, we needed a mechanism to deactivate the
latter HSErrorCatcher object. So, a flag has been added to its constructor that
optionally tells it to turn itself off. Not very elegant, I know.
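The double-reporting effect and the constructor flag can be modelled with a small sketch. Everything here is an assumption made for illustration: Diags, ScopedErrorCatcher, inner and reportedCount are hypothetical names, and the model assumes that a catcher copies pending errors into the report without clearing them, which may not match how HSErrorCatcher works internally.

```cpp
#include <string>
#include <vector>

// Hypothetical diagnostics model for this sketch.
struct Diags {
    std::vector<std::string> pending;   // errors raised by lower layers
    std::vector<std::string> reported;  // stand-in for the statement's diagnostics area
};

// Hypothetical catcher: at scope exit it copies pending errors into the report,
// unless it was constructed inactive.
class ScopedErrorCatcher {
public:
    explicit ScopedErrorCatcher(Diags& d, bool active = true)
        : d_(d), active_(active) {}
    ~ScopedErrorCatcher() {
        if (!active_) return;
        for (const std::string& e : d_.pending) d_.reported.push_back(e);
    }
    ScopedErrorCatcher(const ScopedErrorCatcher&) = delete;
    ScopedErrorCatcher& operator=(const ScopedErrorCatcher&) = delete;

private:
    Diags& d_;
    bool active_;
};

// Inner function with its own catcher, raising one error.
void inner(Diags& d, bool innerActive) {
    ScopedErrorCatcher catcher(d, innerActive);
    d.pending.push_back("8616");
}

// Outer scope with an always-active catcher; returns how many errors got reported.
int reportedCount(bool innerActive) {
    Diags d;
    {
        ScopedErrorCatcher catcher(d);
        inner(d, innerActive);
    }
    return static_cast<int>(d.reported.size());
}
```

In this model, reportedCount(true) yields two copies of the same error and reportedCount(false) yields one, so turning the inner catcher off leaves the outer one as the single place that decides what to report.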
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/DaveBirdsall/incubator-trafodion Trafodion2341
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-trafodion/pull/836.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #836
----
commit a39e85b742ba1ee3a412267ea4655dd49d22601d
Author: Dave Birdsall <[email protected]>
Date: 2016-11-16T00:14:42Z
[TRAFODION-2341] Redesign UPDATE STATISTICS retry logic
----
> UPDATE STATISTICS sometimes fails with error 8605
> -------------------------------------------------
>
> Key: TRAFODION-2341
> URL: https://issues.apache.org/jira/browse/TRAFODION-2341
> Project: Apache Trafodion
> Issue Type: Bug
> Components: sql-cmp
> Affects Versions: 2.1-incubating
> Environment: All
> Reporter: David Wayne Birdsall
> Assignee: David Wayne Birdsall
>
> Update statistics sometimes fails with error 8605, Committing a transaction
> which has not started, as in this example:
> SQL>update statistics for table mytable_47 on existing columns incremental
> where b>=0;
> *** ERROR[9200] UPDATE STATISTICS for table
> TRAFODION.UPDATESTATS_INCREMENTAL_NEW.MYTABLE_47 encountered an error (8605)
> from statement Process_Query. [2016-10-13 12:20:16]
> *** ERROR[8605] Committing a transaction which has not started. [2016-10-13
> 12:20:16]
> These failures are sporadic and non-deterministic. They can be reproduced by
> running two streams of SQL statements that create tables, load them (e.g.
> using UPSERT USING LOAD), run update statistics persistent, insert rows, and
> run update statistics incremental. In the example I am working with, the two
> streams deal with separate sets of tables but in the same schema.