[
https://issues.apache.org/jira/browse/HAWQ-255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063347#comment-15063347
]
ASF GitHub Bot commented on HAWQ-255:
-------------------------------------
Github user liming01 commented on a diff in the pull request:
https://github.com/apache/incubator-hawq/pull/191#discussion_r47990910
--- Diff: src/backend/access/transam/xact.c ---
@@ -2317,14 +2317,14 @@ CommitTransaction(void)
willHaveObjectsFromSmgr =
PersistentEndXactRec_WillHaveObjectsFromSmgr(EndXactRecKind_Commit);
- if (willHaveObjectsFromSmgr)
- {
- /*
- * We need to ensure the recording of the [distributed-]commit record and the
- * persistent post-commit work will be done either before or after a checkpoint.
- */
- CHECKPOINT_START_LOCK;
- }
+ /* In previous version, we ensured the recording of the [distributed-]commit record and the
+ * persistent post-commit work will be done either before or after a checkpoint.
+ *
+ * However the persistent table status will be synchronized with AOSeg_XXXX
+ * table and hdfs file in PersistentRecovery_Scan() at recovery PASS2.
+ * We don't need to worry about inconsistent states between them. So no
+ * CHECKPOINT_START_LOCK any more.
+ */
--- End diff --
The status of the current transaction will not be truncated at (4). The
truncation code path is:
CreateCheckPoint() -> CheckPointGuts() -> CheckPointMultiXact() ->
TruncateMultiXact() -> SimpleLruTruncate()
In this code path, cutoffPage is based on the minimum of
OldestMemberMXactId and OldestVisibleMXactId across all backends' procs.
The current backend's OldestMemberMXactId and OldestVisibleMXactId are
reset in AtEOXact_MultiXact(), and in both AbortTransaction() and
CommitTransaction() that function is called after AtEOXact_smgr().
So during recovery we can tell that the current transaction has already
committed, and the persistent table and HDFS file changes should all be
redone.
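To make the cutoff argument above concrete, here is a minimal, simplified model in C of how the truncation cutoff follows the oldest multixact ID that any backend still advertises. It is a sketch under stated assumptions, not the actual HAWQ or PostgreSQL code: the BackendSlot struct, the helper names, and the MXACTS_PER_PAGE value are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

#define INVALID_MXACT_ID 0u      /* assumed "not set" marker */
#define MXACTS_PER_PAGE  2048u   /* assumed page capacity, illustration only */

/* Hypothetical per-backend slot holding the two values named in the comment. */
typedef struct {
    MultiXactId oldest_member;   /* stands in for OldestMemberMXactId  */
    MultiXactId oldest_visible;  /* stands in for OldestVisibleMXactId */
} BackendSlot;

/* Wraparound-aware "a is older than b" comparison. */
static int mxact_precedes(MultiXactId a, MultiXactId b)
{
    return (int32_t)(a - b) < 0;
}

/* Oldest multixact ID still needed by any backend; next_mxact if none is set. */
static MultiXactId oldest_needed_mxact(const BackendSlot *slots, size_t n,
                                       MultiXactId next_mxact)
{
    MultiXactId oldest = next_mxact;
    for (size_t i = 0; i < n; i++) {
        MultiXactId candidates[2] = { slots[i].oldest_member,
                                      slots[i].oldest_visible };
        for (int k = 0; k < 2; k++) {
            if (candidates[k] != INVALID_MXACT_ID &&
                mxact_precedes(candidates[k], oldest))
                oldest = candidates[k];
        }
    }
    return oldest;
}

/* Pages strictly before the returned page are the ones eligible for truncation. */
static uint32_t cutoff_page(const BackendSlot *slots, size_t n,
                            MultiXactId next_mxact)
{
    return oldest_needed_mxact(slots, n, next_mxact) / MXACTS_PER_PAGE;
}

/* Models the end-of-transaction reset: the backend stops pinning old pages. */
static void reset_slot_at_eoxact(BackendSlot *slot)
{
    slot->oldest_member = INVALID_MXACT_ID;
    slot->oldest_visible = INVALID_MXACT_ID;
}
```

In this model, as long as a backend's slot is still set, cutoff_page() cannot move past the page holding that backend's oldest ID. Only after reset_slot_at_eoxact() runs, which in the comment's argument happens after AtEOXact_smgr(), can a later checkpoint advance the cutoff.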
> Checkpoint is blocked by TRANSACTION ABORT for INSERTING INTO a big partition
> table
> -----------------------------------------------------------------------------------
>
> Key: HAWQ-255
> URL: https://issues.apache.org/jira/browse/HAWQ-255
> Project: Apache HAWQ
> Issue Type: Bug
> Reporter: Ming LI
> Assignee: Lei Chang
>
> If other INSERT commands are running in parallel at the same time, a lot
> of pg_xlog files will be generated. If the system/master node crashes at
> that point, recovery will take a very long time.