[ https://issues.apache.org/jira/browse/HAWQ-255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061673#comment-15061673 ]

ASF GitHub Bot commented on HAWQ-255:
-------------------------------------

Github user wangzw commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/191#discussion_r47877656
  
    --- Diff: src/backend/access/transam/xact.c ---
    @@ -2317,14 +2317,14 @@ CommitTransaction(void)
        willHaveObjectsFromSmgr =
                        PersistentEndXactRec_WillHaveObjectsFromSmgr(EndXactRecKind_Commit);
     
    -   if (willHaveObjectsFromSmgr)
    -   {
    -           /*
    -            * We need to ensure the recording of the [distributed-]commit record and the
    -            * persistent post-commit work will be done either before or after a checkpoint.
    -            */
    -           CHECKPOINT_START_LOCK;
    -   }
    +   /* In previous version, we ensured the recording of the [distributed-]commit record and the
    +    * persistent post-commit work will be done either before or after a checkpoint.
    +    *
    +    * However the persistent table status will be synchronized with AOSeg_XXXX
    +    * table and hdfs file in PersistentRecovery_Scan() at recovery PASS2.
    +    * We don't need to worry about inconsistent states between them. So no
    +    * CHECKPOINT_START_LOCK any more.
    +    */
    --- End diff --
    
    Not true. Consider this case:
    
    1) flush the commit record.
    2) start a checkpoint.
    3) the checkpoint completes successfully.
    4) fail to drop the file and/or fail to modify the persistent table.
    
    Since the checkpoint started (2) and finished (3) successfully after the commit record was flushed (1), the commit record will be truncated as part of the checkpoint, yet the failure (4) has actually happened. Recovery then has no commit record left to replay, so it cannot redo the unfinished post-commit work. I do not think the recovery process can handle this case well.
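    The four-step interleaving above can be sketched as a toy C model. This is purely illustrative (the struct, fields, and `checkpoint()` helper are invented for the sketch, not actual HAWQ/PostgreSQL code); it only shows why truncating the commit record before the post-commit work finishes leaves an unrecoverable state:
    
    ```c
    #include <stdbool.h>
    #include <stdio.h>
    
    /* Toy model of one committing transaction; names are illustrative. */
    typedef struct {
        bool commit_record_in_wal;   /* step 1: commit record flushed to WAL */
        bool post_commit_work_done;  /* drop files / update persistent table */
    } TxState;
    
    /* A completed checkpoint lets the WAL segment holding the already-flushed
     * commit record be truncated, so the record is no longer replayable. */
    static void checkpoint(TxState *tx) {
        if (tx->commit_record_in_wal)
            tx->commit_record_in_wal = false;
    }
    
    int main(void) {
        TxState tx = { .commit_record_in_wal = true,   /* 1) flush commit record */
                       .post_commit_work_done = false };
    
        checkpoint(&tx);       /* 2) checkpoint starts, 3) completes successfully */
        /* 4) post-commit work fails: post_commit_work_done stays false */
    
        /* After a crash, recovery needs either the commit record (to redo the
         * post-commit work) or the work itself to have finished. Here neither
         * holds, which is the inconsistency CHECKPOINT_START_LOCK prevented by
         * making steps 1 and 4 atomic with respect to a checkpoint. */
        bool recoverable = tx.commit_record_in_wal || tx.post_commit_work_done;
        printf("recoverable after crash: %s\n", recoverable ? "yes" : "no");
        return 0;
    }
    ```
    
    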
     


> Checkpoint is blocked by TRANSACTION ABORT for INSERTING INTO a big partition 
> table
> -----------------------------------------------------------------------------------
>
>                 Key: HAWQ-255
>                 URL: https://issues.apache.org/jira/browse/HAWQ-255
>             Project: Apache HAWQ
>          Issue Type: Bug
>            Reporter: Ming LI
>            Assignee: Lei Chang
>
> If other INSERT commands are running in parallel at the same time, they will 
> generate a lot of pg_xlog files. If the system/master node crashes at this 
> point, recovery will take a very long time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
