[
https://issues.apache.org/jira/browse/HAWQ-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044753#comment-15044753
]
Ming LI commented on HAWQ-232:
------------------------------
The hot spot is in the startup 3 phrase.
Most of the hang problem is occurs when a long load command which running
against a big partition table were killed, this load command will append some
the data to some existing files, and also it may create some new files for some
new virtual segments to execute INSERT command( because all existing files have
been used for other statements or for other virtual segments). And if this
INSERT statement crashed, then it will take a long time to redo the
create/drop hdfs files.
And I trace more info, in fact all persistent table state and relation dir and
relfile on hdfs are keep in consistent status at recovery PASS2.
---------------------------- log of 16385/16387/472145/5 at recovery PASS2
-----------------------
144449 2015-11-25 16:07:35.083979
CST,,,p32839,th-1553125376,,,,0,,,seg-10000,,,x1013,,"PANIC","XX000","Found
persistent file-system object in unexpected stat e 'Create Pending'
(cdbpersistentfilesysobj.c:1280)",,,,,"State change is for persistent 'Relation
File: '16385/16387/23281/2'' to go from states 'Drop Pending' or
'Aborting Create' to state 'Free' with serial number 19592 at TID (43,70)
144450 Dropping file-system object -- Relation File:
'16385/16387/23281/2'",,0,,"cdbpersistentfilesysobj.c",1280,"Stack trace:
144451 1 0x565925 postgres errstart + 0x3c4
144452 2 0x6ecc84 postgres PersistentFileSysObj_VerifyExpected + 0x323
144453 3 0x6ed0ac postgres PersistentFileSysObj_StateChange + 0x35a
144454 4 0x6db667 postgres PersistentRelfile_Dropped + 0x44e
144455 5 0x6ee1f6 postgres PersistentFileSysObj_DropObject + 0x56c
144456 6 0x6f3d86 postgres PersistentRecovery_DropType + 0x34c
144457 7 0x6f3e4d postgres PersistentRecovery_Drop + 0x66
144458 8 0x8a856 postgres StartupXLOG_Pass2 + 0x242
144459 9 0x90b5a postgres StartupProcessMain + 0x412
144460 10 0x121e5e postgres AuxiliaryProcessMain + 0x869
144461 11 0x411a3c postgres StartChildProcess + 0x187
144462 12 0x40d3c9 postgres do_reaper + 0x306
144463 13 0x409e26 postgres ServerLoop + 0x3d8
144464 14 0x408c32 postgres PostmasterMain + 0x1407
144465 15 0x32019f postgres main + 0x3ef
144466 16 0x2acd postgres _start + 0xd4
144467 17 0x29f8 postgres start + 0x28
However at the recovery PASS3, it will redo all relfile and relation dir
create/drop for on flight transaction:
---------------------------- log of 16385/16387/472145/5 at recovery PASS3
-----------------------
2015-11-24 12:40:57.991782
CST,,,p5526,th-1553125376,,,,0,,,seg-10000,,,,,"LOG","00000","Scan REDO: Add
Relation File: '16385/16387/4721455': state 'Create Pending' , transaction
1449, relation storage manager 'Append-Only', persistent serial number 165517,
TID (3,320)",,,,,,,0,,"cdbpersistentrecovery.c",968,
2015-11-24 12:40:59.367585
CST,,,p5526,th-1553125376,,,,0,,,seg-10000,,,,,"LOG","00000","Summary REDO:
Relation File: '16385/16387/4721455': state 'Create Pending' , transaction
1449, relation storage manager 'Append-Only', persistent serial number 165517,
TID (3,320)",,,,,,,0,,"cdbpersistentrecovery.c",884,
2015-11-24 12:40:59.878577
CST,,,p5526,th-1553125376,,,,0,,,seg-10000,,,,,"LOG","00000","Needs-Abort REDO:
(New state 'Aborting Create') Relation File: '16385/16387/4721455': state
'Create Pending' , transaction 1449, relation storage manager 'Append-Only',
persistent serial number 165517, TID
(3,320)",,,,,,,0,,"cdbpersistentrecovery.c",1417,
2015-11-24 12:41:00.450693
CST,,,p5526,th-1553125376,,,,0,,,seg-10000,,,x1449,,"LOG","00000","PersistentRecovery_CrashAbort:
Set state of Relation File: '16385/16387/4721455': state 'Create Pending' ,
transaction 1449, relation storage manager 'Append-Only', persistent serial
number 165517, TID (3,320) to 'Aborting
Create'",,,,,,,0,,"cdbpersistentrecovery.c",1525,
2015-11-24 12:41:01.549163
CST,,,p5526,th-1553125376,,,,0,,,seg-10000,,,x1449,,"LOG","00000","End-Xact-Update-Needed
REDO: Relation File: '16385/16387/4721455': state 'Aborting Create' ,
transaction 1449, relation storage manager 'Append-Only', persistent serial
number 165517, TID (3,320)",,,,,,,0,,"cdbpersistentrecovery.c",1609,
2015-11-24 12:41:08.117435
CST,,,p5526,th-1553125376,,,,0,,,seg-10000,,,x1449,,"LOG","00000","Drop REDO:
Relation File: '16385/16387/4721455': state 'Aborting Create' , transaction
1449, relation storage manager 'Append-Only', persistent serial number 165517,
TID (3,320)",,,,,,,0,,"cdbpersistentrecovery.c",1734,
2015-11-24 12:41:08.117472
CST,,,p5526,th-1553125376,,,,0,,,seg-10000,,,x1449,,"LOG","00000","PersistentFileSysObj_DropObject:
before drop of Relation File: '16385/16387/4721455', persistent serial number
165517, TID (3,320)",,,,,"Dropping file-system object -- Relation File:
'16385/16387/4721455'",,0,,"cdbpersistentfilesysobj.c",1965,
2015-11-24 12:41:08.117802
CST,,,p5526,th-1553125376,,,,0,,,seg-10000,,,x1449,,"LOG","00000","PersistentFileSysObj_DropObject:
after drop of Relation File: '16385/16387/4721455', persistent serial number
165517, TID (3,320)",,,,,"Dropping file-system object -- Relation File:
'16385/16387/4721455'",,0,,"cdbpersistentfilesysobj.c",1993,
2015-11-24 12:41:15.042114
CST,,,p5598,th-1553125376,,,,0,,,seg-10000,,,,,"LOG","00000","PersistentRecovery_ShouldHandlePass3XLogRec:
non-data change XLOG record",,,,,"REDO PASS 3 @ 1/5E1991DC; LSN 1/5E199614:
prev 1/5E199174; xid 1449: Master Mirror Log Records - create file: path
""hdfs://localhost:9000/hawq2/16385/16387/472145/5"", filespace
16384",,0,,"cdbpersistentrecovery.c",774,
2015-11-24 12:41:15.042162
CST,,,p5598,th-1553125376,,,,0,,,seg-10000,,,,,"LOG","00000","mmxlog_redo:
create file request 16: path
""hdfs://localhost:9000/hawq2/16385/16387/472145/5"", filespace 16384,
16385/16387/472145, path
""hdfs://localhost:9000/hawq2/16385/16387/472145/5""",,,,,"xlog redo create
file: path ""hdfs://localhost:9000/hawq2/16385/16387/472145/5"", filespace 16384
REDO PASS 3 @ 1/5E1991DC; LSN 1/5E199614: prev 1/5E199174; xid 1449: Master
Mirror Log Records - create file: path
""hdfs://localhost:9000/hawq2/16385/16387/472145/5"", filespace
16384",,0,,"xlog_mm.c",212,
2015-11-24 12:42:36.464366
CST,,,p5598,th-1553125376,,,,0,,,seg-10000,,,,,"LOG","00000","PersistentRecovery_ShouldHandlePass3XLogRec:
non-data change XLOG record",,,,,"REDO PASS 3 @ 1/608756E0; LSN 1/60875B18:
prev 1/60875678; xid 1449: Master Mirror Log Records - remove file: path
""hdfs://localhost:9000/hawq2/16385/16387/472145/5"", filespace
16384",,0,,"cdbpersistentrecovery.c",774,
2015-11-24 12:42:36.464422
CST,,,p5598,th-1553125376,,,,0,,,seg-10000,,,,,"LOG","00000","mmxlog_redo:
remove file request 48: path
""hdfs://localhost:9000/hawq2/16385/16387/472145/5"", filespace 16384,
16385/16387/472145, path
""hdfs://localhost:9000/hawq2/16385/16387/472145/5""",,,,,"xlog redo remove
file: path ""hdfs://localhost:9000/hawq2/16385/16387/472145/5"", filespace 16384
REDO PASS 3 @ 1/608756E0; LSN 1/60875B18: prev 1/60875678; xid 1449: Master
Mirror Log Records - remove file: path
""hdfs://localhost:9000/hawq2/16385/16387/472145/5"", filespace
16384",,0,,"xlog_mm.c",254,
So we need to pass redo all relfile and relation dir create/drop for on flight
transaction.
> Crash on INSERT into big partition table will cause a long time recovery
> ------------------------------------------------------------------------
>
> Key: HAWQ-232
> URL: https://issues.apache.org/jira/browse/HAWQ-232
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Transaction
> Reporter: Ming LI
> Assignee: Lei Chang
>
> The sample testcase is below:
> DROP TABLE integer_part1;
> CREATE TABLE integer_part1 (a int, b int, c int)
> WITH (appendonly=true, compresslevel=5)
> partition by range (a)
> (
> partition aa start (0) end (1000) every (1)
> );
> CHECKPOINT;
> insert into integer_part1 select i,i,i from generate_series(0, 999) i;
> -- run shell: killall -9 postgres
> -- Then restart hawq.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)