[ 
https://issues.apache.org/jira/browse/HAWQ-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086723#comment-15086723
 ] 

zharui commented on HAWQ-323:
-----------------------------

Thank you for your reply.

Now another issue has come up. I cleaned the cluster so that it has only 
2 nodes. After cluster initialization finished, I found that the 
resourcemanager crashes and restarts again and again. I read the log and 
found a segmentation fault in the resourcemanager. The log is as follows:

2016-01-07 10:57:30.158627 
CST,,,p120379,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","Segment resource 
manager discovers localhost configuration the first 
time.",,,,,,,0,,"requesthandler_RMSEG.c",74,
2016-01-07 10:57:30.158633 
CST,,,p120379,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","Segment resource 
manager Build/update local host 
information.",,,,,,,0,,"requesthandler_RMSEG.c",123,
2016-01-07 10:57:30.158640 
CST,,,p120379,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","Built localhost 
information. NODE:ID=-1,HAWQ AVAIL, GRM UNAVAIL, HAWQ CAP (65536 MB, 16.000000 
CORE), GRM CAP(0 MB, 0.000000 
CORE),NODE:HOST=ws01.mzhen.cn:40000,Master:0,Standby:0,Alive:1.Addresses:127.0.0.1,192.168.3.2,123.103.19.2,192.168.122.1",,,,,,,0,,"requesthandler_RMSEG.c",201,
2016-01-07 10:57:33.164831 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","resourcemanager 
process (PID 120379) was terminated by signal 11: Segmentation 
fault",,,,,,,0,,"postmaster.c",4714,
2016-01-07 10:57:33.164876 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","server process 
(PID 120379) was terminated by signal 11: Segmentation 
fault",,,,,,,0,,"postmaster.c",4714,
2016-01-07 10:57:33.164885 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","terminating any 
other active server processes",,,,,,,0,,"postmaster.c",4452,
2016-01-07 10:57:33.165428 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","BeginResetOfPostmasterAfterChildrenAreShutDown:
 counter 259",,,,,,,0,,"postmaster.c",1868,
2016-01-07 10:57:33.165437 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","gp_session_id 
high-water mark is 1",,,,,,,0,,"postmaster.c",1892,
2016-01-07 10:57:33.203150 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","resetting shared 
memory",,,,,,,0,,"postmaster.c",3300,
2016-01-07 10:57:33.203173 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","PersistentFilespace_ShmemSize:
 69192 = gp_max_filespaces: 8 * sizeof(FilespaceDirEntryData): 1048 + 
PersistentFilespace_SharedDataSize(): 
80",,,,,,,0,,"cdbpersistentfilespace.c",1144,
2016-01-07 10:57:33.203182 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","PersistentTablespace_ShmemSize:
 6304 = gp_max_tablespaces: 16 * sizeof(TablespaceDirEntryData): 32 + 
PersistentTablespace_SharedDataSize(): 
80",,,,,,,0,,"cdbpersistenttablespace.c",1192,
2016-01-07 10:57:33.203190 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","PersistentDatabase_ShmemSize:
 15984 = PersistentDatabase_SharedDataSize(): 15984 = 
PersistentDatabaseSharedData: 80 + MaxPersistentDatabaseDirectories: 256 (db: 
16 * ts: 16) * sizeof(DatabaseDirEntryData): 
56",,,,,,,0,,"cdbpersistentdatabase.c",1477,
2016-01-07 10:57:33.203199 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","PersistentRelation_ShmemSize:
 3673504 = gp_max_relations: 65536 * sizeof(RelationDirEntryData): 32 + 
PersistentRelation_SharedDataSize(): 80",,,,,,,0,,"cdbpersistentrelation.c",454,
2016-01-07 10:57:33.203207 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","Metadata Cache 
Share Memory Size : 155720180",,,,,,,0,,"ipci.c",184,
2016-01-07 10:57:33.203482 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","temporary files 
using default location",,,,,,,0,,"primary_mirror_mode.c",282,
2016-01-07 10:57:33.203490 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","transaction 
files using default pg_system filespace",,,,,,,0,,"primary_mirror_mode.c",1133,
2016-01-07 10:57:33.346770 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","[MetadataCache] 
Metadata cache initialize successfully. 
block_capacity:2097152",,,,,,,0,,"cdbmetadatacache.c",248,
2016-01-07 10:57:33.351796 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","PrimaryMirrorMode:
 Processing postmaster reset with recent mode of 
3",,,,,,,0,,"primary_mirror_mode.c",877,
2016-01-07 10:57:33.351836 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","PrimaryMirrorMode:
 Processing postmaster reset to non-fault 
state",,,,,,,0,,"primary_mirror_mode.c",885,
2016-01-07 10:57:33.351844 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","removing all 
temporary files",,,,,,,0,,"fd.c",2123,
2016-01-07 10:57:33.351852 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","all server 
processes terminated; reinitializing",,,,,,,0,,"postmaster.c",1909,
2016-01-07 10:57:33.353265 
CST,,,p120467,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","database system 
was interrupted at 2016-01-07 10:57:29 CST",,,,,,,0,,"xlog.c",6221,
2016-01-07 10:57:33.363626 
CST,,,p120467,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","checkpoint 
record is at 0/92D6D0",,,,,,,0,,"xlog.c",6298,
2016-01-07 10:57:33.363678 
CST,,,p120467,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","redo record is 
at 0/92D6D0; undo record is at 0/0; shutdown TRUE",,,,,,,0,,"xlog.c",6332,
2016-01-07 10:57:33.363711 
CST,,,p120467,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","next transaction 
ID: 0/972; next OID: 10996",,,,,,,0,,"xlog.c",6336,
2016-01-07 10:57:33.363743 
CST,,,p120467,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","next 
MultiXactId: 1; next MultiXactOffset: 0",,,,,,,0,,"xlog.c",6339,
2016-01-07 10:57:33.363772 
CST,,,p120467,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","database system 
was not properly shut down; automatic recovery in 
progress",,,,,,,0,,"xlog.c",6428,
2016-01-07 10:57:33.422160 
CST,,,p120467,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","record with zero 
length at 0/92D720",,,,,,,0,,"xlog.c",4108,
2016-01-07 10:57:33.422184 
CST,,,p120467,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","no record for 
redo after checkpoint, skip redo and proceed for recovery 
pass",,,,,,,0,,"xlog.c",6492,
2016-01-07 10:57:33.422207 
CST,,,p120467,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","end of 
transaction log location is 0/92D720",,,,,,,0,,"xlog.c",6576,
2016-01-07 10:57:33.422743 
CST,,,p120467,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","Finished startup 
pass 1.  Proceeding to startup crash recovery passes 2 and 
3.",,,,,,,0,,"xlog.c",6810,
2016-01-07 10:57:33.427194 
CST,,,p120468,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","Finished startup 
crash recovery pass 2",,,,,,,0,,"xlog.c",6981,
2016-01-07 10:57:33.505649 
CST,,,p120469,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","recovery restart 
point at 0/92D6D0",,,,,"xlog redo checkpoint: redo 0/92D6D0; undo 0/0; tli 1; 
xid 0/972; oid 10996; multi 1; offset 0; shutdown
REDO PASS 3 @ 0/92D6D0; LSN 0/92D720: prev 0/92D680; xid 0: XLOG - checkpoint: 
redo 0/92D6D0; undo 0/0; tli 1; xid 0/972; oid 10996; multi 1; offset 0; 
shutdown",,0,,"xlog.c",8317,
2016-01-07 10:57:33.505711 
CST,,,p120469,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","record with zero 
length at 0/92D720",,,,,,,0,,"xlog.c",4108,
2016-01-07 10:57:33.505720 
CST,,,p120469,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","Oldest active 
transaction from prepared transactions 972",,,,,,,0,,"xlog.c",5990,
2016-01-07 10:57:33.820605 
CST,,,p120469,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","database system 
is ready",,,,,,,0,,"xlog.c",6016,
2016-01-07 10:57:33.820642 
CST,,,p120469,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","PostgreSQL 
8.2.15 (Greenplum Database 4.2.0 build 1) (HAWQ 2.0.0.0_beta build dev) on 
x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.7 20120313 (Red Hat 
4.4.7-16) compiled on Jan  6 2016 15:41:35",,,,,,,0,,"xlog.c",6026,
2016-01-07 10:57:33.916443 
CST,,,p120469,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","Finished startup 
crash recovery pass 3",,,,,,,0,,"xlog.c",7125,
2016-01-07 10:57:33.922753 
CST,,,p120495,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","Finished startup 
integrity checking",,,,,,,0,,"xlog.c",7153,
2016-01-07 10:57:33.926302 
CST,,,p120499,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","HAWQ Master RM 
:: Temporary directory /tmp",,,,,,,0,,"resourcemanager.c",1060,
2016-01-07 10:57:33.926318 
CST,,,p120499,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","HAWQ Segment RM 
:: Temporary directory /tmp",,,,,,,0,,"resourcemanager.c",1067,
2016-01-07 10:57:33.933475 
CST,,,p120499,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","HAWQ RM SEG 
process works now.",,,,,,,0,,"resourcemanager_RMSEG.c",71,
2016-01-07 10:57:33.933534 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","Wait for HAWQ RM 
-1",,,,,,,0,,"resourcemanager.c",419,
2016-01-07 10:57:33.933569 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","HAWQ :: Received 
signal notification that HAWQ RM works now.",,,,,,,0,,"resourcemanager.c",427,
2016-01-07 10:57:33.933579 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","PostgreSQL 
8.2.15 (Greenplum Database 4.2.0 build 1) (HAWQ 2.0.0.0_beta build dev) on 
x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.7 20120313 (Red Hat 
4.4.7-16) compiled on Jan  6 2016 18:31:33",,,,,,,0,,"postmaster.c",3660,
2016-01-07 10:57:33.933589 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","database system 
is ready to accept connections","PostgreSQL 8.2.15 (Greenplum Database 4.2.0 
build 1) (HAWQ 2.0.0.0_beta build dev) on x86_64-unknown-linux-gnu, compiled by 
GCC gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16) compiled on Jan  6 2016 
18:31:33",,,,,,0,,"postmaster.c",3667,
2016-01-07 10:57:34.034510 
CST,,,p120499,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager 
discovered local host IPv4 address 127.0.0.1",,,,,,,0,,"network_utils.c",245,
2016-01-07 10:57:34.034531 
CST,,,p120499,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager 
discovered local host IPv4 address 192.168.3.2",,,,,,,0,,"network_utils.c",245,
2016-01-07 10:57:34.034538 
CST,,,p120499,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager 
discovered local host IPv4 address 123.103.19.2",,,,,,,0,,"network_utils.c",245,
2016-01-07 10:57:34.034550 
CST,,,p120499,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager 
discovered local host IPv4 address 
192.168.122.1",,,,,,,0,,"network_utils.c",245,
2016-01-07 10:57:34.034558 
CST,,,p120499,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","Segment resource 
manager discovers localhost configuration the first 
time.",,,,,,,0,,"requesthandler_RMSEG.c",74,
2016-01-07 10:57:34.034564 
CST,,,p120499,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","Segment resource 
manager Build/update local host 
information.",,,,,,,0,,"requesthandler_RMSEG.c",123,
2016-01-07 10:57:34.034571 
CST,,,p120499,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","Built localhost 
information. NODE:ID=-1,HAWQ AVAIL, GRM UNAVAIL, HAWQ CAP (65536 MB, 16.000000 
CORE), GRM CAP(0 MB, 0.000000 
CORE),NODE:HOST=ws01.mzhen.cn:40000,Master:0,Standby:0,Alive:1.Addresses:127.0.0.1,192.168.3.2,123.103.19.2,192.168.122.1",,,,,,,0,,"requesthandler_RMSEG.c",201,
2016-01-07 10:57:34.040028 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","resourcemanager 
process (PID 120499) was terminated by signal 11: Segmentation 
fault",,,,,,,0,,"postmaster.c",4714,
2016-01-07 10:57:34.040069 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","server process 
(PID 120499) was terminated by signal 11: Segmentation 
fault",,,,,,,0,,"postmaster.c",4714,
2016-01-07 10:57:34.040101 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","terminating any 
other active server processes",,,,,,,0,,"postmaster.c",4452,
2016-01-07 10:57:34.040608 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","BeginResetOfPostmasterAfterChildrenAreShutDown:
 counter 260",,,,,,,0,,"postmaster.c",1868,
2016-01-07 10:57:34.040627 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","gp_session_id 
high-water mark is 1",,,,,,,0,,"postmaster.c",1892,
2016-01-07 10:57:34.069838 
CST,,,p113549,th-336778880,,,,0,,,seg-10000,,,,,"LOG","00000","resetting shared 
memory",,,,,,,0,,"postmaster.c",3300,
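
To count how often the crash repeats, the wrapped log above can be scanned with a short script. This is only a minimal sketch: it assumes every record starts with a timestamp that has fractional seconds, as in the excerpt, and that the postmaster message reads "<name> process (PID n) was terminated by signal m". The helper names join_records and find_crashes are made up for illustration and are not part of HAWQ.

```python
import re

# A record starts with a timestamp such as "2016-01-07 10:57:30.158627".
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+")
# Message the postmaster logs when a child process dies on a signal.
CRASH = re.compile(r'"(\w+) process \(PID (\d+)\) was terminated by signal (\d+)')


def join_records(lines):
    """Re-join hard-wrapped lines into one string per log record."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line):
            records.append(line)          # a new record begins here
        elif records:
            records[-1] += " " + line     # continuation of the previous record
        # lines before the first timestamp are ignored
    return records


def find_crashes(lines):
    """Return (timestamp, process name, pid, signal) for each crash record."""
    crashes = []
    for record in join_records(lines):
        m = CRASH.search(record)
        if m:
            timestamp = RECORD_START.match(record).group(0)
            crashes.append((timestamp, m.group(1), int(m.group(2)), int(m.group(3))))
    return crashes
```

Passing the lines of a saved log file to find_crashes returns one tuple per matching record. In the excerpt above each crash is reported twice, once as "resourcemanager process" and once as "server process", so filtering on the process name gives one entry per crash.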


> Cannot query when cluster include more than 1 segment
> -----------------------------------------------------
>
>                 Key: HAWQ-323
>                 URL: https://issues.apache.org/jira/browse/HAWQ-323
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Core, Resource Manager
>    Affects Versions: 2.0.0-beta-incubating
>            Reporter: zharui
>            Assignee: Lei Chang
>
> The version I use is 2.0.0-beta-RC2. I can query data normally when the 
> cluster has just 1 segment. Once the cluster has more than 1 segment online, 
> I cannot finish any query and am informed that "ERROR:  failed to acquire 
> resource from resource manager, 7 of 8 segments are unavailable 
> (pquery.c:788)".
> I have read the segment logs and the source code of the resource manager. I 
> guess this issue is caused by a communication failure between the segment 
> instances and the resource manager server. On a segment that connects to the 
> resource manager successfully I can find log entries such as "AsyncComm 
> framework receives message 518 from FD5" and "Resource enforcer increases 
> memory quota to: total memory quota=65536 MB, delta memory quota = 65536 MB", 
> but the other online segments have no such log entries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
