[ 
https://issues.apache.org/jira/browse/HAWQ-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086730#comment-15086730
 ] 

zharui commented on HAWQ-323:
-----------------------------

I just found many core dump files in the segmentdd directory. My guess is 
that a core dump file is generated each time a heartbeat is sent. I ran gdb 
on one of the core dump files and found that the segmentation fault occurs 
in the sendIMAlive function. The call stack is as follows:

#0  0x00000000008d3c61 in sendIMAlive ()
#1  0x0000000000903e20 in MainHandlerLoop_RMSEG ()
#2  0x000000000090412d in ResManagerMainSegment2ndPhase ()
#3  0x00000000009090de in ResManagerMain ()
#4  0x0000000000909511 in ResManagerProcessStartup ()
#5  0x00000000007654d8 in CommenceNormalOperations ()
#6  0x0000000000766284 in do_reaper ()
#7  0x000000000076b2be in ServerLoop ()
#8  0x000000000076ca36 in PostmasterMain ()
#9  0x00000000006c5dda in main ()
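
For anyone checking whether the other core dump files fail in the same place, 
here is a minimal sketch of a helper that extracts the function names from gdb 
backtrace text like the one above. It is not part of HAWQ; the names 
parse_backtrace and faulting_function are hypothetical, and it assumes frames 
in the plain "#N  0xADDR in func ()" form that gdb prints when no debug 
symbols are available.

```python
import re

# Matches gdb frames such as "#0  0x00000000008d3c61 in sendIMAlive ()".
# The address part is optional because gdb may omit it for some frames.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def parse_backtrace(text):
    """Return [(frame_number, function_name), ...] parsed from gdb `bt` text."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


def faulting_function(text):
    """Return the function name in frame #0, or None if it was not found."""
    for number, name in parse_backtrace(text):
        if number == 0:
            return name
    return None


if __name__ == "__main__":
    sample = (
        "#0  0x00000000008d3c61 in sendIMAlive ()\n"
        "#1  0x0000000000903e20 in MainHandlerLoop_RMSEG ()\n"
    )
    print(faulting_function(sample))
```

Running this over the saved backtrace of each core file would show whether 
frame #0 is always sendIMAlive or whether the crashes differ.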

> Cannot query when cluster include more than 1 segment
> -----------------------------------------------------
>
>                 Key: HAWQ-323
>                 URL: https://issues.apache.org/jira/browse/HAWQ-323
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Core, Resource Manager
>    Affects Versions: 2.0.0-beta-incubating
>            Reporter: zharui
>            Assignee: Lei Chang
>
> The version I use is 2.0.0-beta-RC2. I can query data normally when the 
> cluster has just 1 segment. Once the cluster has more than 1 segment 
> online, no query completes and I get the error "ERROR:  failed to acquire 
> resource from resource manager, 7 of 8 segments are unavailable 
> (pquery.c:788)".
> I have read the segment logs and the resource manager source code. I 
> guess this issue is caused by a communication failure between the segment 
> instances and the resource manager server. The log of the segment that 
> connects to the resource manager successfully contains entries such as 
> "AsyncComm framework receives message 518 from FD5" and "Resource enforcer 
> increases memory quota to: total memory quota=65536 MB, delta memory quota 
> = 65536 MB", but the other online segments have no such log entries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
