[ https://issues.apache.org/jira/browse/GIRAPH-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014247#comment-14014247 ]
Jaeho Shin commented on GIRAPH-904:
-----------------------------------

The exact code where Giraph hangs is {{BspServiceMaster#barrierOnWorkerList()}}.  It keeps printing log lines starting with {{"barrierOnWorkerList: Waiting on "}}, listing hostnames that have definitely finished. For example, when launched with a single worker, it keeps waiting on itself after loading has finished.

On a second look, I think a more precise fix would be to do the lowercase normalization in that function, when computing the {{hostnameIdList}}, rather than in {{GiraphConfiguration#getLocalHostname()}}.  The case difference between that value and the hostname returned by {{TaskInfo#getHostnameId()}} causes the set difference computed in the while loop to never become empty, so the barrier is never released.

> Giraph can hang when hostnames include uppercase letters
> --------------------------------------------------------
>
>                 Key: GIRAPH-904
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-904
>             Project: Giraph
>          Issue Type: Bug
>          Components: bsp, conf and scripts, zookeeper
>    Affects Versions: 1.1.0
>            Reporter: Jaeho Shin
>             Fix For: 1.1.0
>
>         Attachments: GIRAPH-904.patch
>
>
> We found that Giraph jobs were consistently hanging if uppercase letters were
> included in the DNS (or /etc/hosts) resolved hostnames ({{foo.stanford.edu}}
> vs. {{foo.Stanford.EDU}} from our DNS).  Normalizing the hostnames to lower
> case from {{GiraphConfiguration#getLocalHostname()}} fixed our problem.
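
To illustrate the failure mode described in this thread, here is a minimal standalone sketch. It is not the Giraph source or the attached patch; the class {{HostnameBarrierSketch}} and the methods {{remaining}} and {{remainingNormalized}} are hypothetical names made up for this example. It only shows why a case-sensitive set difference over hostnames never empties, and how lowercasing both sides before comparing would let it empty.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Giraph.
public class HostnameBarrierSketch {

    /** Expected hosts that have not reported in, compared case-sensitively. */
    static Set<String> remaining(List<String> expected, List<String> finished) {
        Set<String> result = new HashSet<>(expected);
        result.removeAll(finished); // "foo.Stanford.EDU" != "foo.stanford.edu"
        return result;
    }

    /** Same difference, but both sides are lowercased before comparing. */
    static Set<String> remainingNormalized(List<String> expected, List<String> finished) {
        Set<String> result = new HashSet<>();
        for (String host : expected) {
            result.add(host.toLowerCase(Locale.ROOT));
        }
        for (String host : finished) {
            result.remove(host.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) {
        // The two spellings reported in the issue description.
        List<String> expected = Arrays.asList("foo.Stanford.EDU");
        List<String> finished = Arrays.asList("foo.stanford.edu");

        // Non-empty: a barrier looping until this is empty would wait forever.
        System.out.println(remaining(expected, finished));           // [foo.Stanford.EDU]
        // Empty: the barrier condition can be satisfied.
        System.out.println(remainingNormalized(expected, finished)); // []
    }
}
```

The sketch uses {{Locale.ROOT}} so that lowercasing does not depend on the JVM's default locale; whether the real fix should do the same is an assumption, not something stated in the issue.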