[
https://issues.apache.org/jira/browse/GIRAPH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617924#comment-13617924
]
Eugene Koontz commented on GIRAPH-601:
--------------------------------------
Thanks for the feedback, Eli! This is where I was waylaid while hoping to take a
look at GIRAPH-13. Once I get this sorted out I hope to continue with the
latter!
Here's some output enhanced with taskId information for each worker and master:
{code}
ekoontz@Eugenes-MacBook-Pro ~$ perl format.pl < postprocessed.txt
application_1364578380737_0027/container_1364578380737_0027_01_000002/syslog
=========
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains
at info
INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is
empty. Assuming fatjar.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @
/tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for
job org.apache.giraph.benchmark.PageRankBenchmark
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 0
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true and zkAlreadyProvided=true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true and taskPartition (0) is less than masterCount (1), so MASTER_ONLY.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up
BspServiceMaster (master thread)...
INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=60000
watcher=org.apache.giraph.master.BspServiceMaster@47875da7
INFO [main] org.apache.giraph.graph.GraphTaskManager: map: No need to do
anything when not a worker
INFO [main] org.apache.giraph.graph.GraphTaskManager: cleanup: Starting for
MASTER_ONLY
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: becomeMaster: First child is
'/_hadoopBsp/job_1364578380737_0027/_masterElectionDir/eugenes-macbook-pro.local_00000000000'
and my bid is
'/_hadoopBsp/job_1364578380737_0027/_masterElectionDir/eugenes-macbook-pro.local_00000000000'
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.comm.netty.NettyServer: NettyServer: Using execution handler
with 8 threads after requestFrameDecoder.
WARN [org.apache.giraph.master.MasterThread]
org.apache.hadoop.conf.Configuration: mapred.map.tasks is deprecated. Instead,
use mapreduce.job.maps
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.comm.netty.NettyServer: start: Started server communication
server: Eugenes-MacBook-Pro.local/172.16.175.1:30000 with up to 16 threads on
bind attempt 0 with sendBufferSize = 32768 receiveBufferSize = 524288 backlog =
6
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.comm.netty.NettyClient: NettyClient: Using execution handler
with 8 threads after requestEncoder.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: becomeMaster: I am now the master!
DEBUG [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Got event that health
registration changed, not using poll attempt
DEBUG [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Got event that health
registration changed, not using poll attempt
DEBUG [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Got event that health
registration changed, not using poll attempt
DEBUG [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Got event that health
registration changed, not using poll attempt
DEBUG [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Got event that health
registration changed, not using poll attempt
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 565085 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 535076 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 505068 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 475059 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 445051 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 415042 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 385033 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 355024 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 325016 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 295008 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 265001 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 234992 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 204984 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 174974 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 144968 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 114959 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 84952 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 54946 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, 24938 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Only found 5 responses
of 6 needed to start superstep -1. Reporting every 30000 msecs, -5072 more
msecs left before giving up.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: logMissingWorkersOnSuperstep: No
response from partition 6 (could be master)
ERROR [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: checkWorkers: Did not receive enough
processes in time (only 5 of 6 required) after waiting 600000msecs). This
occurs if you do not have enough map tasks available simultaneously on your
Hadoop instance to fulfill the number of requested workers.
INFO [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: setJobState:
{"_stateKey":"FAILED","_applicationAttemptKey":-1,"_superstepKey":-1} on
superstep -1
FATAL [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: failJob: Killing job
job_1364578380737_0027
FATAL [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.BspServiceMaster: failJob: exception
java.lang.IllegalStateException: Not enough healthy workers to create input
splits
WARN [org.apache.giraph.master.MasterThread]
org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final
parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
WARN [org.apache.giraph.master.MasterThread]
org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final
parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
INFO [org.apache.giraph.master.MasterThread]
org.apache.hadoop.yarn.service.AbstractService:
Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
INFO [org.apache.giraph.master.MasterThread]
org.apache.hadoop.yarn.service.AbstractService:
Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
ERROR [org.apache.giraph.master.MasterThread]
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:ekoontz (auth:SIMPLE) cause:java.io.IOException
ERROR [org.apache.giraph.master.MasterThread]
org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed
with RuntimeException
FATAL [org.apache.giraph.master.MasterThread]
org.apache.giraph.graph.GraphMapper: uncaughtException:
OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg =
java.lang.RuntimeException: java.io.IOException, exiting...
application_1364578380737_0027/container_1364578380737_0027_01_000003/syslog
=========
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains
at info
INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is
empty. Assuming fatjar.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @
/tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for
job org.apache.giraph.benchmark.PageRankBenchmark
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 1
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true and zkAlreadyProvided=true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true and taskPartition (1) is NOT less than masterCount (1), so WORKER_ONLY.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up
BspServiceWorker...
INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=60000
watcher=org.apache.giraph.worker.BspServiceWorker@5b4bc4e6
INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using
execution handler with 8 threads after requestFrameDecoder.
INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server
communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30001 with up to
16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize =
524288 backlog = 6
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health
of this worker...
INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already
exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)
DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker:
Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=1, port=30001) with
taskId=1 is starting superstep.
INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created
my health node for attempt=0, superstep=-1 with
/_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_1
and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=1,
port=30001)
INFO [main-EventThread] org.apache.giraph.worker.BspServiceWorker:
processEvent: Job state changed, checking to see if it needs to restart
INFO [main-EventThread] org.apache.giraph.bsp.BspService: getJobState: Job
state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)
application_1364578380737_0027/container_1364578380737_0027_01_000004/syslog
=========
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains
at info
INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is
empty. Assuming fatjar.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @
/tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for
job org.apache.giraph.benchmark.PageRankBenchmark
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 2
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true and zkAlreadyProvided=true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true and taskPartition (2) is NOT less than masterCount (1), so WORKER_ONLY.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up
BspServiceWorker...
INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=60000
watcher=org.apache.giraph.worker.BspServiceWorker@7aec8784
INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using
execution handler with 8 threads after requestFrameDecoder.
INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server
communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30002 with up to
16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize =
524288 backlog = 6
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health
of this worker...
INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already
exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)
DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker:
Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=2, port=30002) with
taskId=2 is starting superstep.
INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created
my health node for attempt=0, superstep=-1 with
/_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_2
and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=2,
port=30002)
INFO [main-EventThread] org.apache.giraph.worker.BspServiceWorker:
processEvent: Job state changed, checking to see if it needs to restart
INFO [main-EventThread] org.apache.giraph.bsp.BspService: getJobState: Job
state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)
application_1364578380737_0027/container_1364578380737_0027_01_000005/syslog
=========
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains
at info
INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is
empty. Assuming fatjar.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @
/tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for
job org.apache.giraph.benchmark.PageRankBenchmark
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 3
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true and zkAlreadyProvided=true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true and taskPartition (3) is NOT less than masterCount (1), so WORKER_ONLY.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up
BspServiceWorker...
INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=60000
watcher=org.apache.giraph.worker.BspServiceWorker@c45aa2c
INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using
execution handler with 8 threads after requestFrameDecoder.
INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server
communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30003 with up to
16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize =
524288 backlog = 6
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health
of this worker...
INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already
exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)
DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker:
Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=3, port=30003) with
taskId=3 is starting superstep.
INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created
my health node for attempt=0, superstep=-1 with
/_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_3
and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=3,
port=30003)
INFO [main-EventThread] org.apache.giraph.worker.BspServiceWorker:
processEvent: Job state changed, checking to see if it needs to restart
INFO [main-EventThread] org.apache.giraph.bsp.BspService: getJobState: Job
state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)
application_1364578380737_0027/container_1364578380737_0027_01_000006/syslog
=========
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains
at info
INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is
empty. Assuming fatjar.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @
/tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for
job org.apache.giraph.benchmark.PageRankBenchmark
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 4
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true and zkAlreadyProvided=true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true and taskPartition (4) is NOT less than masterCount (1), so WORKER_ONLY.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up
BspServiceWorker...
INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=60000
watcher=org.apache.giraph.worker.BspServiceWorker@7e413fc6
INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using
execution handler with 8 threads after requestFrameDecoder.
INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server
communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30004 with up to
16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize =
524288 backlog = 6
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health
of this worker...
INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already
exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)
DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker:
Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=4, port=30004) with
taskId=4 is starting superstep.
INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created
my health node for attempt=0, superstep=-1 with
/_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_4
and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=4,
port=30004)
INFO [main-EventThread] org.apache.giraph.worker.BspServiceWorker:
processEvent: Job state changed, checking to see if it needs to restart
INFO [main-EventThread] org.apache.giraph.bsp.BspService: getJobState: Job
state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)
application_1364578380737_0027/container_1364578380737_0027_01_000007/syslog
=========
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains
at info
INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is
empty. Assuming fatjar.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @
/tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for
job org.apache.giraph.benchmark.PageRankBenchmark
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 5
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true and zkAlreadyProvided=true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true and taskPartition (5) is NOT less than masterCount (1), so WORKER_ONLY.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up
BspServiceWorker...
INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=60000
watcher=org.apache.giraph.worker.BspServiceWorker@3b5ad1da
INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using
execution handler with 8 threads after requestFrameDecoder.
INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server
communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30005 with up to
16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize =
524288 backlog = 6
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health
of this worker...
INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already
exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)
DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker:
Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=5, port=30005) with
taskId=5 is starting superstep.
INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created
my health node for attempt=0, superstep=-1 with
/_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_5
and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=5,
port=30005)
INFO [main-EventThread] org.apache.giraph.worker.BspServiceWorker:
processEvent: Job state changed, checking to see if it needs to restart
INFO [main-EventThread] org.apache.giraph.bsp.BspService: getJobState: Job
state already exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)
application_1364578380737_0027/container_1364578380737_0027_01_000008/syslog
=========
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Log level remains
at info
INFO [main] org.apache.giraph.graph.GraphTaskManager: Distributed cache is
empty. Assuming fatjar.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: classpath @
/tmp/hadoop-yarn/staging/ekoontz/.staging/job_1364578380737_0027/job.jar for
job org.apache.giraph.benchmark.PageRankBenchmark
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker: true
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: taskPartition: 6
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true and zkAlreadyProvided=true.
DEBUG [main] org.apache.giraph.graph.GraphTaskManager: splitMasterWorker is
true and taskPartition (6) is NOT less than masterCount (1), so WORKER_ONLY.
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Starting up
BspServiceWorker...
INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection,
connectString=localhost:2181 sessionTimeout=60000
watcher=org.apache.giraph.worker.BspServiceWorker@1b275a34
INFO [main] org.apache.giraph.comm.netty.NettyServer: NettyServer: Using
execution handler with 8 threads after requestFrameDecoder.
INFO [main] org.apache.giraph.comm.netty.NettyServer: start: Started server
communication server: Eugenes-MacBook-Pro.local/172.16.175.1:30006 with up to
16 threads on bind attempt 0 with sendBufferSize = 32768 receiveBufferSize =
524288 backlog = 6
INFO [main] org.apache.giraph.graph.GraphTaskManager: setup: Registering health
of this worker...
INFO [main] org.apache.giraph.bsp.BspService: getJobState: Job state already
exists (/_hadoopBsp/job_1364578380737_0027/_masterJobState)
DEBUG [main] org.apache.giraph.worker.BspServiceWorker: worker:
Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=6, port=30006) with
taskId=6 is starting superstep.
INFO [main] org.apache.giraph.worker.BspServiceWorker: registerHealth: Created
my health node for attempt=0, superstep=-1 with
/_hadoopBsp/job_1364578380737_0027/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/Eugenes-MacBook-Pro.local_6
and workerInfo= Worker(hostname=Eugenes-MacBook-Pro.local, MRtaskID=6,
port=30006)
{code}
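To make the role assignment in the DEBUG lines above easier to follow, here is a minimal sketch of the one branch those lines show (splitMasterWorker is true and zkAlreadyProvided=true). It is reconstructed from the log messages only and is not Giraph's actual GraphTaskManager code; the class name RoleSketch and the method names roleFor and workerCount are made up for illustration.

```java
// Illustrative only: mirrors the comparison reported in the DEBUG lines,
// "taskPartition (N) is [NOT] less than masterCount (1)".
// RoleSketch, roleFor and workerCount are hypothetical names, not Giraph APIs.
class RoleSketch {
    enum Role { MASTER_ONLY, WORKER_ONLY }

    // A task whose partition number is below masterCount acts as master;
    // every other task acts as a worker.
    static Role roleFor(int taskPartition, int masterCount) {
        return taskPartition < masterCount ? Role.MASTER_ONLY : Role.WORKER_ONLY;
    }

    // Counts how many of the partitions 0..maxPartition end up as workers.
    static int workerCount(int maxPartition, int masterCount) {
        int workers = 0;
        for (int p = 0; p <= maxPartition; p++) {
            if (roleFor(p, masterCount) == Role.WORKER_ONLY) {
                workers++;
            }
        }
        return workers;
    }

    public static void main(String[] args) {
        int masterCount = 1; // value printed in the logs above
        for (int p = 0; p <= 6; p++) {
            System.out.println("taskPartition " + p + " -> " + roleFor(p, masterCount));
        }
        System.out.println("workers: " + workerCount(6, masterCount));
    }
}
```

Under this reading, partition 0 is the only master and partitions 1 through 6 are the six workers, which matches the "of 6 needed" figure that the master reports in checkWorkers.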
> Exception when running pagerank benchmark: SendVertexRequest cannot be cast
> to MasterRequest
> --------------------------------------------------------------------------------------------
>
> Key: GIRAPH-601
> URL: https://issues.apache.org/jira/browse/GIRAPH-601
> Project: Giraph
> Issue Type: Bug
> Reporter: Eugene Koontz
> Attachments: instrumentation.patch
>
>
> Building Giraph with:
> {code}
> mvn -DskipTests -Phadoop_2.0.3 clean compile
> {code}
> Running pagerank like this:
> {code}
> $HADOOP_RUNTIME/bin/hadoop jar $JAR \
> org.apache.giraph.benchmark.PageRankBenchmark \
> -e 10 -s 10 -v -V 10 -w 6
> {code}
> I see this in
> /tmp/userlogs/application_1364578380737_0003/container_1364578380737_0003_01_000002/
> :
> {code}
> 2013-03-29 10:58:06,371 DEBUG [org.apache.giraph.master.MasterThread]
> org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Got finished
> worker list = [Eugenes-MacBook-Pro.local_1, Eugenes-MacBook-Pro.local_3],
> size = 2, worker list = [Worker(hostname=Eugenes-MacBook-Pro.local,
> MRtaskID=2, port=30002), Worker(hostname=Eugenes-MacBook-Pro.local,
> MRtaskID=1, port=30001), Worker(hostname=Eugenes-MacBook-Pro.local,
> MRtaskID=4, port=30004), Worker(hostname=Eugenes-MacBook-Pro.local,
> MRtaskID=3, port=30003), Worker(hostname=Eugenes-MacBook-Pro.local,
> MRtaskID=5, port=30005), Worker(hostname=Eugenes-MacBook-Pro.local,
> MRtaskID=0, port=30010)], size = 6 from
> /_hadoopBsp/job_1364578380737_0003/_vertexInputSplitDoneDir
> 2013-03-29 10:58:06,373 WARN [netty-server-exec-3]
> org.apache.giraph.comm.netty.handler.RequestServerHandler: exceptionCaught:
> Channel failed with remote address /172.16.175.1:56236
> java.lang.ClassCastException:
> org.apache.giraph.comm.requests.SendVertexRequest cannot be cast to
> org.apache.giraph.comm.requests.MasterRequest
> at
> org.apache.giraph.comm.netty.handler.MasterRequestServerHandler.processRequest(MasterRequestServerHandler.java:27)
> at
> org.apache.giraph.comm.netty.handler.RequestServerHandler.messageReceived(RequestServerHandler.java:106)
> at
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at
> org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:71)
> at
> org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:45)
> at
> org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:69)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> at java.lang.Thread.run(Thread.java:680)
> {code}
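For readers unfamiliar with this kind of failure, the stack trace in the issue description shows a request of one type reaching a handler that casts it to another type. The sketch below reproduces that general pattern with stand-in types; it is an assumption-laden illustration, not the code in MasterRequestServerHandler. The names CastSketch, Request, MasterOnlyRequest, WorkerOnlyRequest, uncheckedDispatch and checkedDispatch are all hypothetical.

```java
// Illustrative only: stand-in types showing how an unconditional downcast
// fails when a worker-bound request arrives at a master-side handler.
// None of these names are Giraph classes.
class CastSketch {
    interface Request {}

    // Stand-in for a request type that only the master is meant to handle.
    interface MasterOnlyRequest extends Request {
        String handleOnMaster();
    }

    // Stand-in for a worker-bound request such as a vertex transfer.
    static class WorkerOnlyRequest implements Request {}

    static class Ping implements MasterOnlyRequest {
        public String handleOnMaster() {
            return "handled";
        }
    }

    // Casts unconditionally: throws ClassCastException for WorkerOnlyRequest.
    static String uncheckedDispatch(Request request) {
        MasterOnlyRequest masterRequest = (MasterOnlyRequest) request;
        return masterRequest.handleOnMaster();
    }

    // Checks the type first, so a misrouted request is reported by name
    // instead of failing inside the cast.
    static String checkedDispatch(Request request) {
        if (!(request instanceof MasterOnlyRequest)) {
            return "rejected " + request.getClass().getSimpleName();
        }
        return ((MasterOnlyRequest) request).handleOnMaster();
    }

    public static void main(String[] args) {
        System.out.println(checkedDispatch(new Ping()));
        System.out.println(checkedDispatch(new WorkerOnlyRequest()));
        try {
            uncheckedDispatch(new WorkerOnlyRequest());
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed: " + e.getMessage());
        }
    }
}
```

A type check like this would only make the misrouting easier to see in the logs; it would not explain why a worker-bound request reached the master's server in the first place.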