---------- Forwarded message ----------
From: 夏俊鸾 <[email protected]>
Date: Mon, Jul 22, 2013 at 6:48 AM
Subject: Issues when running Hadoop on Mesos
To: [email protected]

Hi Vinod,

Sorry for emailing you directly with Mesos questions; the Mesos mailing list [email protected] does not seem to be available right now.

I downloaded Mesos from the trunk branch (I would like to support hadoop-2.0.0-cdh4.1.2), built it (./configure && make && make install), and ran make hadoop-2.0.0-mr1-cdh4.1.2, which launches a JobTracker and the wordcount test application automatically. Everything seemed OK up to that point.

I then configured core-site.xml/hdfs-site.xml/mapred-site.xml to run Hadoop on a Mesos cluster. The details are below.

========core-site.xml============
<property>
  <name>io.native.lib.available</name>
  <value>true</value>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://10.0.2.19:9000</value>
</property>

==========mapred-site.xml===========
<property>
  <name>mapred.job.tracker</name>
  <value>10.0.2.19:54311</value>
</property>
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.MesosScheduler</value>
</property>
<property>
  <name>mapred.mesos.taskScheduler</name>
  <value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
</property>
<property>
  <name>mapred.mesos.master</name>
  <value>10.0.2.19:5050</value>
</property>
#
# Make sure to uncomment the 'mapred.mesos.executor' property,
# when running the Hadoop JobTracker on a real Mesos cluster.
# NOTE: You need to MANUALLY upload the Mesos executor bundle
# to the location that is set as the value of this property.
<property>
  <name>mapred.mesos.executor</name>
  <value>hdfs://10.0.2.19:9000/hadoop.tar.gz</value>
</property>

# The properties below indicate the amount of resources
# that are allocated to a Hadoop slot (i.e., map/reduce task) by Mesos.
<property>
  <name>mapred.mesos.slot.cpus</name>
  <value>0.2</value>
</property>
<property>
  <name>mapred.mesos.slot.disk</name>
  <!-- The value is in MB. -->
  <value>1024</value>
</property>
<property>
  <name>mapred.mesos.slot.mem</name>
  <!-- Note that this is the total memory required for
       JVM overhead (256 MB) and the heap (-Xmx) of the task.
       The value is in MB. -->
  <value>512</value>
</property>

I then launched the JobTracker (./bin/hadoop jobtracker) and the wordcount application manually, but got the following errors:

============word count ==================
[andrew@sr419 hadoop-2.0.0-mr1-cdh4.1.2]$ ./bin/hadoop jar hadoop-examples-2.0.0-mr1-cdh4.1.2.jar wordcount /user/andrew/tmp out
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/andrew/incubator-mesos/hadoop/hadoop-2.0.0-mr1-cdh4.1.2/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/andrew/incubator-mesos/hadoop/hadoop-2.0.0-mr1-cdh4.1.2/build/ivy/lib/Hadoop/common/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/07/22 20:33:43 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/22 20:33:43 INFO input.FileInputFormat: Total input paths to process : 1
13/07/22 20:33:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/07/22 20:33:43 INFO mapred.JobClient: Running job: job_201307222033_0002
13/07/22 20:33:44 INFO mapred.JobClient: map 0% reduce 0% // word count seems to be pending

============job tracker (TASK_LOST repeats in a loop)===================
13/07/22 20:33:43 INFO mapred.MesosScheduler: Satisfied map and reduce slots needed.
13/07/22 20:33:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/07/22 20:33:43 INFO mapred.MesosScheduler: Added job job_201307222033_0002
13/07/22 20:33:43 INFO mapred.JobTracker: Job job_201307222033_0002 added successfully for user 'andrew' to queue 'default'
13/07/22 20:33:43 INFO mapred.JobTracker: Initializing job_201307222033_0002
13/07/22 20:33:43 INFO mapred.JobInProgress: Initializing job_201307222033_0002
13/07/22 20:33:43 INFO mapred.AuditLogger: USER=andrew IP=10.0.2.19 OPERATION=SUBMIT_JOB TARGET=job_201307222033_0002 RESULT=SUCCESS
13/07/22 20:33:43 INFO mapred.JobInProgress: jobToken generated and stored with users keys in /tmp/hadoop-andrew/mapred/system/job_201307222033_0002/jobToken
13/07/22 20:33:43 INFO mapred.JobInProgress: Input size for job job_201307222033_0002 = 4010. Number of splits = 1
13/07/22 20:33:43 INFO net.NetworkTopology: Adding a new node: /default-rack/sr419
13/07/22 20:33:43 INFO mapred.JobInProgress: tip:task_201307222033_0002_m_000000 has split on node:/default-rack/sr419
13/07/22 20:33:43 INFO mapred.JobInProgress: job_201307222033_0002 LOCALITY_WAIT_FACTOR=1.0
13/07/22 20:33:43 INFO mapred.JobInProgress: Job job_201307222033_0002 initialized successfully with 1 map tasks and 1 reduce tasks.
13/07/22 20:33:48 INFO mapred.MesosScheduler: JobTracker Status
      Pending Map Tasks: 1
   Pending Reduce Tasks: 1
         Idle Map Slots: 0
      Idle Reduce Slots: 0
     Inactive Map Slots: 0 (launched but no hearbeat yet)
  Inactive Reduce Slots: 0 (launched but no hearbeat yet)
       Needed Map Slots: 1
    Needed Reduce Slots: 1
13/07/22 20:33:48 INFO mapred.MesosScheduler: Launching task Task_Tracker_0 on http://sr419:31000
13/07/22 20:33:48 INFO mapred.MesosScheduler: Satisfied map and reduce slots needed.
13/07/22 20:33:48 INFO mapred.MesosScheduler: Status update of Task_Tracker_0 to TASK_LOST with message Executor terminated
13/07/22 20:33:48 INFO mapred.MesosScheduler: Removing terminated TaskTracker: http://sr419:31000
13/07/22 20:33:49 INFO mapred.MesosScheduler: JobTracker Status
      Pending Map Tasks: 1
   Pending Reduce Tasks: 1
         Idle Map Slots: 0
      Idle Reduce Slots: 0
     Inactive Map Slots: 0 (launched but no hearbeat yet)
  Inactive Reduce Slots: 0 (launched but no hearbeat yet)
       Needed Map Slots: 1
    Needed Reduce Slots: 1
13/07/22 20:33:49 INFO mapred.MesosScheduler: Launching task Task_Tracker_1 on http://sr419:31000
13/07/22 20:33:49 INFO mapred.MesosScheduler: Satisfied map and reduce slots needed.
13/07/22 20:33:49 INFO mapred.MesosScheduler: Status update of Task_Tracker_1 to TASK_LOST with message Executor terminated
13/07/22 20:33:49 INFO mapred.MesosScheduler: Removing terminated TaskTracker: http://sr419:31000
13/07/22 20:33:50 INFO mapred.MesosScheduler: JobTracker Status
      Pending Map Tasks: 1
   Pending Reduce Tasks: 1
         Idle Map Slots: 0
      Idle Reduce Slots: 0
     Inactive Map Slots: 0 (launched but no hearbeat yet)
  Inactive Reduce Slots: 0 (launched but no hearbeat yet)
       Needed Map Slots: 1
    Needed Reduce Slots: 1
13/07/22 20:33:50 INFO mapred.MesosScheduler: Launching task Task_Tracker_2 on http://sr419:31000
13/07/22 20:33:50 INFO mapred.MesosScheduler: Satisfied map and reduce slots needed.
13/07/22 20:33:50 INFO mapred.MesosScheduler: Status update of Task_Tracker_2 to TASK_LOST with message Executor terminated
13/07/22 20:33:50 INFO mapred.MesosScheduler: Removing terminated TaskTracker: http://sr419:31000
13/07/22 20:33:51 INFO mapred.MesosScheduler: JobTracker Status

=============mesos-slave.INFO===================
Registered with master [email protected]:5050; given slave ID 201307222033-318898186-5050-19972-0
I0722 20:33:48.378780 20034 slave.cpp:739] Got assigned task Task_Tracker_0 for framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.379360 20034 slave.cpp:837] Launching task Task_Tracker_0 for framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.380995 20034 paths.hpp:303] Created executor directory '/var/run/mesos/slaves/201307222033-318898186-5050-19972-0/frameworks/201307222033-318898186-5050-19972-0000/executors/executor_Task_Tracker_0/runs/114ae051-f03a-4728-af0d-6caeb1d3240a'
I0722 20:33:48.381255 20034 slave.cpp:948] Queuing task 'Task_Tracker_0' for executor executor_Task_Tracker_0 of framework '201307222033-318898186-5050-19972-0000
I0722 20:33:48.381343 20026 process_isolator.cpp:99] Launching executor_Task_Tracker_0 (cd hadoop-* && ./bin/mesos-executor) in /var/run/mesos/slaves/201307222033-318898186-5050-19972-0/frameworks/201307222033-318898186-5050-19972-0000/executors/executor_Task_Tracker_0/runs/114ae051-f03a-4728-af0d-6caeb1d3240a with resources cpus=1; mem=1280' for framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.381484 20015 slave.cpp:511] Successfully attached file '/var/run/mesos/slaves/201307222033-318898186-5050-19972-0/frameworks/201307222033-318898186-5050-19972-0000/executors/executor_Task_Tracker_0/runs/114ae051-f03a-4728-af0d-6caeb1d3240a'
I0722 20:33:48.382462 20026 process_isolator.cpp:161] Forked executor at 20434
I0722 20:33:48.479176 20035 process_isolator.cpp:461] Telling slave of terminated executor 'executor_Task_Tracker_0' of framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.479310 20015 slave.cpp:2060] Executor 'executor_Task_Tracker_0' of framework 201307222033-318898186-5050-19972-0000 has exited with status 255
I0722 20:33:48.480988 20015 slave.cpp:1692] Handling status update TASK_LOST (UUID: 61050093-911f-47ad-a7df-bebffd2a753a) for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000 from @ 0.0.0.0:0
I0722 20:33:48.481205 20025 status_update_manager.cpp:290] Received status update TASK_LOST (UUID: 61050093-911f-47ad-a7df-bebffd2a753a) for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000 with checkpoint=false
I0722 20:33:48.481266 20025 status_update_manager.cpp:450] Creating StatusUpdate stream for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.481461 20025 status_update_manager.cpp:336] Forwarding status update TASK_LOST (UUID: 61050093-911f-47ad-a7df-bebffd2a753a) for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000 to [email protected]:5050
I0722 20:33:48.481613 20025 slave.cpp:1809] Sending acknowledgement for status update TASK_LOST (UUID: 61050093-911f-47ad-a7df-bebffd2a753a) for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000 to @ 0.0.0.0:0
I0722 20:33:48.485322 20030 status_update_manager.cpp:360] Received status update acknowledgement 61050093-911f-47ad-a7df-bebffd2a753a for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.485424 20030 status_update_manager.cpp:481] Cleaning up status update stream for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.479262 20035 process_isolator.cpp:259] Performing killtree operation on 20434
Failed to stop 20434: No such process
 Children of 20434: { }
Signaled 20434
I0722 20:33:48.505930 20035 process_isolator.cpp:287] Asked to update resources for an unknown/killed executor 'executor_Task_Tracker_0' of framework 201307222033-318898186-5050-19972-0000

===========log in /tmp for 'executor_Task_Tracker_0' is empty==========

I have been stuck on the issues above for several days and have not been able to resolve them. One point I would like to highlight: I am not sure how to set the property "mapred.mesos.executor" (must the file be named hadoop.tar.gz? The template puzzled me). Could you help me analyze these issues?

Thank you in advance.

Regards,
Andrew
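
One thing that can be checked from the slave log above: the executor is launched with the command "cd hadoop-* && ./bin/mesos-executor", so whatever archive mapred.mesos.executor points at presumably has to unpack into a top-level directory matching hadoop-* that contains bin/mesos-executor. Below is a minimal Python sketch of that check. It assumes the archive is a gzip-compressed tarball that has been copied to local disk; the helper names executor_launchers and make_demo_tarball are hypothetical and not part of Mesos or Hadoop. A missing launcher is only one possible reason for exit status 255, so an empty result is a hint, not a diagnosis.

```python
import fnmatch
import io
import posixpath
import tarfile


def executor_launchers(tar):
    """List regular-file members reachable by `cd hadoop-* && ./bin/mesos-executor`."""
    hits = []
    for member in tar.getmembers():
        # Normalise names such as "./hadoop-x/bin/mesos-executor".
        parts = posixpath.normpath(member.name).split("/")
        if (
            member.isfile()
            and len(parts) == 3
            and fnmatch.fnmatchcase(parts[0], "hadoop-*")
            and parts[1:] == ["bin", "mesos-executor"]
        ):
            hits.append(member.name)
    return hits


def make_demo_tarball(paths):
    """Build an in-memory .tar.gz with empty files at the given paths (demo only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path in paths:
            info = tarfile.TarInfo(name=path)  # regular file by default
            info.size = 0
            tar.addfile(info, io.BytesIO(b""))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Demo layout only; a real check would use tarfile.open("<local copy>", "r:gz").
    demo = make_demo_tarball(["hadoop-2.0.0-mr1-cdh4.1.2/bin/mesos-executor"])
    with tarfile.open(fileobj=demo, mode="r:gz") as tar:
        print(executor_launchers(tar))
```

An empty list for the real archive would suggest that it was packed without the enclosing hadoop-* directory, or without bin/mesos-executor inside it. The check looks only at the layout inside the archive; it says nothing about whether the archive itself has to be named hadoop.tar.gz.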
