Re: DataNode Drive failure. (Tales from the front lines)
You probably want this. https://issues.apache.org/jira/browse/HDFS-457

Koji

As we can see here, the datanode shut down. Should one disk entering a read-only state really shut down the entire datanode? The datanode happily restarted once the disk was unmounted. Just wondering.

On 8/7/09 12:11 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

I have a Hadoop 0.18.3 cluster. Today I got two Nagios alerts. Actually, I was excited by one of them because I never had a way to test check_hpacucli. Worked first time. NIIICE. (Borat voice.)

---
Notification Type: PROBLEM
Service: check_remote_datanode
Host: nyhadoopdata10.ops.jointhegrid.com
Address: 10.12.9.20
State: CRITICAL
Date/Time: Fri Aug 7 18:24:58 GMT 2009
Additional Info: Connection refused
---
Notification Type: PROBLEM
Service: check_hpacucli
Host: nyhadoopdata10.ops.jointhegrid.com
Address: 10.12.9.20
State: CRITICAL
Date/Time: Fri Aug 7 18:23:18 GMT 2009
Additional Info: CRITICAL Smart Array P400 in Unknown Slot OK/OK/- (LD 1: OK [(1I:1:1 OK)] LD 2: OK [(1I:1:2 OK)] LD 3: OK [(1I:1:3 OK)] LD 4: OK [(1I:1:4 OK)] LD 5: OK [(2I:1:5 OK)] LD 6: OK [(2I:1:6 OK)] LD 7: Failed [(2I:1:7 Failed)] LD 8: OK [(2I:1:8 OK)])

/usr/sbin/hpacucli ctrl all show config detail

physicaldrive 2I:1:7
   Port: 2I
   Box: 1
   Bay: 7
   Status: Failed
   Drive Type: Data Drive
   Interface Type: SAS
   Size: 1TB
   Rotational Speed: 7200
   Firmware Revision: HPD1
   Serial Number:
   Model: HP DB1000BABFF
   PHY Count: 2
   PHY Transfer Rate: 3.0GBPS, Unknown

The drive was labeled as failed. It was in a read-only state, and sections of the drive would produce an I/O error while attempting to read. The datanode logged:

2009-08-07 14:20:03,153 WARN org.apache.hadoop.dfs.DataNode: DataNode is shutting down. directory is not writable: /mnt/disk6/dfs/data/current
2009-08-07 14:20:03,244 INFO org.apache.hadoop.dfs.DataBlockScanner: Exiting DataBlockScanner thread.
2009-08-07 14:20:03,430 INFO org.apache.hadoop.dfs.DataNode: writeBlock blk_-2520177395705282298_448274 received exception java.io.IOException: Read-only file system
2009-08-07 14:20:03,430 ERROR org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.12.9.20:50010, storageID=DS-520493036-10.12.9.20-50010-1239749144772, infoPort=50075, ipcPort=50020):DataXceiver: java.io.IOException: Read-only file system
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.createNewFile(File.java:883)
    at org.apache.hadoop.dfs.FSDataset$FSVolume.createTmpFile(FSDataset.java:388)
    at org.apache.hadoop.dfs.FSDataset$FSVolume.createTmpFile(FSDataset.java:359)
    at org.apache.hadoop.dfs.FSDataset.createTmpFile(FSDataset.java:1050)
    at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:983)
    at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:2382)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:1234)
    at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:1092)
    at java.lang.Thread.run(Thread.java:619)
2009-08-07 14:20:32,974 INFO org.apache.hadoop.dfs.DataNode: DatanodeRegistration(10.12.9.20:50010, storageID=DS-520493036-10.12.9.20-50010-1239749144772, infoPort=50075, ipcPort=50020):Finishing DataNode in: FSDataset{dirpath='/mnt/disk0/dfs/data/current,/mnt/disk1/dfs/data/current,/mnt/disk2/dfs/data/current,/mnt/disk3/dfs/data/current,/mnt/disk4/dfs/data/current,/mnt/disk5/dfs/data/current,/mnt/disk6/dfs/data/current,/mnt/disk7/dfs/data/current'}
2009-08-07 14:20:32,974 INFO org.mortbay.util.ThreadedServer: Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=50075]
2009-08-07 14:20:32,977 INFO org.mortbay.http.SocketListener: Stopped SocketListener on 0.0.0.0:50075
2009-08-07 14:20:33,155 INFO org.mortbay.util.Container: Stopped HttpContext[/static,/static]
2009-08-07 14:20:33,293 INFO org.mortbay.util.Container: Stopped HttpContext[/logs,/logs]
2009-08-07 14:20:33,293 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.servlet.webapplicationhand...@7444f787
2009-08-07 14:20:33,432 INFO org.mortbay.util.Container: Stopped WebApplicationContext[/,/]
2009-08-07 14:20:33,432 INFO org.mortbay.util.Container: Stopped org.mortbay.jetty.ser...@b035079
2009-08-07 14:20:33,432 INFO org.apache.hadoop.ipc.Server: Stopping server on 50020
2009-08-07 14:20:33,432 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2009-08-07 14:20:33,432 INFO org.apache.hadoop.dfs.DataNode: Waiting for threadgroup to exit, active threads is 0
2009-08-07 14:20:33,432 INFO org.apache.hadoop.ipc.Server: IPC Server
Re: Some tasks fail to report status between the end of the map and the beginning of the merge
> but I didn't find a config option that allows ignoring tasks that fail.

If 0.18:
http://hadoop.apache.org/common/docs/r0.18.3/api/org/apache/hadoop/mapred/JobConf.html#setMaxMapTaskFailuresPercent(int) (mapred.max.map.failures.percent)
http://hadoop.apache.org/common/docs/r0.18.3/api/org/apache/hadoop/mapred/JobConf.html#setMaxReduceTaskFailuresPercent(int) (mapred.max.reduce.failures.percent)

If 0.19 or later, you can also try skipping records.

Koji

On 8/9/09 2:18 AM, Mathias De Maré mathias.dem...@gmail.com wrote:

I changed the maximum split size to 3, and now most tasks actually succeed. However, I still have the failure problem with some tasks (with a job I was running yesterday, I got a failure after 1900 tasks). The problem is that these very few failures can bring down the entire job, as they sometimes seem to just keep failing. I looked through mapred-default.xml, but I didn't find a config option that allows ignoring tasks that fail. Is there a way to do this? It seems like the only alternative I have, since I can't make the failures stop.

Mathias

2009/8/5 Mathias De Maré mathias.dem...@gmail.com

On Wed, Aug 5, 2009 at 9:38 AM, Jothi Padmanabhan joth...@yahoo-inc.com wrote:

Hi,

Could you please try setting the parameter mapred.merge.recordsBeforeProgress to a lower number? See https://issues.apache.org/jira/browse/HADOOP-4714

Cheers,
Jothi

Hm, that bug looks like it's applicable during the merge, but my case is a block right before the merge (though seemingly right after all of the map tasks finish). I tried setting mapred.merge.recordsBeforeProgress to 100, and it didn't make a difference.

On Wed, Aug 5, 2009 at 10:32 AM, Amogh Vasekar am...@yahoo-inc.com wrote:

10 mins reminds me of the parameter mapred.task.timeout. This is configurable. Or alternatively you might just do a sysout to let the tracker know of the task's existence (not an ideal solution though).

Thanks,
Amogh

Well, the map tasks take around 30 minutes to run. Letting the task idle for a large number of minutes after that is a lot of useless time, imho. I tried with 20 minutes now, but I still get timeouts. I don't know if it's useful, but here are the settings of the map tasks at the moment:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>3</value>
    <description>The total amount of buffer memory to use while sorting files, in megabytes. By default, gives each merge stream 1MB, which should minimize seeks.</description>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
    <description>The maximum number of map tasks that will be run simultaneously by a task tracker.</description>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
    <description>The maximum number of reduce tasks that will be run simultaneously by a task tracker.</description>
  </property>
  <property>
    <name>mapred.max.split.size</name>
    <value>100</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx400m</value>
  </property>
  <property>
    <name>mapred.merge.recordsBeforeProgress</name>
    <value>100</value>
  </property>
  <property>
    <name>mapred.task.timeout</name>
    <value>120</value>
  </property>
</configuration>

Ideally, I would want to get rid of the delay that causes the timeouts, yet also increase the split size somewhat (though I think a larger split size would increase the delay even more?). The map tasks take around 8000-11000 records as input and can produce up to 1,000,000 records as output (in case this is relevant).

Mathias
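A minimal driver sketch of the 0.18-style knobs Koji points to at the top of this thread (the class name and the percentage values are placeholders, not from the thread):

  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class LossyJob {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(LossyJob.class);
      // Let up to 5% of map (and reduce) tasks fail without failing the job.
      conf.setMaxMapTaskFailuresPercent(5);     // mapred.max.map.failures.percent
      conf.setMaxReduceTaskFailuresPercent(5);  // mapred.max.reduce.failures.percent
      // ... input/output formats, mapper, reducer, etc. elided ...
      JobClient.runJob(conf);
    }
  }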
Re: How to deal with too many fetch failures?
Probably unrelated to your problem, but one extreme case I've seen: a user's job had large gzip inputs (non-splittable), 20 mappers, and 800 reducers. Each map output around 20G. Too many reducers were hitting a single node as soon as a mapper finished. I think we tried something like:

mapred.reduce.parallel.copies=1 (to reduce the number of reducer copier threads)
mapred.reduce.slowstart.completed.maps=1.0 (so that reducers would have 20 mappers to pull from, instead of 800 reducers hitting 1 mapper node as soon as it finishes)

Koji

On 8/19/09 11:59 PM, Jason Venner jason.had...@gmail.com wrote:

The number 1 cause of this is something that causes a connection to get a map output to fail. I have seen:

1) firewall
2) misconfigured ip addresses (ie: the tasktracker attempting the fetch received an incorrect ip address when it looked up the name of the tasktracker with the map segment)
3) rare: the http server on the serving tasktracker is overloaded due to insufficient threads or listen backlog; this can happen if the number of fetches per reduce is large and the number of reduces or the number of maps is very large

There are probably other cases. This recently happened to me when I had 6000 maps and 20 reducers on a 10-node cluster, which I believe was case 3 above. Since I didn't actually need the reduce (I got my summary data via counters in the map phase), I never re-tuned the cluster.

On Wed, Aug 19, 2009 at 11:25 PM, Ted Dunning ted.dunn...@gmail.com wrote:

I think that the problem that I am remembering was due to poor recovery from this problem. The underlying fault is likely due to poor connectivity between your machines. Test that all members of your cluster can access all others on all ports used by hadoop. See here for hints: http://markmail.org/message/lgafou6d434n2dvx

On Wed, Aug 19, 2009 at 10:39 PM, yang song hadoop.ini...@gmail.com wrote:

Thank you, Ted. Upgrading the current cluster is a huge amount of work; we don't want to do that. Could you tell me in detail how 0.19.1 causes certain failures? Thanks again.

2009/8/20 Ted Dunning ted.dunn...@gmail.com

I think I remember something about 19.1 in which certain failures would cause this. Consider using an updated 19 or moving to 20 as well.

On Wed, Aug 19, 2009 at 5:19 AM, yang song hadoop.ini...@gmail.com wrote:

I'm sorry, the version is 0.19.1

--
Ted Dunning, CTO
DeepDyve
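If you want to try Koji's two knobs on a per-job basis rather than cluster-wide, a sketch (the class name is hypothetical; the values are straight from his note):

  import org.apache.hadoop.mapred.JobConf;

  public class ShuffleTuning {
    public static void main(String[] args) {
      JobConf conf = new JobConf(ShuffleTuning.class);
      // One fetch thread per reducer instead of the default.
      conf.setInt("mapred.reduce.parallel.copies", 1);
      // Don't launch reducers until 100% of the maps have completed.
      conf.set("mapred.reduce.slowstart.completed.maps", "1.0");
      // ... rest of the job setup and submission elided ...
    }
  }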
Re: Streaming ignoring stderr output
This doesn't solve your stderr/stdout problem, but you can always set the timeout to a bigger value if necessary.

-Dmapred.task.timeout=__ (in milliseconds)

Koji

On 10/25/09 12:00 PM, Ryan Rosario uclamath...@gmail.com wrote:

I am using a Python script as a mapper for a Hadoop Streaming (hadoop 0.20.0) job, with reducer NONE. My jobs keep getting killed with "task failed to respond after 600 seconds". I tried sending a heartbeat every minute to stderr using sys.stderr.write in my mapper, but nothing is being output to stderr, either on disk (in logs/userlogs/...) or in the web UI. stdout is not recorded either. This also means I have no way of knowing what my tasks are doing at any given moment, except to look at the counts produced in syslog.

I got it to work once, but have not had any luck since. Any suggestions of things to look at as to why I am not able to get any output? Help is greatly appreciated.

- Ryan
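Two hedged notes, neither from this thread: as I recall from the Streaming docs, Streaming treats stderr lines of the exact form reporter:status:<message> as task status updates, so a heartbeat should use that form rather than free-text writes. And if you would rather set Koji's timeout per job from a Java driver instead of on the command line, a minimal sketch (the class name is a placeholder):

  import org.apache.hadoop.mapred.JobConf;

  public class TimeoutSetup {
    public static void main(String[] args) {
      JobConf conf = new JobConf(TimeoutSetup.class);
      // 20 minutes, in milliseconds; same effect as -Dmapred.task.timeout=1200000.
      conf.setLong("mapred.task.timeout", 20L * 60 * 1000);
      // ... rest of the job setup and submission elided ...
    }
  }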
Re: Data node decommission doesn't seem to be working correctly
Hi Scott,

You might be hitting two different issues.

1) Decommission not finishing. https://issues.apache.org/jira/browse/HDFS-694 explains decommission never finishing due to open files in 0.20.

2) Nodes showing up both in live and dead nodes. I remember Suresh taking a look at this. It was something about the same node being registered with hostname and IP separately (when a datanode is rejumped and started fresh(?)). Cc-ing Suresh.

Koji

On 5/17/10 9:32 PM, Scott White scottbl...@gmail.com wrote:

I followed the steps mentioned here: http://developer.yahoo.com/hadoop/tutorial/module2.html#decommission to decommission a data node. What I see from the namenode is that the hostname of the machine I decommissioned shows up in both the list of dead nodes and the list of live nodes, where its admin status is marked as 'In Service'. It's been twelve hours, and there is no sign in the namenode logs that the node has been decommissioned. Any suggestions of what might be the problem, and what to try to ensure that this node gets safely taken down?

thanks in advance,
Scott
Re: How to replace Jetty-6.1.14 with Jetty 7 in Hadoop?
> Try moving up to v 6.1.25, which should be more straightforward.

FYI, when we tried 6.1.25, we got hit by a deadlock. http://jira.codehaus.org/browse/JETTY-1264

Koji

On 1/17/11 3:10 AM, Steve Loughran ste...@apache.org wrote:

On 16/01/11 09:41, xiufeng liu wrote:

Hi,

In my cluster, Hadoop somehow cannot work, and I found that it was due to Jetty 6.1.14, which is not able to start up. However, Jetty 7 can work in my cluster. Does anybody know how to replace Jetty 6.1.14 with Jetty 7? Thanks

afancy

The switch to Jetty 7 will not be easy, and I wouldn't encourage you to do it unless you want to get into editing the Hadoop source and retesting everything. Try moving up to v 6.1.25, which should be more straightforward. Replace the JAR, QA the cluster with some terasorting.
Re: Too small initial heap problem.
> -Xmx1024

This would be a 1024-byte heap. Maybe you want -Xmx1024m?

Koji

On 1/27/11 6:04 PM, Jun Young Kim juneng...@gmail.com wrote:

Hi,

I have a 9-node cluster (1 master, 8 slaves) running Hadoop. When I executed my job on the master, I got the following errors:

11/01/28 10:58:01 INFO mapred.JobClient: Running job: job_201101271451_0011
11/01/28 10:58:02 INFO mapred.JobClient: map 0% reduce 0%
11/01/28 10:58:08 INFO mapred.JobClient: Task Id : attempt_201101271451_0011_m_41_0, Status : FAILED
java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
11/01/28 10:58:08 WARN mapred.JobClient: Error reading task output http://hatest03.server:50060/tasklog?plaintext=true&taskid=attempt_201101271451_0011_m_41_0&filter=stdout
11/01/28 10:58:08 WARN mapred.JobClient: Error reading task output http://hatest03.server:50060/tasklog?plaintext=true&taskid=attempt_201101271451_0011_m_41_0&filter=stderr

After going to hatest03.server, I checked the directory named attempt_201101271451_0011_m_41_0. There is an error msg in the stdout file:

Error occurred during initialization of VM
Too small initial heap

My heap size configuration is:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024</value>
</property>

and the physical memory size is:

$ free -m
             total       used       free     shared    buffers     cached
Mem:         12001       4711       7290          0        197       4056
-/+ buffers/cache:        457      11544
Swap:        20470       2047

How can I fix this problem?

--
Junyoung Kim (juneng...@gmail.com)
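A sketch of the fix Koji is pointing at, if set programmatically (the class name is a placeholder): -Xmx reads a bare number as bytes, so -Xmx1024 asks for a 1024-byte heap, which the JVM refuses to start with.

  import org.apache.hadoop.mapred.JobConf;

  public class ChildHeap {
    public static void main(String[] args) {
      JobConf conf = new JobConf(ChildHeap.class);
      // "m" = megabytes; without the suffix, 1024 would mean 1024 bytes.
      conf.set("mapred.child.java.opts", "-Xmx1024m");
      // ... rest of the job setup and submission elided ...
    }
  }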
Re: Draining/Decommisioning a tasktracker
Rishi,

> Using exclude list for TT will not help as Koji has already mentioned

It'll help a bit, in the sense that no more tasks are assigned to that TaskTracker once it is excluded. As for TT decommissioning and map output handling, I opened a Jira for further discussion: https://issues.apache.org/jira/browse/MAPREDUCE-2291

Koji

On 1/29/11 5:37 AM, rishi pathak mailmaverick...@gmail.com wrote:

Hi,

Here is a description of what we are trying to achieve (whether it is possible or not is still not clear). We have large computing clusters used mainly for MPI jobs. We use PBS/Torque and Maui for resource allocation and scheduling. At most times utilization is very high, except for very small resource pockets of, say, 16 cores for 2-5 hrs. We are trying to establish the feasibility of using these small (but fixed-size) resource pockets for Nutch crawls. Our configuration is:

# Hadoop 0.20.2 (packaged with Nutch)
# Lustre parallel filesystem for data storage
# No HDFS

We have the JT running on one of the login nodes at all times. A request for resources (nodes=16, walltime=05 hrs.) is made using the batch system, and TTs are provisioned as part of the job. The problem is that when a job expires, user processes are cleaned up and thus the TT gets killed. With that, completed and running map/reduce tasks for the Nutch job are killed and rescheduled. Solutions as we see them:

1. As the filesystem is shared (persistent), restart tasks on another TT and make intermediate task data available, i.e. a sort of checkpointing.
2. TT draining - based on a speculative time for task completion, a TT whose walltime is nearing expiry goes into draining mode, i.e. no new tasks are scheduled on that TT.

For '1', it is very far-fetched (we are no Hadoop experts). '2' seems to be a more sensible approach. Using the exclude list for TTs will not help, as Koji has already mentioned. We looked into the capacity scheduler but didn't find any pointers. Phil, what version of hadoop has these hooks in the scheduler?

On Sat, Jan 29, 2011 at 3:34 AM, phil young phil.wills.yo...@gmail.com wrote:

There are some hooks available in the schedulers that could be useful also. I think they were expected to be used to allow you to schedule tasks based on load average on the host, but I'd expect you can customize them for your purpose.

On Fri, Jan 28, 2011 at 6:46 AM, Harsh J qwertyman...@gmail.com wrote:

Moving discussion to the MapReduce-User list: mapreduce-u...@hadoop.apache.org

Reply inline:

On Fri, Jan 28, 2011 at 2:39 PM, rishi pathak mailmaverick...@gmail.com wrote:

Hi,

Is there a way to drain a tasktracker? What we require is not to schedule any more map/red tasks onto a tasktracker (mark it offline), but still the running tasks should not be affected.

You could simply shut the TT down. MapReduce was designed with faults in mind, and thus tasks that are running on a particular TaskTracker can be re-run elsewhere if they fail. Is this not usable in your case?

--
Harsh J
www.harshj.com
Re: Draining/Decommisioning a tasktracker
Hi Rishi,

> P.S. - What credentials are required for commenting on an issue in Jira?

It's open source. I'd say none :)

My feature request is for regular hadoop clusters, whereas yours is pretty unique. Not sure if that Jira applies to your need or not.

Koji

On 1/31/11 9:21 AM, rishi pathak mailmaverick...@gmail.com wrote:

Hi Koji,

Thanks for opening the feature request. Right now, for the purpose stated earlier, I have upgraded hadoop to 0.21 and am trying to see if creating individual leaf-level queues for every tasktracker, and changing a queue's state to 'stopped' before the expiry of the walltime, will do. Seems like it will work for now.

P.S. - What credentials are required for commenting on an issue in Jira?
Re: Problem with running the job, no default queue
Hi Shivani,

You probably don't want to ask M45-specific questions on the hadoop.apache mailing list.

Try
% hadoop queue -showacls

It should show which queues you're allowed to submit to. If it doesn't give you any queues, you need to request one.

Koji

On 2/9/11 9:10 PM, Shivani Rao sg...@purdue.edu wrote:

I tried a simple example job with Yahoo M45. The job fails for non-existence of a default queue. The output is attached below. On the Apache hadoop mailing list, I found this post (specific to M45) that attacked this problem by setting the property -Dmapred.job.queue.name=myqueue (http://web.archiveorange.com/archive/v/3inw3ySGHmNRR9Bm14Uv). There is also documentation for capacity schedulers, but I do not have write access to the files in the conf directory, so I do not know how I can set the capacity schedulers there. I am also posting this question on the general list, just in case.

$ hadoop jar /grid/0/gs/hadoop/current/hadoop-examples.jar pi 10 1
Number of Maps = 10
Samples per Map = 1
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
11/02/10 04:19:22 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 83705 for sgrao
11/02/10 04:19:22 INFO security.TokenCache: Got dt for hdfs://grit-nn1.yahooresearchcluster.com/user/sgrao/.staging/job_201101150035_26053;uri=68.180.138.10:8020;t.service=68.180.138.10:8020
11/02/10 04:19:22 INFO mapred.FileInputFormat: Total input paths to process : 10
11/02/10 04:19:23 INFO mapred.JobClient: Cleaning up the staging area hdfs://grit-nn1.yahooresearchcluster.com/user/sgrao/.staging/job_201101150035_26053
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Queue default does not exist
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3680)
    at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1301)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1297)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1295)
    at org.apache.hadoop.ipc.Client.call(Client.java:951)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:223)
    at org.apache.hadoop.mapred.$Proxy6.submitJob(Unknown Source)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:818)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:752)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1062)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:752)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:726)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1156)
    at org.apache.hadoop.examples.PiEstimator.estimate(PiEstimator.java:297)
    at org.apache.hadoop.examples.PiEstimator.run(PiEstimator.java:342)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.PiEstimator.main(PiEstimator.java:351)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
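A sketch of pinning the queue from a job driver, equivalent to the -Dmapred.job.queue.name flag mentioned above ("myqueue" is a placeholder; use one of the queues that hadoop queue -showacls lists for you):

  import org.apache.hadoop.mapred.JobConf;

  public class QueuedJob {
    public static void main(String[] args) {
      JobConf conf = new JobConf(QueuedJob.class);
      conf.setQueueName("myqueue");  // sets mapred.job.queue.name
      // ... rest of the job setup and submission elided ...
    }
  }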
Re: HDFS block size v.s. mapred.min.split.size
> (mapred.min.split.size can only be set larger than the HDFS block size)

I haven't tried this with the new mapreduce API, but:

-Dmapred.min.split.size=split_size_you_want -Dmapred.map.tasks=1

I think this would let you set a split size smaller than the hdfs block size :)

Koji

On 2/17/11 2:32 PM, Jim Falgout jim.falg...@pervasive.com wrote:

Generally, if you have large files, setting the block size to 128M or larger is helpful. You can do that on a per-file basis or set the block size for the whole filesystem. The larger block size cuts down on the number of map tasks required to handle the overall data size.

I've experimented with mapred.min.split.size also, and have usually found that the larger the split size, the better the overall run time. Of course there is a cut-off point, especially with a very large cluster, where larger split sizes will hurt overall scalability. On tests I've run on 10- and 20-node clusters though, setting the split size as high as 1GB has allowed the overall Hadoop jobs to run faster, sometimes quite a bit faster. You will lose some locality, but it seems a trade-off with the number of files that have to be shuffled for the reduction step.

-----Original Message-----
From: Boduo Li [mailto:birdeey...@gmail.com]
Sent: Thursday, February 17, 2011 12:01 PM
To: common-user@hadoop.apache.org
Subject: HDFS block size v.s. mapred.min.split.size

Hi,

I'm recently benchmarking Hadoop. I know two ways to control the input data size for each map task: by changing the HDFS block size (I have to reload data into HDFS in this case), or by setting mapred.min.split.size. For my benchmarking task, I need to change the input size for a map task frequently. Changing the HDFS block size and reloading data is really painful. But using mapred.min.split.size seems to be problematic. I did a simple test to verify whether Hadoop has similar performance in the following cases:

(1) HDFS block size = 32MB, mapred.min.split.size=64MB (mapred.min.split.size can only be set larger than the HDFS block size)
(2) HDFS block size = 64MB, mapred.min.split.size not set

I ran the same job under these settings. Setting (1) takes 1374s to finish. Setting (2) takes 1412s to finish. I do understand that, given a smaller HDFS block size, the I/O is more random. But the 50-second difference seems too much for random I/O of the input data. Does anyone have any insight into it? Or does anyone know a better way to control the input size of each map task?

Thanks.
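For context on why these knobs interact: a sketch, paraphrased from memory of the old org.apache.hadoop.mapred.FileInputFormat (the numbers are illustrative, not from this thread). The goal size comes from dividing the total input by the mapred.map.tasks hint, and the minimum from mapred.min.split.size:

  public class SplitSizing {
    // Approximation of FileInputFormat.computeSplitSize in the old API.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
      return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
      long totalSize = 10L << 30;  // 10 GB of input (illustrative)
      long blockSize = 64L << 20;  // 64 MB HDFS block size
      long minSize   = 32L << 20;  // mapred.min.split.size = 32 MB
      int  numSplits = 200;        // mapred.map.tasks hint
      long goalSize  = totalSize / Math.max(1, numSplits);
      // ~51 MB here: below the block size, because goalSize (not minSize) wins.
      System.out.println(computeSplitSize(goalSize, minSize, blockSize));
    }
  }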
Tasks failing with OutOfMemory after jvm upgrade to 1.6.0_18 or later
This is technically a Java issue, but I thought other hadoop users would find it interesting.

When we upgraded from an old JVM to a newer (32-bit) JVM, 1.6.0_21, we started seeing users' tasks having issues at random places:

1. Tasks running 2-3 times slower
2. Tasks failing with OutOfMemory
3. Reducers failing with OutOfMemory while pulling map outputs

Our first response was to increase the heap size, but what actually happened was that NewRatio changed from 8 to 2 after we upgraded the JVM to 1.6.0_21.

http://forums.oracle.com/forums/thread.jspa?threadID=2167755
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6887571

So with -Xmx640m, an older JVM (1.6.0_17 or before) let the old generation go up to

PS Old Generation
   capacity = 596574208 (568.9375MB)
   used     = 596574200 (568.9374923706055MB)
   free     = 8 (7.62939453125E-6MB)
   99.9865901007% used

but the newer JVM (1.6.0_21) only went up to

PS Old Generation
   capacity = 447414272 (426.6875MB)
   used     = 447414264 (426.68749237060547MB)
   free     = 8 (7.62939453125E-6MB)
   99.9821194797% used

causing all sorts of issues.

To bring back the old behavior, we needed to add -XX:NewRatio=8 to the JVM params. If you have MAPREDUCE-1207 in your version, you can set:

<property>
  <name>mapreduce.admin.map.child.java.opts</name>
  <value>-XX:NewRatio=8</value>
</property>
<property>
  <name>mapreduce.admin.reduce.child.java.opts</name>
  <value>-XX:NewRatio=8</value>
</property>

Otherwise, you can set it in 'mapred.child.java.opts' and ask users to also set it whenever they override that conf.

Maybe this is a known issue to everyone, but I thought I would share it since we spent so much time on this simple(?) issue :)

You can check the NewRatio value by running jmap -heap on the JVM.

Koji
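Not from the original post, but if you want to see the generation sizes from inside a task JVM (rather than attaching jmap from outside), a small sketch using the standard java.lang.management API prints the same pools quoted above:

  import java.lang.management.ManagementFactory;
  import java.lang.management.MemoryPoolMXBean;
  import java.lang.management.MemoryUsage;

  public class HeapGenerations {
    public static void main(String[] args) {
      // Prints each memory pool, e.g. "PS Old Generation", with its
      // current numbers, comparable to the jmap output quoted above.
      for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
        MemoryUsage u = pool.getUsage();
        System.out.printf("%-25s committed=%d used=%d max=%d%n",
            pool.getName(), u.getCommitted(), u.getUsed(), u.getMax());
      }
    }
  }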
Re: setting mapred.map.child.java.opts not working
Hi Harsh,

Wasn't MAPREDUCE-478 in 1.0? Maybe the Jira is not up to date.

Koji

On 1/11/12 8:44 PM, Harsh J ha...@cloudera.com wrote:

These properties are not available in Apache Hadoop 1.0 (formerly known as 0.20.x). This was a feature introduced in 0.21 (https://issues.apache.org/jira/browse/MAPREDUCE-478) and is available today in the 0.22 and 0.23 release lines. For 1.0/0.20, use mapred.child.java.opts, which applies to both map and reduce commonly.

It would also be helpful if you could tell us what doc guided you to use these property names instead of the proper one, so we can fix it.

On Thu, Jan 12, 2012 at 8:44 AM, T Vinod Gupta tvi...@readypulse.com wrote:

Hi,

Can someone help me asap? When I run my mapred job, it fails with this error:

12/01/12 02:58:36 INFO mapred.JobClient: Task Id : attempt_201112151554_0050_m_41_0, Status : FAILED
Error: Java heap space
attempt_201112151554_0050_m_71_0: log4j:ERROR Failed to flush writer,
attempt_201112151554_0050_m_71_0: java.io.IOException: Stream closed
attempt_201112151554_0050_m_71_0: at sun.nio.cs.StreamEncoder.ensureOpen(StreamEncoder.java:44)
attempt_201112151554_0050_m_71_0: at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:139)
attempt_201112151554_0050_m_71_0: at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
attempt_201112151554_0050_m_71_0: at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:58)
attempt_201112151554_0050_m_71_0: at org.apache.hadoop.mapred.TaskLogAppender.flush(TaskLogAppender.java:94)
attempt_201112151554_0050_m_71_0: at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:260)
attempt_201112151554_0050_m_71_0: at org.apache.hadoop.mapred.Child$2.run(Child.java:142)

So I updated my mapred-site.xml with these settings:

<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx2048M</value>
</property>
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx2048M</value>
</property>

Also, when I run my jar, I provide -Dmapred.map.child.java.opts=-Xmx4000m at the end. In spite of this, the task is not getting the max heap size I'm setting. Where did I go wrong? After changing mapred-site.xml, I restarted the jobtracker and tasktracker... is that not good enough?

thanks
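One way to see which setting actually won (a hedged sketch; MaxHeapMapper is hypothetical): log the task JVM's maximum heap from inside the map task and compare it, in the attempt's stdout log, against the -Xmx you configured.

  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  public class MaxHeapMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    private boolean logged = false;

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
      if (!logged) {
        // Appears in the attempt's stdout; compare with your -Xmx setting.
        System.out.println("task max heap = " + Runtime.getRuntime().maxMemory());
        logged = true;
      }
      // ... real map logic elided ...
    }
  }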
Re: setting mapred.map.child.java.opts not working
> but those new settings have not yet been added to mapred-default.xml.

It's intentionally left out. If set in mapred-default.xml, a user's mapred.child.java.opts would be ignored, since mapred.{map,reduce}.child.java.opts would always win.

Koji

On 1/11/12 9:34 PM, George Datskos george.dats...@jp.fujitsu.com wrote:

Koji, Harsh

MAPREDUCE-478 seems to be in v1, but those new settings have not yet been added to mapred-default.xml. (For backwards compatibility?)

George
Re: Wrong version of jackson parser while running a hadoop job
Which hadoop version are you using?

On 1/13/12 1:59 PM, vvkbtnkr vvkbt...@yahoo.com wrote:

I am running a hadoop jar and keep getting this error:

java.lang.NoSuchMethodError: org.codehaus.jackson.JsonParser.getValueAsLong()

On digging deeper, this is what I can find: my jar packages version 1.8.5 of jackson-mapper-asl (the Jackson JSON parser), and I can verify that by using mvn dependency:resolve. I see that the proper version is being packaged into my jar. However, initially my jar was packaging version 1.0.1 of jackson-mapper-asl; that's because my jar depends on hadoop core 1.0.0, and as can be seen here, /org/apache/hadoop/hadoop-core/1.0.0/hadoop-core-1.0.0.pom contains the following dependency section:

<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-mapper-asl</artifactId>
  <version>1.0.1</version>
</dependency>

That is, hadoop core depends on version 1.0.1... Given this, my current guess is that, though the jar I run as "hadoop jar myjar.jar" contains the proper version of the Jackson parser, somehow during the call to hadoop jar the wrong version, 1.0.1, is being loaded by hadoop. Does this make sense? I spent a lot of time verifying that the created jar file myjar.jar does indeed have the proper version of the Jackson parser (I checked that the size of the JsonParser.class file in my jar is exactly the same as in version 1.8.5 of jackson-mapper-asl), so something seems to be happening while hadoop is loading the classes; I'm thinking it probably loads version 1.0.1 first, as hadoop core depends on it. Has anyone faced similar issues? Any idea where to look...
Re: Username on Hadoop 20.2
How about playing with 'whoami'? :)

% export PATH=.:$PATH
% cat whoami
#!/bin/sh
echo mynewname
% whoami
mynewname
%

Koji

On 1/16/12 9:02 AM, Eli Finkelshteyn iefin...@gmail.com wrote:

Hi Folks,

I'm still lost on this. Has no one wanted or needed to connect to a Hadoop cluster from a client machine under a name other than the client's whoami before?

Eli

On 1/13/12 11:00 AM, Eli Finkelshteyn wrote:

I tried this, and it doesn't seem to work. Specifically, the way I tested it was adding

<property>
  <name>user.name</name>
  <value>[my_username]</value>
</property>

to core-site.xml. I then tried a test mkdir and did an ls, which showed the new folder had been created by my default whoami client username instead of the new one I had set. Do I need to add it somewhere else, or add something else to the property name? I'm using CDH3, with my Hadoop cluster currently set up with one node in pseudo-distributed mode, in case that helps.

Cheers,
Eli

On 1/12/12 5:39 PM, Joey Echeverria wrote:

Set the user.name property in your core-site.xml on your client nodes.

-Joey

On Thu, Jan 12, 2012 at 3:55 PM, Eli Finkelshteyn iefin...@gmail.com wrote:

Hi,

If I have one username on a hadoop cluster and would like to set myself up to use that same username from every client from which I access the cluster, how can I go about doing that? I found information about setting hadoop.job.ugi, and have tried setting this property variously in hdfs-site.xml, core-site.xml, and mapred-site.xml, but nothing seems to work. All I want is to be able to look like the same user no matter which of my machines I connect to the cluster from. How can I do this?

Thanks!
Eli
Re: Wrong version of jackson parser while running a hadoop job
Assuming you're using hadoop-1.0, then

export HADOOP_USER_CLASSPATH_FIRST=true

for the submit/client side to use your HADOOP_CLASSPATH before the framework's, and

-Dmapreduce.user.classpath.first=true

for the task side to use your -libjars before the framework's.

Koji

On 1/23/12 6:39 AM, John Armstrong john.armstr...@ccri.com wrote:

On Fri, 13 Jan 2012 13:59:12 -0800 (PST), vvkbtnkr vvkbt...@yahoo.com wrote:

> I am running a hadoop jar and keep getting this error -
> java.lang.NoSuchMethodError: org.codehaus.jackson.JsonParser.getValueAsLong()

Nobody seems to have answered this while I was on vacation, so... Okay, here's what I know, having run into a similar problem before.

Hadoop, at some point, decided that it wanted to use Jackson for JSON serialization as an alternative method of serialization for its Configuration objects. Of course, this means that Hadoop now depends on Jackson, and in particular on some specific version of Jackson. In my own version (CDH3b3 = Hadoop 0.20.2+737), this is 1.5.2. The $HADOOP_HOME/lib directory on each of my tasktracker nodes contains:

jackson-core-asl-1.5.2.jar
jackson-mapper-asl-1.5.2.jar

So? So here's how the Hadoop classloader works: first it loads up anything in $HADOOP_HOME/lib, then it loads anything on your distributed classpath, including the specified job JAR (we're using an überJAR containing all of our dependencies). But, like all classloaders...

THE FIRST VERSION OF A LOADED CLASS TAKES PRECEDENCE

This is Very Important for Java to work right, and should be the first thing any Java hacker unfortunate enough to have to understand classloaders learns. In your case, this means that you're trying to use the 1.8.x version of org.codehaus.jackson.JsonParser, but the 1.5.2 version has already been loaded, and that takes precedence. Since JsonParser doesn't have a method called getValueAsLong() in Jackson 1.5.2, you get an error.

So, what do you do? You've got a couple of options. If you control your own deployment environment, that's great; you can try upgrading Hadoop's version of Jackson. It's pretty good about backwards compatibility, so you probably won't have any problems. On the other hand, if you don't control your deployment environment (or you don't want to risk fiddling with it, which I Totally Get), you can try to rewrite your code to use Jackson version 1.5.2; that's what we did, and it's sort of awkward in some of the ways it does things, but I think for your purposes it probably won't be too difficult. I think JsonParser.getLongValue() is the right method to consider.

HTH
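For per-job use, the usual pattern is a driver run through ToolRunner, which is what parses -D flags (including -Dmapreduce.user.classpath.first=true, per Koji's note above) and -libjars off the command line. A sketch, with a hypothetical class name:

  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  public class MyDriver extends Configured implements Tool {
    public int run(String[] args) throws Exception {
      // getConf() already reflects any -D and -libjars options that
      // GenericOptionsParser consumed before run() was called.
      // ... build and submit the job from getConf() here ...
      return 0;
    }

    public static void main(String[] args) throws Exception {
      System.exit(ToolRunner.run(new MyDriver(), args));
    }
  }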
Re: distributed cache symlink
Should be ./q_map .

Koji

On 5/29/12 7:38 AM, Alan Miller alan.mil...@synopsys.com wrote:

I'm trying to use the DistributedCache but am having an issue resolving the symlinks to my files. My Driver class writes some hashmaps to files in the DC like this:

Path tPath = new Path("/data/cache/fd", UUID.randomUUID().toString());
os = new ObjectOutputStream(fs.create(tPath));
os.writeObject(myHashMap);
os.close();
URI uri = new URI(tPath.toString() + "#" + "q_map");
DistributedCache.addCacheFile(uri, config);
DistributedCache.createSymlink(config);

But what Path() do I need to access to read the symlinks? I tried variations of "q_map" and "work/q_map", but neither works. The files are definitely there, because I can set a config var to the path and read the files in my reducer. For example, in my Driver class I set a variable via

config.set("q_map", tPath.toString());

and then in my Reducer's setup() I do something like

Path q_map_path = new Path(config.get("q_map_path"));
if (fs.exists(q_map_path)) {
  HashMap<String,String> qMap = loadmap(conf, q_map_path);
}

I tried to resolve the path to the symlinks via ${mapred.local.dir}/work, but that doesn't work either. In the STDOUT of my mapper attempt I see:

2012-05-29 03:59:54,369 - INFO [main:TaskRunner@759] - Creating symlink: /tmp/hadoop-mapred/mapred/local/taskTracker/distcache/-3168904771265144450_-884848596_406879224/varuna010/data/cache/fd/6dc9d5c0-98be-4105-bd59-b344924dd989 <- /tmp/hadoop-mapred/mapred/local/taskTracker/root/jobcache/job_201205250826_0020/attempt_201205250826_0020_m_00_0/work/q_map

which says it's creating the symlinks, BUT I also see this output:

mapred.local.dir: /tmp/hadoop-mapred/mapred/local/taskTracker/root/jobcache/job_201205250826_0020/attempt_201205250826_0020_m_00_0
   job.local.dir: /tmp/hadoop-mapred/mapred/local/taskTracker/root/jobcache/job_201205250826_0020/work
  mapred.task.id: attempt_201205250826_0020_m_00_0
Path [work/q_map] does not exist
Path [/tmp/hadoop-mapred/mapred/local/taskTracker/root/jobcache/job_201205250826_0020/attempt_201205250826_0020_m_00_0/work/q_map] does not exist

which is from this code in my mapper's setup() method:

try {
  System.out.printf("mapred.local.dir: %s\n", conf.get("mapred.local.dir"));
  System.out.printf("   job.local.dir: %s\n", conf.get("job.local.dir"));
  System.out.printf("  mapred.task.id: %s\n", conf.get("mapred.task.id"));
  fs = FileSystem.get(conf);
  Path symlink = new Path("work/q_map");
  Path fullpath = new Path(conf.get("mapred.local.dir") + "/work/q_map");
  System.out.printf("Path [%s] ", symlink.toString());
  if (fs.exists(symlink)) {
    System.out.println("exists");
  } else {
    System.out.println("does not exist");
  }
  System.out.printf("Path [%s] ", fullpath.toString());
  if (fs.exists(fullpath)) {
    System.out.println("exists");
  } else {
    System.out.println("does not exist");
  }
} catch (IOException e1) {
  e1.printStackTrace();
}

Regards,
Alan
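Following up on Koji's "./q_map": a minimal sketch of reading the cached file (assuming the driver-side "#q_map" fragment and createSymlink call above). With symlinks on, the file appears in the task's current working directory, so plain local java.io works; note that fs = FileSystem.get(conf) in the code above is the HDFS filesystem, which is why it reports the local symlink as missing.

  import java.io.File;
  import java.io.FileInputStream;
  import java.io.IOException;
  import java.io.ObjectInputStream;
  import java.util.HashMap;

  // Inside setup() of the mapper/reducer:
  File link = new File("./q_map");   // symlink in the task's working directory
  if (link.exists()) {
    ObjectInputStream in = new ObjectInputStream(new FileInputStream(link));
    try {
      @SuppressWarnings("unchecked")
      HashMap<String, String> qMap = (HashMap<String, String>) in.readObject();
      // ... use qMap ...
    } catch (ClassNotFoundException e) {
      throw new IOException(e);
    } finally {
      in.close();
    }
  }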