Yes, that will be immensely helpful for others.

Suhail
On Tue, Jan 26, 2010 at 9:52 PM, Jean-Daniel Cryans <[email protected]> wrote:

> You mean that documentation?
>
> http://hadoop.apache.org/common/docs/r0.20.1/quickstart.html#Required+Software
>
> J-D
>
> On Tue, Jan 26, 2010 at 1:34 AM, Suhail Rehman <[email protected]> wrote:
> > We finally figured it out! The problem was with the JDK installation on our
> > VMs: it was configured to use the IBM JDK, and the moment we switched to the
> > Sun JDK, everything worked flawlessly.
> >
> > You may want to state somewhere in the documentation that you strongly
> > recommend using the Sun JDK with Hadoop.
> >
> > Suhail
> >
> > On Thu, Jan 21, 2010 at 1:13 PM, Suhail Rehman <[email protected]> wrote:
> >>
> >> We have verified that it does NOT solve the problem at all. This would
> >> lead us to believe that the timeout issue we are experiencing is not part
> >> of the shuffle phase. Any other ideas that might help us?
> >>
> >> The TaskTracker logs show that these reducers are stuck during the copy
> >> phase.
> >>
> >> Suhail
> >>
> >> On Wed, Jan 20, 2010 at 5:22 PM, Amareshwari Sri Ramadasu
> >> <[email protected]> wrote:
> >>>
> >>> ReadTimeOuts are found to be costly during shuffle if the map runtime is
> >>> high.
> >>> Please see HADOOP-3327 (http://issues.apache.org/jira/browse/HADOOP-3327)
> >>> for the shuffle improvements made specifically for ReadTimeOut.
> >>>
> >>> Thanks
> >>> Amareshwari
> >>>
> >>> On 1/20/10 6:07 PM, "Suhail Rehman" <[email protected]> wrote:
> >>>
> >>> We are having trouble running Hadoop MapReduce jobs on our cluster.
> >>>
> >>> VMs running on an IBM blade center with the following virtualized
> >>> configuration:
> >>>
> >>> Master Node/NameNode: 1x
> >>> OS: Xen RedHat Linux 5.2, CPU: 3 vCPU, RAM: 1024 MB
> >>> Slaves/DataNode: 3x
> >>> OS: Xen RedHat Linux 5.2, CPU: 1 vCPU, RAM: 1024 MB
> >>>
> >>> We are working with standard Hadoop example code.
> >>> We are using Hadoop 0.20.1, stable, with the latest patches installed. All
> >>> VMs have firewalls turned off and SELinux disabled.
> >>>
> >>> For example, when we try to execute the "wordcount" program on a
> >>> provisioned cluster, the Map operations complete successfully, but the
> >>> program gets stuck trying to complete the reduce operations.
> >>>
> >>> On examining the logs, we find that the Reducers are waiting for the
> >>> outputs from Map operations on other nodes. Our understanding is that this
> >>> communication happens over HTTP sockets, and all these provisioned VMs have
> >>> trouble communicating over the HTTP sockets on the ports that Hadoop uses.
> >>>
> >>> Also, while trying to access the JobTracker web interface to view the
> >>> running jobs, we see that the machine takes too much time to respond to
> >>> our queries. Since both the Reducer communication and the JobTracker web
> >>> interface work over HTTP, we think the problem might be a networking issue
> >>> or a problem with the built-in HTTP service in Hadoop (Jetty).
> >>>
> >>> Attached is a partial Task log from one of the Reducers. The message
> >>> "WARN org.apache.hadoop.mapred.ReduceTask:
> >>> java.net.SocketTimeoutException: Read timed out"
> >>> appears on all reducers, and eventually the Job either fails to complete
> >>> or takes a very long time (about 15 hours to process an 11 GB text file).
> >>>
> >>> This problem seems to be random: at times the program runs successfully
> >>> in about 20 minutes, and at other times it takes 15 hours to complete.
> >>>
> >>> Any help with this would be much appreciated.
> >>>
> >>> Regards,
> >>>
> >>> Suhail Rehman
> >>> MS by Research in Computer Science
> >>> International Institute of Information Technology - Hyderabad
> >>> [email protected]
> >>> ---------------------------------------------------------------------
> >>> http://research.iiit.ac.in/~rehman
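
For anyone who hits the same symptom: the fix in this thread came down to which JDK the nodes were running, so it may help to confirm the JVM vendor on each node before debugging the network. The sketch below is not from the thread. The class name `JvmVendorCheck` and the helper `isSunVendor` are hypothetical, and the match on "Sun Microsystems" is an assumption about how a Sun JDK reports its `java.vendor` property, so check the printed values on your own machines.

```java
import java.util.Locale;

// Hypothetical helper, not part of Hadoop: reports which JVM this node runs.
class JvmVendorCheck {

    // Assumption: a Sun JDK reports a java.vendor containing "Sun Microsystems".
    static boolean isSunVendor(String vendor) {
        return vendor != null
                && vendor.toLowerCase(Locale.ROOT).contains("sun microsystems");
    }

    public static void main(String[] args) {
        // Standard system properties available on every JVM.
        String vendor = System.getProperty("java.vendor");
        String vmName = System.getProperty("java.vm.name");
        String version = System.getProperty("java.version");

        System.out.println("java.vendor  = " + vendor);
        System.out.println("java.vm.name = " + vmName);
        System.out.println("java.version = " + version);

        if (!isSunVendor(vendor)) {
            System.out.println("This JVM does not look like a Sun JDK; "
                    + "check JAVA_HOME in conf/hadoop-env.sh.");
        }
    }
}
```

Run it with the same `java` binary that Hadoop uses (the one under the `JAVA_HOME` set in `conf/hadoop-env.sh`), on the master and on every slave, since a single node with a different JDK could be enough to cause trouble.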
