Bobby,

Thanks for the information. It was the resolver that was making it slow. After we put the IP-to-host mappings in the /etc/hosts file, everything took off like a space shuttle.
Felix

On Thu, Jul 14, 2011 at 1:58 PM, Robert Evans <ev...@yahoo-inc.com> wrote:

> Felix,
>
> I am not an expert on networking by any means, but BGP is Border Gateway
> Protocol. It is used to help a router decide the best way to get the
> packets to where they need to be. If it is wrong then your packets could be
> taking the long way from one box to another. Have you tried running any
> networking benchmark tests, even just ping, or talking to your hosting
> company about it? It looks like HDFS is very slow, which is probably
> because the network is slow. The network can be slow for all kinds of
> reasons, and your hosting company is probably in the best position to help
> you debug it.
>
> --Bobby
>
> On 7/14/11 3:45 PM, "felix gao" <gre1...@gmail.com> wrote:
>
> we didn't do anything on the cluster end, the company that hosts our cluster
> did a BGP update (whatever that means) and a full reset. (I think just a
> reboot of the switches)
>
> On Thu, Jul 14, 2011 at 1:27 PM, Robert Evans <ev...@yahoo-inc.com> wrote:
>
> Felix,
>
> So did you change anything except the network configuration? What did you
> do to fix the “networking issues”?
>
> --Bobby Evans
>
> On 7/14/11 2:46 PM, "felix gao" <gre1...@gmail.com> wrote:
>
> recently we had some network issues with our cluster. this job used to
> take only a few minutes to complete and now it is taking over half an hour.
>
> when looking at the jobtracker's log I see it slowly getting all the splits
> information (the list is not exhaustive)
>
> 2011-07-14 14:42:51,434 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002488 has split on node:/default-rack/x.com
> 2011-07-14 14:42:56,465 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002489 has split on node:/default-rack/x.com
> 2011-07-14 14:43:01,446 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000218 has split on node:/default-rack/x.com
> 2011-07-14 14:43:01,466 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0010_m_001703 has split on node:/default-rack/x.com
> 2011-07-14 14:43:01,490 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002489 has split on node:/default-rack/x.com
> 2011-07-14 14:43:06,469 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0010_m_001703 has split on node:/default-rack/x.com
> 2011-07-14 14:43:06,473 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000218 has split on node:/default-rack/x.com
> 2011-07-14 14:43:06,473 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000219 has split on node:/default-rack/x.com
> 2011-07-14 14:43:06,473 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000219 has split on node:/default-rack/x.com
> 2011-07-14 14:43:11,500 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000220 has split on node:/default-rack/x.com
> 2011-07-14 14:43:11,542 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002491 has split on node:/default-rack/x.com
> 2011-07-14 14:43:16,526 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000224 has split on node:/default-rack/x.com
> 2011-07-14 14:43:16,526 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000225 has split on node:/default-rack/x.com
> 2011-07-14 14:43:16,567 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002491 has split on node:/default-rack/x.com
> 2011-07-14 14:45:26,791 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0025_m_000001 has split on node:/default-rack/x.com
> 2011-07-14 14:45:28,696 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002509 has split on node:/default-rack/x.com
> 2011-07-14 14:45:31,770 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0010_m_001722 has split on node:/default-rack/x.com
> 2011-07-14 14:45:31,815 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_000002 has split on node:/default-rack/x.com
>
> 250 mappers took about 25 min to run, 10 min spent on generating the
> tasks. The question is what could have caused this slowdown?
>
> Thanks,
>
> Felix
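
In the log excerpt quoted above, entries arrive in bursts roughly five seconds apart, which would be consistent with each burst waiting on a slow name lookup (the fix reported at the top of the thread was adding /etc/hosts entries). A minimal Python sketch for measuring those gaps is below. It assumes each line starts with a timestamp in the format shown in the excerpt; the function names and the 4-second threshold are illustrative choices and are not part of Hadoop.

```python
import re
from datetime import datetime

# Matches a timestamp such as "2011-07-14 14:42:51,434" (format taken from the excerpt).
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def parse_timestamp(line):
    """Return the first timestamp found in a log line, or None if there is none."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    base = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return base.replace(microsecond=int(match.group(2)) * 1000)


def gaps_in_seconds(lines):
    """Return the delay in seconds between consecutive timestamped lines."""
    stamps = [t for t in map(parse_timestamp, lines) if t is not None]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


def count_stalls(lines, threshold=4.0):
    """Count gaps of at least `threshold` seconds (the threshold is an assumption)."""
    return sum(1 for gap in gaps_in_seconds(lines) if gap >= threshold)


# First four entries from the excerpt in this thread.
sample = [
    "2011-07-14 14:42:51,434 INFO org.apache.hadoop.mapred.JobInProgress: "
    "tip:task_201107141056_0005_m_002488 has split on node:/default-rack/x.com",
    "2011-07-14 14:42:56,465 INFO org.apache.hadoop.mapred.JobInProgress: "
    "tip:task_201107141056_0005_m_002489 has split on node:/default-rack/x.com",
    "2011-07-14 14:43:01,446 INFO org.apache.hadoop.mapred.JobInProgress: "
    "tip:task_201107141056_0019_m_000218 has split on node:/default-rack/x.com",
    "2011-07-14 14:43:01,466 INFO org.apache.hadoop.mapred.JobInProgress: "
    "tip:task_201107141056_0010_m_001703 has split on node:/default-rack/x.com",
]

if __name__ == "__main__":
    print(gaps_in_seconds(sample))
    print(count_stalls(sample))
```

Gaps that cluster around one fixed value, rather than varying with load, point toward a timeout somewhere in the path and away from general network slowness. The script only reports the delays and cannot say which component is timing out.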