Felix,

I am not an expert on networking by any means, but BGP is the Border Gateway Protocol. Routers use it to decide the best path for getting packets to where they need to be. If the routing is wrong, your packets could be taking the long way from one box to another. Have you tried running any network benchmark tests, even just ping, or talking to your hosting company about it? It looks like HDFS is very slow, which is probably because the network is slow. The network can be slow for all kinds of reasons, and your hosting company is probably in the best position to help you debug it.
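As a rough complement to ping, here is a minimal Python sketch that times the name lookup and the TCP connect separately for one host, so a slow resolver can be told apart from a slow route. The function name, the example host names, and port 50010 (which I am assuming is your datanode port) are placeholders of mine, not anything taken from your cluster.

```python
import socket
import time


def time_lookup_and_connect(host, port, timeout=10.0):
    """Return (lookup_seconds, connect_seconds) for one host and port."""
    start = time.monotonic()
    # Resolve the name first so a slow lookup is measured on its own.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    resolved = time.monotonic()
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(addr)  # time for the TCP handshake only
    connected = time.monotonic()
    return resolved - start, connected - resolved


def report(hosts, port):
    """Print one timing line per host; failures are reported, not raised."""
    for host in hosts:
        try:
            lookup, connect = time_lookup_and_connect(host, port)
            print(f"{host}: lookup {lookup:.3f}s, connect {connect:.3f}s")
        except OSError as exc:
            print(f"{host}: failed ({exc})")


# Hypothetical usage; substitute your own node names and port:
# report(["node1.example.com", "node2.example.com"], 50010)
```

Running it from the jobtracker box against each node, and again between two worker nodes, should show whether the delay is in name resolution or in the connection itself. It does not measure throughput, so it is no substitute for a real bandwidth test.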
--Bobby

On 7/14/11 3:45 PM, "felix gao" <gre1...@gmail.com> wrote:

We didn't do anything on the cluster end. The company that hosts our cluster did a BGP update (whatever that means) and a full reset (I think just a reboot of the switches).

On Thu, Jul 14, 2011 at 1:27 PM, Robert Evans <ev...@yahoo-inc.com> wrote:

Felix,

So did you change anything except the network configuration? What did you do to fix the "networking issues"?

--Bobby Evans

On 7/14/11 2:46 PM, "felix gao" <gre1...@gmail.com> wrote:

Recently we had some network issues with our cluster. This job used to take only a few minutes to complete and now it is taking over half an hour. When looking at the jobtracker's log I see it slowly getting all the split information (the list is not exhaustive):

2011-07-14 14:42:51,434 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002488 has split on node:/default-rack/x.com
2011-07-14 14:42:56,465 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002489 has split on node:/default-rack/x.com
2011-07-14 14:43:01,446 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000218 has split on node:/default-rack/x.com
2011-07-14 14:43:01,466 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0010_m_001703 has split on node:/default-rack/x.com
2011-07-14 14:43:01,490 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002489 has split on node:/default-rack/x.com
2011-07-14 14:43:06,469 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0010_m_001703 has split on node:/default-rack/x.com
2011-07-14 14:43:06,473 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000218 has split on node:/default-rack/x.com
2011-07-14 14:43:06,473 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000219 has split on node:/default-rack/x.com
2011-07-14 14:43:06,473 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000219 has split on node:/default-rack/x.com
2011-07-14 14:43:11,500 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000220 has split on node:/default-rack/x.com
2011-07-14 14:43:11,542 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002491 has split on node:/default-rack/x.com
2011-07-14 14:43:16,526 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000224 has split on node:/default-rack/x.com
2011-07-14 14:43:16,526 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0019_m_000225 has split on node:/default-rack/x.com
2011-07-14 14:43:16,567 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002491 has split on node:/default-rack/x.com
2011-07-14 14:45:26,791 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0025_m_000001 has split on node:/default-rack/x.com
2011-07-14 14:45:28,696 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0005_m_002509 has split on node:/default-rack/x.com
2011-07-14 14:45:31,770 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0010_m_001722 has split on node:/default-rack/x.com
2011-07-14 14:45:31,815 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201107141056_0025_m_000002 has split on node:/default-rack/x.com

250 mappers took about 25 min to run, with 10 min spent on generating the tasks.
The question is: what could have caused this slowdown?

Thanks,
Felix
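One way to put numbers on the "slowly getting all the split information" observation is to measure the time between consecutive "has split on node" entries. In the excerpt above they arrive in bursts roughly five seconds apart. The sketch below is only an illustration: the function name is made up, and it assumes every entry starts with a timestamp in the same format as the pasted lines.

```python
from datetime import datetime


def split_log_gaps(lines):
    """Return seconds between consecutive 'has split on node' log entries."""
    stamps = []
    for line in lines:
        if "has split on node" not in line:
            continue
        try:
            # Timestamps look like "2011-07-14 14:42:51,434" (23 characters).
            stamps.append(datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f"))
        except ValueError:
            continue  # entry does not start with a parsable timestamp
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# Hypothetical usage against a saved copy of the jobtracker log:
# with open("jobtracker.log") as f:
#     gaps = split_log_gaps(f)
# print(max(gaps), sum(gaps))
```

A steady gap of a fixed size would point at something timing out and retrying rather than at general congestion, which is worth mentioning to the hosting company.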