Re: Draining/Decommisioning a tasktracker

Koji Noguchi Mon, 31 Jan 2011 09:35:00 -0800

Hi Rishi,

> P.S. - What credentials are required for commenting on an issue in Jira
>
It's open source.  I'd say none :)

My feature request is for regular Hadoop clusters, whereas yours is pretty 
unique.
I'm not sure whether that Jira applies to your need or not.

Koji

On 1/31/11 9:21 AM, "rishi pathak" <mailmaverick...@gmail.com> wrote:

Hi Koji,
           Thanks for opening the feature request. For the purpose stated 
earlier, I have upgraded Hadoop to 0.21 and am trying to see whether creating an 
individual leaf-level queue for every tasktracker, and changing its state to 
'stopped' before the walltime expires, will do the job. It seems like it will 
work for now.

P.S. - What credentials are required for commenting on an issue in Jira?

On Mon, Jan 31, 2011 at 10:22 PM, Koji Noguchi <knogu...@yahoo-inc.com> wrote:
Rishi,

> Using exclude list for TT will not help as Koji has already mentioned
>
It'll help a bit, in the sense that no more tasks are assigned to that 
TaskTracker once it is excluded.

As for TT decommissioning and the handling of map outputs, I opened a Jira for 
further discussion.
https://issues.apache.org/jira/browse/MAPREDUCE-2291

Koji

On 1/29/11 5:37 AM, "rishi pathak" <mailmaverick...@gmail.com> wrote:

Hi,
    Here is a description of what we are trying to achieve (whether it is 
possible or not is still not clear):
We have large computing clusters used mainly for MPI jobs. We use PBS/Torque 
and Maui for resource allocation and scheduling.
Utilization is very high most of the time, except for very small resource 
pockets of, say, 16 cores for 2-5 hours. We are trying to establish the 
feasibility of using these small (but fixed-size) resource pockets for nutch 
crawls. Our configuration is:

# Hadoop 0.20.2 (packaged with nutch)
# Lustre parallel filesystem for data storage
# No HDFS

We have the JT running on one of the login nodes at all times.
A request for resources (nodes=16, walltime=05 Hrs.) is made through the batch 
system, and the TTs are provisioned as part of the job. The problem is that 
when the job expires, user processes are cleaned up and thus the TT gets 
killed. With that, the completed and running map/reduce tasks of the nutch job 
are lost and get rescheduled. The possible solutions, as we see them:

1. As the filesystem is shared (and persistent), restart tasks on another TT 
and make the intermediate task data available, i.e. a sort of checkpointing.
2. TT draining: based on an estimated time for task completion, a TT whose 
walltime is nearing expiry goes into draining mode, i.e. no new tasks are 
scheduled on that TT.
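
A minimal sketch of the draining decision described in option 2, written in Python purely as an illustration. It assumes the walltime expiry and an expected task duration are known to whatever component makes the decision; the function name `should_drain` and the `safety_margin` parameter are hypothetical and are not part of Hadoop or Torque.

```python
from datetime import datetime, timedelta


def should_drain(now, walltime_end, expected_task_time,
                 safety_margin=timedelta(minutes=5)):
    """Return True once a newly started task could no longer be expected
    to finish before the batch allocation's walltime expires."""
    remaining = walltime_end - now
    return remaining < expected_task_time + safety_margin


# Hypothetical numbers: a 5-hour allocation, tasks expected to take ~20 minutes.
start = datetime(2011, 1, 29, 9, 0)
walltime_end = start + timedelta(hours=5)
expected = timedelta(minutes=20)

# One hour in: plenty of time left, keep accepting tasks.
early = should_drain(start + timedelta(hours=1), walltime_end, expected)
# Fifteen minutes before expiry: stop accepting new tasks.
late = should_drain(start + timedelta(hours=4, minutes=45), walltime_end, expected)
print(early, late)
```

How the result is acted on (exclude list, stopping a per-tracker queue, or a scheduler hook) is a separate question and is not covered by this sketch.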

'1' seems very far-fetched to us (we are no Hadoop experts); 
'2' seems to be the more sensible approach.

Using the exclude list for the TT will not help, as Koji has already mentioned.
We looked into the capacity scheduler but didn't find any pointers. Phil, which 
version of Hadoop has these hooks in the scheduler?

On Sat, Jan 29, 2011 at 3:34 AM, phil young <phil.wills.yo...@gmail.com> wrote:
There are some hooks available in the schedulers that could be useful also.
I think they were expected to be used to allow you to schedule tasks based
on load average on the host, but I'd expect you can customize them for your
purpose.

On Fri, Jan 28, 2011 at 6:46 AM, Harsh J <qwertyman...@gmail.com> wrote:

> Moving discussion to the MapReduce-User list:
> mapreduce-u...@hadoop.apache.org
>
> Reply inline:
>
> On Fri, Jan 28, 2011 at 2:39 PM, rishi pathak <mailmaverick...@gmail.com>
> wrote:
> > Hi,
> >        Is there a way to drain a tasktracker? What we require is not to
> > schedule any more map/reduce tasks onto a tasktracker (mark it offline),
> > while leaving the running tasks unaffected.
>
> You could simply shut the TT down. MapReduce was designed with faults
> in mind, and thus tasks that are running on a particular TaskTracker
> can be re-run elsewhere if they fail. Is this not usable in your
> case?
>
> --
> Harsh J
> www.harshj.com
>
