On Fri, Jul 24, 2009 at 3:05 PM, Ravi Phulari <[email protected]> wrote:
> You can submit multiple MR jobs on the same cluster.
> It's better to submit all jobs either from an external machine, which can
> be used as a gateway to upload data to HDFS and submit MR jobs, or from
> the machine where the NN/JT is running.
> There is no problem running MR jobs from other nodes in the cluster
> (datanode/TT), but the best practice is to use an external machine if
> available, or the NN/JT.
>
> -
> Ravi
>
> On 7/24/09 11:40 AM, "Hrishikesh Agashe" <[email protected]>
> wrote:
>
> Hi,
>
> If I have one cluster with around 20 machines, can I submit different MR
> jobs from different machines in the cluster? Are there any precautions to
> be taken? (I want to start a Nutch crawl as one job and Katta indexing as
> another job)
>
> --Hrishi
Ravi, 10 is my favorite number, so I submit my jobs from datanode10. Question: why do you suggest the NN/JT? I feel that a user can do more collateral damage to a cluster when they have any access to the NN/TT/JT. My fear is someone writing a bash script with an infinite loop that ties up the CPU, or accidentally filling up the disk in some way. To answer the original question: technically, any system with proper network access, the Hadoop jars, and a Hadoop conf can interact with the cluster.
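
For what it's worth, here's a minimal sketch of what that looks like with the old mapred API. The class name, job name, and HDFS paths are made up for illustration, and I'm assuming the conf directory on the client box points at the cluster's NN and JT:

  // Sketch: submit a job from any machine with the Hadoop jars and a
  // conf/ pointing at the cluster. No NN/JT/TT needs to run locally.
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.lib.IdentityMapper;
  import org.apache.hadoop.mapred.lib.IdentityReducer;

  public class GatewaySubmit {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(GatewaySubmit.class);
      conf.setJobName("gateway-submit-test");
      // Default TextInputFormat gives LongWritable keys and Text values;
      // the identity mapper/reducer just pass records through.
      conf.setOutputKeyClass(LongWritable.class);
      conf.setOutputValueClass(Text.class);
      conf.setMapperClass(IdentityMapper.class);
      conf.setReducerClass(IdentityReducer.class);
      // Hypothetical paths, just for the example.
      FileInputFormat.setInputPaths(conf, new Path("/user/hrishi/in"));
      FileOutputFormat.setOutputPath(conf, new Path("/user/hrishi/out"));
      // fs.default.name and mapred.job.tracker are read from the local
      // conf, and the client talks to the remote NN/JT over RPC.
      JobClient.runJob(conf);
    }
  }

Compile that into a jar and kick it off with "hadoop jar" from whatever box you like. Since the client only speaks RPC to the remote daemons, it works the same from a gateway machine, a datanode, or the NN/JT itself, which is exactly why the choice comes down to policy rather than mechanics.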
