My Reducer code says this:
 public static class Reduce extends Reducer<Text, Text, Text, Text> {
        private boolean m_DateSent;

        /**
         * This method is called once for each key. Most applications will
         * define their reduce class by overriding this method. The default
         * implementation is an identity function.
         */
        @Override
        protected void reduce(Text key, Iterable<Text> values,
                              Context context)
                throws IOException, InterruptedException {
            // emit environment diagnostics once per reducer instance
            if (!m_DateSent) {
                Text dkey = new Text("CreationDate");
                Text dValue = new Text();
                writeKeyValue(context, dkey, dValue, "CreationDate", new Date().toString());
                writeKeyValue(context, dkey, dValue, "user.dir", System.getProperty("user.dir"));
                writeKeyValue(context, dkey, dValue, "os.arch", System.getProperty("os.arch"));
                writeKeyValue(context, dkey, dValue, "os.name", System.getProperty("os.name"));

//                dkey.set("ip");
//                java.net.InetAddress addr = java.net.InetAddress.getLocalHost();
//                dValue.set(addr.toString());
//                context.write(dkey, dValue);

                m_DateSent = true;
            }
            Iterator<Text> itr = values.iterator();
            // Add interesting code here
            while (itr.hasNext()) {
                Text vCheck = itr.next();
                context.write(key, vCheck);
            }
        }
    }
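
(writeKeyValue is not shown above; a minimal sketch of such a helper, with an
assumed signature, would be something like this:)

        // Hypothetical sketch -- the real writeKeyValue is not shown in this mail.
        // It would live inside the Reduce class, where Context resolves.
        // Reuses the Text instances to avoid allocating on every write.
        private void writeKeyValue(Context context, Text key, Text value,
                                   String name, String data)
                throws IOException, InterruptedException {
            key.set(name);
            value.set(data == null ? "" : data);
            context.write(key, value);
        }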

If os.name reports Linux, I am running on the cluster;
if it reports Windows, I am running locally.
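
(A hypothetical guard for that check -- note it is os.name, not os.arch, that
carries the OS name; os.arch only reports the CPU architecture:)

        String os = System.getProperty("os.name");   // e.g. "Linux" or "Windows XP"
        boolean onCluster = os != null && os.toLowerCase().startsWith("linux");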

I run this main() hoping the job will run on the cluster, with the NameNode
and JobTracker at glados:

   public static void main(String[] args) throws Exception {
        String outFile = "./out";
        Configuration conf = new Configuration();

        // cause output to go to the cluster
        conf.set("fs.default.name", "hdfs://glados:9000/");
        conf.set("fs.defaultFS", "hdfs://glados:9000/");
        conf.set("mapreduce.jobtracker.address", "glados:9000");
        conf.set("mapred.jar", "NShot.jar");

        Job job = new Job(conf, "Generated data");
        conf = job.getConfiguration();
        job.setJarByClass(NShotInputFormat.class);

        // ... other setup code ...

        boolean ans = job.waitForCompletion(true);
        System.exit(ans ? 0 : 1);
    }
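
For what it's worth, I believe classic (pre-YARN) MapReduce submission reads
mapred.job.tracker rather than mapreduce.jobtracker.address, and the
JobTracker normally listens on its own port (the Hadoop docs use 9001) rather
than sharing the NameNode's 9000. A sketch of the settings under those
assumptions -- glados's actual JobTracker port would need to be confirmed
from its mapred-site.xml:

        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://glados:9000");  // old property name
        conf.set("fs.defaultFS", "hdfs://glados:9000");     // new property name
        // classic MR reads mapred.job.tracker; plain host:port, no scheme,
        // no trailing slash; 9001 is an assumption -- check mapred-site.xml
        conf.set("mapred.job.tracker", "glados:9001");
        conf.set("mapred.jar", "NShot.jar");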



On Tue, May 31, 2011 at 9:35 AM, Harsh J <[email protected]> wrote:

> Steve,
>
> What do you mean when you say it shows "windows os" and "user.dir"?
> There will be a few properties in the job.xml that may carry client
> machine information, but these shouldn't be a hindrance.
>
> Unless a TaskTracker was started on the Windows box (no daemons ought
> to be started on the client machine), no task may run on it.
>
> On Tue, May 31, 2011 at 9:15 PM, Steve Lewis <[email protected]>
> wrote:
> > I have tried what you suggest (well, sort of); a good example would help
> > a lot.
> > My reducer is set to, among other things, emit the local os and user.dir.
> > When I try running from
> > my windows box these appear on HDFS but show the Windows os and user.dir,
> > leading me to believe that the reducer is still running on my Windows
> > machine. I will
> > check the values, but a working example would be very useful.
> >
> > On Sun, May 29, 2011 at 6:19 AM, Ferdy Galema <[email protected]>
> > wrote:
> >>
> >> Would it not also be possible for a Windows machine to submit the job
> >> directly from a Java process? This way you don't need Cygwin / a full
> >> local copy of the installation (correct me if I'm wrong). The steps
> >> would then just be:
> >> 1) Create a basic Java project, add the minimum required libraries
> >> (Hadoop/logging)
> >> 2) Set the essential properties (at least the jobtracker and the
> >> filesystem)
> >> 3) Implement the Tool interface (see the sketch after this message)
> >> 4) Run the process (from either the IDE or a stand-alone jar)
> >>
> >> Steps 1-3 could technically be implemented on another machine, if you
> >> choose to compile a stand-alone jar.
> >>
> >> Ferdy.
> >>
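
A minimal Tool-based driver along the lines Ferdy describes might look like
this -- hdfs://glados:9000 comes from my config above, while the jobtracker
port 9001 and the class name RemoteSubmit are assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Hypothetical driver -- the class name and the 9001 port are assumptions.
    public class RemoteSubmit extends Configured implements Tool {
        public int run(String[] args) throws Exception {
            Configuration conf = getConf();
            conf.set("fs.default.name", "hdfs://glados:9000");  // the filesystem
            conf.set("mapred.job.tracker", "glados:9001");      // the jobtracker
            conf.set("mapred.jar", "NShot.jar");                // jar shipped to the cluster

            Job job = new Job(conf, "Generated data");
            job.setJarByClass(RemoteSubmit.class);
            // ... mapper/reducer/input/output setup as usual ...
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new RemoteSubmit(), args));
        }
    }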
> >> On 05/29/2011 04:50 AM, Harsh J wrote:
> >>>
> >>> Keep a local Hadoop installation with a mirror-copy config, and use
> >>> "hadoop jar<jar>" to submit as usual (since the config points to the
> >>> right areas, the jobs go there).
> >>>
> >>> For Windows you'd need Cygwin installed, however.
> >>>
> >>> On Sun, May 29, 2011 at 12:56 AM, Steve Lewis <[email protected]>
> >>> wrote:
> >>>>
> >>>> When I want to launch a hadoop job I use SCP to execute a command on
> >>>> the Name node machine. I am wondering if there is
> >>>> a way to launch a Hadoop job from a machine that is not on the cluster.
> >>>> How to do this on a Windows box or a Mac would be
> >>>> of special interest.
> >>>>
> >>>> --
> >>>> Steven M. Lewis PhD
> >>>> 4221 105th Ave NE
> >>>> Kirkland, WA 98033
> >>>> 206-384-1340 (cell)
> >>>> Skype lordjoe_com
> >>>>
> >>>
> >>>
> >
> >
> >
> > --
> > Steven M. Lewis PhD
> > 4221 105th Ave NE
> > Kirkland, WA 98033
> > 206-384-1340 (cell)
> > Skype lordjoe_com
> >
> >
> >
>
>
>
> --
> Harsh J
>



-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
