So I guess you can run the command below ($HADOOP_HOME/bin/hadoop ..) on
a separate machine, as long as you have the config file that defines the
IP address of the namenode and/or the datanodes?
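
Something like this in hadoop-site.xml on the client machine, I assume
(the hostname and port below are just placeholders):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>namenode.example.com:9000</value>
  </property>
</configuration>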

Thanks!

-----Original Message-----
From: Michael Bieniosek [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 11, 2007 3:54 PM
To: [email protected]; Earney, Billy C.
Subject: Re: hadoop client 

There is a Java hadoop client you can run with:

$HADOOP_HOME/bin/hadoop [--config /path/to/config/dir] fs -help
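
For example, to list or copy files once it's pointed at the cluster
(the paths below are just illustrative):

$HADOOP_HOME/bin/hadoop fs -ls /
$HADOOP_HOME/bin/hadoop fs -put localfile.txt /user/billy/localfile.txt
$HADOOP_HOME/bin/hadoop fs -cat /user/billy/localfile.txt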

Supposedly there are also webdav and fuse HDFS implementations, but I
don't know anything about them.

-Michael

On 9/11/07 1:11 PM, "Earney, Billy C." <[EMAIL PROTECTED]> wrote:

> Greetings!
> 
> I've been reading through the documentation, and there is one piece of
> information I'm not finding (or I missed it..).  Let's say you have a
> cluster of machines, one being the namenode and the rest serving as
> datanodes.  Does a client process (a process trying to
> insert/delete/read files) need to be running on the namenode or the
> datanodes, or can it run on another machine?
> 
> If a client process can run on another machine, can someone give an
> example and the configuration needed to do such a thing?  I've seen
> that there has been some work done on webdav with hadoop, and was
> wondering if a machine that is not part of the cluster could access
> HDFS with something like webdav (or a similar tool)?
> 
> Thanks!
> 
> -----Original Message-----
> From: Tom White [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 11, 2007 2:16 PM
> To: [email protected]
> Subject: Re: Accessing S3 with Hadoop?
> 
>> I just updated the page to add a Notes section explaining the issue
>> and referencing the JIRA issue # you mentioned earlier.
> 
> Great - thanks.
> 
>>> Are you able to do 'bin/hadoop-ec2 launch-cluster' then (on your
>>> workstation)
>>> 
>>> . bin/hadoop-ec2-env.sh
>>> ssh $SSH_OPTS "[EMAIL PROTECTED]" \
>>>   "sed -i -e \"s/$MASTER_HOST/\$(hostname)/g\" /usr/local/hadoop-$HADOOP_VERSION/conf/hadoop-site.xml"
>>> 
>>> and then check to see if the master host has been set correctly (to
>>> the internal IP) in the master host's hadoop-site.xml.
>> 
>> Well, no, since my $MASTER_HOST is now just the external DNS name of
>> the first instance started in the reservation, but this is performed
>> as part of my launch-hadoop-cluster script. In any case, that value
>> is not set to the internal IP, but rather to the hostname portion of
>> the internal DNS name.
> 
> This is a bit of a mystery to me - I'll try to reproduce it on my
> workstation.
> 
>> 
>> Currently, my MR jobs are failing because the reducers can't copy the
>> map output, and I'm thinking it might be because there is some kind of
>> external address getting in there somehow. I see connections to
>> external IPs in netstat -tan (72.* addresses). Any ideas about that?
>> In the hadoop-site.xml files on the slaves, the address is the external
>> DNS name of the master (ec2-*), but that resolves to the internal 10/8
>> address like it should.
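>> 
>> (As a sanity check from one of the slaves, something like
>> 
>> host <external DNS name of the master>
>> 
>> should come back with a 10.x.x.x address; the bracketed part is just
>> a placeholder for the actual ec2-* name.)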
>> 
>>> Also, what version of the EC2 tools are you using?
>> 
>> black:~/code/hadoop-0.14.0/src/contrib/ec2> ec2-version
>> 1.2-11797 2007-03-01
>> black:~/code/hadoop-0.14.0/src/contrib/ec2>
> 
> I'm using the same version so that's not it.
> 
>>> Instances are terminated on the basis of their AMI ID since 0.14.0.
>>> See https://issues.apache.org/jira/browse/HADOOP-1504.
>> 
>> I felt this was unsafe as it was, since it looked up the name of an
>> image and then reversed it to get the AMI ID. I just hacked it so you
>> have to put the AMI ID in hadoop-ec2-env.sh. Also, the script as it
>> is right now doesn't grep for 'running', so it may potentially shut
>> down some instances starting up in another cluster. I may just be
>> paranoid, however ;)
> 
> Checking for 'running' is a good idea. I've relied on the version
> number so folks can easily select the version of hadoop they want on
> the cluster. Perhaps the best solution would be to allow an optional
> parameter to the terminate script for specifying the AMI ID if you
> need extra certainty (the script already prompts with a list of
> instances to terminate).
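> 
> As an untested sketch (assuming the AMI_IMAGE variable from
> hadoop-ec2-env.sh, and that the instance ID is the second column of
> ec2-describe-instances output), the check might look something like:
> 
> ec2-describe-instances | grep "$AMI_IMAGE" | grep running | \
>   awk '{print $2}' | xargs ec2-terminate-instances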
> 
> Tom
