Re: HadoopStreaming

Eric Baldeschwieler Fri, 20 Oct 2006 15:45:49 -0700

Hi Andrew,

I filed:

http://issues.apache.org/jira/browse/HADOOP-619

to address the -input issues.  There is work in progress to address
getting job debugging info.  I think this will be coming out in the
next release (8?).

http://issues.apache.org/jira/browse/HADOOP-489

I'll let others address the other issues.

e14

On Oct 20, 2006, at 12:09 PM, Andrew McNabb wrote:

I've been sick for a few days, but today I was able to get back to
playing with HadoopStreaming.  I have a few questions and a few
thoughts.

1) Things definitely work a lot better if I specify input to be a file
instead of a directory. :) However, I consider it a bug that the error
was "Failure" instead of "Input file not found."

2) I think it would be extremely useful if standard error from the
mapper and reducer commands were put on the job details page.  This
would be especially useful for tracking down bugs and errors.

3) How do you set -inputreader?  The documentation seems really unclear
on this.  The default for the rest of Hadoop is to use TextInputFormat
to read input, but HadoopStreaming seems to have different defaults for
input and output?  How would you tell a job to use TextInputFormat for
its input?  When I try adding "-inputreader TextInputFormat" it says the
class isn't found.

4) Is there any chance that the documentation could include some
substantial examples to make it clear to the uninitiated exactly what
HadoopStreaming does in various situations?  This would be especially
useful since it seems to be so different from the rest of Hadoop.

I'm really trying to get this.  It's just going pretty slowly.  Thanks
for everything.

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868
