Hi Andrew,
I filed:
http://issues.apache.org/jira/browse/HADOOP-619
to address the -input issues. There is work in progress to address
getting job debugging info. I think this will be coming out in the
next release (8?).
http://issues.apache.org/jira/browse/HADOOP-489
I'll let others address the other issues.
e14
On Oct 20, 2006, at 12:09 PM, Andrew McNabb wrote:
I've been sick for a few days, but today I was able to get back to
playing with HadoopStreaming. I have a few questions and a few
thoughts.
1) Things definitely work a lot better if I specify input to be a file
instead of a directory. :) However, I consider it a bug that the error
was "Failure" instead of "Input file not found."
2) I think it would be extremely useful if standard error from the
mapper and reducer commands were put on the job details page. This
would be especially useful for tracking down bugs and errors.
3) How do you set -inputreader? The documentation seems really unclear
on this. The default for the rest of Hadoop is to use TextInputFormat
to read input, but HadoopStreaming seems to have different defaults for
input and output. How would you tell a job to use TextInputFormat for
its input? When I try adding "-inputreader TextInputFormat" it says the
class isn't found.
4) Is there any chance that the documentation could include some
substantial examples to make it clear to the uninitiated exactly what
HadoopStreaming does in various situations? This would be especially
useful since it seems to be so different from the rest of Hadoop.
I'm really trying to get this. It's just going pretty slowly. Thanks
for everything.
--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868