I've been sick for a few days, but today I was able to get back to
playing with HadoopStreaming.  I have a few questions and a few
thoughts.

1) Things work a lot better when I specify the input as a file instead
of a directory. :)  However, I consider it a bug that the error message
was just "Failure" rather than something like "Input file not found."

2) I think it would be extremely useful if standard error from the
mapper and reducer commands were put on the job details page.  This
would be especially useful for tracking down bugs and errors.
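
To make the request concrete, here is a minimal sketch of the kind of
mapper I have in mind.  It is a hypothetical word-count script in
Python, not one of my real jobs, and the name run_mapper and the
message format are made up for illustration.  Its tab-separated output
goes to standard output, and its diagnostics go to standard error,
which is the stream I'd like to see on the job details page.

```python
#!/usr/bin/env python
import sys


def run_mapper(lines, out=sys.stdout, err=sys.stderr):
    """Write word<TAB>1 for every word; report skipped lines on stderr."""
    for lineno, line in enumerate(lines, 1):
        words = line.split()
        if not words:
            # Diagnostics go to stderr so they never mix with mapper output.
            err.write("mapper: skipping empty line %d\n" % lineno)
            continue
        for word in words:
            out.write("%s\t1\n" % word)


if __name__ == "__main__":
    # A streaming mapper reads its input records from standard input.
    run_mapper(sys.stdin)
```

Right now the "skipping empty line" messages from a script like this
are exactly what I can't find anywhere in the web interface.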

3) How do you set -inputreader?  The documentation seems really unclear
on this.  The rest of Hadoop uses TextInputFormat to read input by
default, but HadoopStreaming seems to have different defaults for input
and output.  How would you tell a job to use TextInputFormat for its
input?  When I try adding "-inputreader TextInputFormat", it says the
class isn't found.

4) Is there any chance that the documentation could include some
substantial examples to make it clear to the uninitiated exactly what
HadoopStreaming does in various situations?  This would be especially
useful since it seems to be so different from the rest of Hadoop.

I'm really trying to get this.  It's just going pretty slowly.  Thanks
for everything.

-- 
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868
