I've been sick for a few days, but today I was able to get back to playing with HadoopStreaming. I have a few questions and a few thoughts.

1) Things definitely work a lot better if I specify input to be a file instead of a directory. :) However, I consider it a bug that the error was "Failure" instead of "Input file not found."

2) I think it would be extremely useful if standard error from the mapper and reducer commands were put on the job details page. This would be especially useful for tracking down bugs and errors.

3) How do you set -inputreader? The documentation seems really unclear on this. The default for the rest of Hadoop is to use TextInputFormat to read input, but HadoopStreaming seems to have different defaults for input and output? How would you tell a job to use TextInputFormat for its input? When I try adding "-inputreader TextInputFormat" it says the class isn't found.

4) Is there any chance that the documentation could include some substantial examples to make it clear to the uninitiated exactly what HadoopStreaming does in various situations? This would be especially useful since it seems to be so different from the rest of Hadoop.

I'm really trying to get this. It's just going pretty slowly. Thanks for everything.

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868
