I'm able to run both Canopy and LDA on CDH3 after the first parts of
build-reuters.sh (through seq2sparse) have completed. The k-means job fails
consistently in the RandomSeedGenerator. I'm investigating what may be
different about its file handling compared to the other jobs.
On 10/14/10 9:36 PM, Jeff Eastman wrote:
Well, in this case k-means fails the same way even after I've
verified the input file, so it's a hard failure. And the job runs just
fine stand-alone on the box, and in both modes on my Mac, so it's got
to be something about the Cloudera deployment. Sure would be nice to
have 0.4 run that example on CDH3.
On 10/14/10 8:53 PM, Ted Dunning wrote:
There is often a small delay before files appear in HDFS after they are
created. This has buggered many a work-flow.
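One common guard against the visibility delay Ted describes is to poll for the output path before launching the next step. A minimal sketch in shell, using a plain local-file test as a stand-in (on HDFS you would swap the `[ -e ... ]` test for `hadoop fs -test -e <path>`; the function name and timeout default here are illustrative, not from the thread):

```shell
# wait_for_file PATH [TIMEOUT]: poll once a second until PATH exists,
# giving up after TIMEOUT seconds (default 30). Returns 0 on success,
# 1 on timeout. For HDFS, replace the [ -e ... ] test with
# "hadoop fs -test -e $path".
wait_for_file() {
  local path="$1" timeout="${2:-30}" waited=0
  until [ -e "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $path" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

Calling this between job steps in a driver script like build-reuters.sh would at least rule out the appearing-late failure mode before blaming the job itself.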
On Thu, Oct 14, 2010 at 8:40 PM, Jeff Eastman <j...@windwardsolutions.com> wrote:
On 10/14/10 7:47 PM, Jeff Eastman wrote:
The recent commit to the POM fixed my build problem on my clean Red Hat
box. Currently, build-reuters.sh is failing to run the k-means step on
Hadoop on that box, and it looks like it is the same problem we've been
seeing with others running the Cloudera CDH3: Hadoop is running under a
different user, and the local file references don't resolve correctly
when the job is run under mine. I haven't yet figured out the best way
to fix this, or why the other build-reuters job steps don't have this
problem (they all use ./examples... file paths too).
It looks like RandomSeedGenerator.buildRandom() is somehow seeing an
empty input directory when it really has an 11.6 MB part file in it.
The EOFException occurs on line 84, when executing:
SequenceFile.Reader reader = new SequenceFile.Reader(fs, fileStatus.getPath(), conf);
There are hdfs and mapred PIDs associated with the Hadoop daemons, but
why would that matter? The files in HDFS are all under
/users/dev/examples... and my jobs are running as dev, so I don't get
why this is happening.
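Whatever the underlying permissions or timing issue turns out to be, one defensive workaround for the EOFException would be to filter out zero-length or missing part files before handing each path to SequenceFile.Reader. A minimal sketch, using the local filesystem as a stand-in for the Hadoop FileSystem API (with Hadoop you would check FileStatus.getLen() on each listStatus() entry instead; the class and method names below are hypothetical, not Mahout's):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class SeedFileFilter {

    // Return only the readable, non-empty "part-" files in a directory,
    // mirroring the kind of guard a seed generator could apply before
    // opening each path with SequenceFile.Reader. Local-filesystem analog:
    // File.length() stands in for FileStatus.getLen().
    public static List<File> nonEmptyPartFiles(File inputDir) {
        List<File> result = new ArrayList<>();
        File[] entries = inputDir.listFiles();
        if (entries == null) {
            return result; // not a directory, or not readable by this user
        }
        for (File f : entries) {
            if (f.isFile() && f.getName().startsWith("part-") && f.length() > 0) {
                result.add(f);
            }
        }
        return result;
    }
}
```

If the guard ever returns an empty list while the directory "should" have an 11.6 MB part file, that would confirm the job is seeing a different view of the filesystem (wrong user, wrong fs.default.name, or a not-yet-visible file) rather than a corrupt SequenceFile.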