I'm able to run both Canopy and LDA on CDH3 after the first parts of
build-reuters.sh (through seq2sparse) have completed. The k-means job fails
consistently in the RandomSeedGenerator. I'm investigating what may be
different about its file handling compared to the other jobs.
On 10/14/10 9:36 PM, Jeff Eastman wrote:
Well, in this case k-means fails the same way even after I've
verified the input file, so it's a hard failure. And the job runs just
fine stand-alone on the box, and in both modes on my Mac, so it's got
to be something about the Cloudera deployment. Sure would be nice to
have 0.4 run that example on CDH3.
On 10/14/10 8:53 PM, Ted Dunning wrote:
There is often a small delay before files appear in HDFS after they are
created. This has buggered many a work-flow.
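One common guard against the visibility delay Ted describes is to poll for the output path before launching the next step. A minimal sketch in shell, using a plain local-file test as a stand-in (on HDFS you would swap the `[ -e ... ]` test for `hadoop fs -test -e <path>`; the function name and timeout default here are illustrative, not from the thread):

```shell
# wait_for_file PATH [TIMEOUT]: poll once a second until PATH exists,
# giving up after TIMEOUT seconds (default 30). Returns 0 on success,
# 1 on timeout. For HDFS, replace the [ -e ... ] test with
# "hadoop fs -test -e $path".
wait_for_file() {
  local path="$1" timeout="${2:-30}" waited=0
  until [ -e "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $path" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

Calling this between job steps in a driver script like build-reuters.sh would at least rule out the appearing-late failure mode before blaming the job itself.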
On Thu, Oct 14, 2010 at 8:40 PM, Jeff Eastman <j...@windwardsolutions.com> wrote:
On 10/14/10 7:47 PM, Jeff Eastman wrote:
The recent commit to the POM fixed my build problem on my clean Red Hat
box. Currently, build-reuters.sh is failing to run the k-means step on
Hadoop on that box, and it looks like it is the same problem we've been
seeing with others running the Cloudera CDH3: Hadoop is running under a
different user, and the local file references don't resolve correctly
when the job is run under mine. I haven't yet figured out the best way
to fix this, or why the other build-reuters job steps don't have this
problem (they all use ./examples... file paths too).
It looks like RandomSeedGenerator.buildRandom() is somehow seeing an
empty input directory when it really has an 11.6 MB part file in it.
The EOFException occurs on line 84, when executing:
SequenceFile.Reader reader = new SequenceFile.Reader(fs, fileStatus.getPath(), conf);
There are hdfs and mapred PIDs associated with the Hadoop daemons, but
why would that matter? The files in HDFS are all under
/users/dev/examples... and my jobs are running as dev, so I don't get
why this is happening.
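Whatever the underlying permissions or timing issue turns out to be, one defensive workaround for the EOFException would be to filter out zero-length or missing part files before handing each path to SequenceFile.Reader. A minimal sketch, using the local filesystem as a stand-in for the Hadoop FileSystem API (with Hadoop you would check FileStatus.getLen() on each listStatus() entry instead; the class and method names below are hypothetical, not Mahout's):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class SeedFileFilter {

    // Return only the readable, non-empty "part-" files in a directory,
    // mirroring the kind of guard a seed generator could apply before
    // opening each path with SequenceFile.Reader. Local-filesystem analog:
    // File.length() stands in for FileStatus.getLen().
    public static List<File> nonEmptyPartFiles(File inputDir) {
        List<File> result = new ArrayList<>();
        File[] entries = inputDir.listFiles();
        if (entries == null) {
            return result; // not a directory, or not readable by this user
        }
        for (File f : entries) {
            if (f.isFile() && f.getName().startsWith("part-") && f.length() > 0) {
                result.add(f);
            }
        }
        return result;
    }
}
```

If the guard ever returns an empty list while the directory "should" have an 11.6 MB part file, that would confirm the job is seeing a different view of the filesystem (wrong user, wrong fs.default.name, or a not-yet-visible file) rather than a corrupt SequenceFile.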