Robin,

Do you remember whether this test ran to completion? If not,
I'll file a JIRA once I have a complete log of a failed run...

Dan

---------- Forwarded message ----------
From: Grant Ingersoll <[email protected]>
Date: 21 June 2012 21:33
Subject: Re: Spectral Kmeans wiki category data test - can you confirm
if you ran it to completion?
To: Dan Brickley <[email protected]>
Cc: Shannon Quinn <[email protected]>


I'd ask on dev@, as Robin was actually the one who ran it.

On Jun 21, 2012, at 3:15 PM, Dan Brickley wrote:

Hi

With the MAHOUT-986 patch (https://issues.apache.org/jira/browse/MAHOUT-986)
in 0.7, this no longer dies so quickly, but I'm still not seeing it run
to completion.

Using the template command line you suggested,

bin/mahout spectralkmeans -k 20 -d 4192499 -x 7 -i path/to/csv/file/ -o your/output/path/

I've seen it fail with both -k 20 and -k 10.

Unfortunately I was running this in a screen session without proper
logging, and I want to double-check everything before reporting, so I'm
re-running with -k 10 now and will file a bug if it fails. Meanwhile, I
wanted to check in with you to see whether you'd had a successful run.
I'm testing with the 0.7 distro.

The failure was an IndexException; here's the -k 20 version:

mahout spectralkmeans -k 20 -d 4192499 -x 7 -i spectral/input/ -o spectral/output/

12/06/19 19:33:11 INFO lanczos.LanczosSolver: 20 passes through the corpus so far...
Exception in thread "main" org.apache.mahout.math.IndexException: Index 20 is outside allowable range of [0,20)
       at org.apache.mahout.math.AbstractMatrix.set(AbstractMatrix.java:479)
       at org.apache.mahout.math.decomposer.lanczos.LanczosSolver.solve(LanczosSolver.java:132)
       at org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.runJob(DistributedLanczosSolver.java:73)
       at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:148)
       at org.apache.mahout.clustering.spectral.kmeans.SpectralKMeansDriver.run(SpectralKMeansDriver.java:86)
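For reading the exception: the message implies a half-open range, so a structure of size n accepts indices 0 through n-1 and index n is the first one rejected. The sketch below is a hypothetical stand-in (`BoundsCheckDemo` and `checkIndex` are illustrative names, not Mahout's `AbstractMatrix` code) that only reproduces the message format:

```java
// Hypothetical sketch, NOT Mahout's AbstractMatrix: it mimics the
// half-open [0, size) bounds check implied by the exception message.
public class BoundsCheckDemo {

    // Returns the error message if index is outside [0, size), else null.
    static String checkIndex(int index, int size) {
        if (index < 0 || index >= size) {
            return "Index " + index + " is outside allowable range of [0," + size + ")";
        }
        return null;
    }

    public static void main(String[] args) {
        int size = 20;
        // Index 19 is the last valid slot, so no error message is produced.
        System.out.println(checkIndex(19, size)); // prints "null"
        // Index 20, the one in the -k 20 trace, is the first rejected index.
        System.out.println(checkIndex(20, size));
    }
}
```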

It's barfing out here:

   // Next step: perform eigen-decomposition using LanczosSolver
   // since some of the eigen-output is spurious and will be eliminated
   // upon verification, we have to aim to overshoot and then discard
   // unnecessary vectors later
   int overshoot = (int) ((double) clusters * OVERSHOOT_MULTIPLIER);
   DistributedLanczosSolver solver = new DistributedLanczosSolver();
   LanczosState state = new LanczosState(L, overshoot, solver.getInitialVector(L));
   Path lanczosSeqFiles = new Path(outputCalc, "eigenvectors-" + (System.nanoTime() & 0xFF));
   solver.runJob(conf,
                 state,
                 overshoot,
                 true,
                 lanczosSeqFiles.toString());

With -k 10 I got:

12/06/20 20:51:15 INFO lanczos.LanczosSolver: 10 passes through the corpus so far...
Exception in thread "main" org.apache.mahout.math.IndexException: Index 10 is outside allowable range of [0,10)
       at org.apache.mahout.math.AbstractMatrix.set(AbstractMatrix.java:479)

...although the logs also showed "12/06/20 20:40:18 INFO
lanczos.LanczosSolver: Finding 20 singular vectors of matrix with
4192499 rows, via Lanczos", which confused me until Shannon reminded me
of the overshoot.
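The overshoot arithmetic from the driver snippet above can be sketched as follows. The value of `OVERSHOOT_MULTIPLIER` is an assumption here (2.0, inferred from the -k 10 run logging "Finding 20 singular vectors"), and `OvershootDemo` is a hypothetical class, not Mahout code:

```java
// Sketch of the overshoot computation quoted from SpectralKMeansDriver.
// ASSUMPTION: OVERSHOOT_MULTIPLIER = 2.0, inferred from the -k 10 log line
// "Finding 20 singular vectors"; check the actual constant in the source.
public class OvershootDemo {

    static final double OVERSHOOT_MULTIPLIER = 2.0; // assumed value

    // Same expression as the driver: scale k up, then truncate to int.
    static int overshoot(int clusters) {
        return (int) ((double) clusters * OVERSHOOT_MULTIPLIER);
    }

    public static void main(String[] args) {
        for (int k : new int[] {10, 20}) {
            // prints "-k 10 -> Lanczos rank 20" then "-k 20 -> Lanczos rank 40"
            System.out.println("-k " + k + " -> Lanczos rank " + overshoot(k));
        }
    }
}
```

Under that assumption, -k 10 asks Lanczos for 20 vectors and -k 20 asks for 40, while the two reported exceptions fire at index 10 and index 20 respectively.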

I'm happy to +cc the mailing lists, but for starters I thought I'd check
whether the test run had succeeded for you; if so, maybe I have some
local problem.

Dan


--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
