Thanks Derek,

I agree about the numEigensWritten initialization error and have committed that patch. Now the Display routine itself is failing because it expects a 2-d input point but is getting a 5-d point instead. This may be related to MAHOUT-517.


On 10/12/10 3:16 AM, Derek O'Callaghan wrote:
Hi Jeff, Shannon,

I took a quick look at this just now. It seems that 8 clean eigenvectors are being written by the call to

verifier.runJob(conf, lanczosSeqFiles, L.getRowPath(), verifiedEigensPath, true, 1.0, 0.0, clusters);

in SpectralKMeansDriver.run(). The matrix W is created with numRows = 5 (clusters), and the subsequent transpose() fails at the call to:

SequentialAccessSparseVector outVector = new SequentialAccessSparseVector(tmp);

in TransposeReducer.reduce(). This fails because the RandomAccessSparseVector tmp has been created with size newNumCols = 5 (clusters from SpectralKMeansDriver.run()), but it appears to contain the 8 clean eigenvectors, which then generates an IndexException in AbstractVector.set().
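To make the failure mode concrete, here is a minimal, self-contained Java sketch that does not use Mahout at all. The class TransposeMismatchSketch and the method transposeColumn are hypothetical, and a plain double[] stands in for RandomAccessSparseVector. It assumes, as described above, that the output vector is sized from the declared row count (5) while 8 rows are actually present, so row indices 5 to 7 fall outside it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Mahout's TransposeJob): transposing a matrix whose
 * declared row count is smaller than the number of rows actually stored.
 */
public class TransposeMismatchSketch {

    // Builds column j of the stored rows as an array of the *declared* size,
    // mirroring how the reducer sizes its output vector by newNumCols.
    static double[] transposeColumn(List<double[]> storedRows, int declaredNumRows, int j) {
        double[] outRow = new double[declaredNumRows];
        for (int i = 0; i < storedRows.size(); i++) {
            if (i >= declaredNumRows) {
                // Analogous to the IndexException thrown by AbstractVector.set()
                throw new IndexOutOfBoundsException(
                        "row index " + i + " >= declared cardinality " + declaredNumRows);
            }
            outRow[i] = storedRows.get(i)[j];
        }
        return outRow;
    }

    public static void main(String[] args) {
        // 8 "clean eigenvectors" (2 components each, values are arbitrary)
        List<double[]> eigens = new ArrayList<>();
        for (int k = 0; k < 8; k++) {
            eigens.add(new double[] {k, k + 1.0});
        }
        try {
            transposeColumn(eigens, 5, 0); // declared 5 rows, 8 stored
        } catch (IndexOutOfBoundsException e) {
            System.out.println("Failed as in the report: " + e.getMessage());
        }
        // Keeping only the first 5 rows lets the same call succeed.
        System.out.println(transposeColumn(eigens.subList(0, 5), 5, 0).length);
    }
}
```

With only the first 5 rows the call succeeds, which is consistent with the fix being to write no more eigenvectors than were requested.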

Looking further back into EigenVerificationJob.saveCleanEigens(), it looks like it will always write out all of the clean eigenvectors, ignoring the 'maxEigens' value, i.e. the clusters value passed to verifier.runJob() in this case:

for (Map.Entry<MatrixSlice, EigenStatus> pruneSlice : prunedEigenMeta) {
  // ...
  int numEigensWritten = 0;
  // increment the number of eigenvectors written and see if we've
  // reached our specified limit, or if we wish to write all eigenvectors
  // (latter is built-in, since numEigensWritten will always be > 0)
  numEigensWritten++;
  if (numEigensWritten == maxEigensToKeep) {
    log.info("{} of the {} total eigens have been written", maxEigensToKeep, prunedEigenMeta.size());
    break;
  }
}

I'm assuming the "int numEigensWritten = 0;" should appear before this for loop?
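A minimal sketch of the suggested fix, assuming the counter simply moves above the loop. CleanEigenLimitSketch and eigensToWrite are hypothetical names, a list of integer ids stands in for the MatrixSlice/EigenStatus entries, and nothing is actually written to disk. The "write everything" behaviour for a non-positive limit follows from the comment in the excerpt above: the counter is at least 1 whenever it is compared, so it never equals such a limit.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the corrected limit logic in saveCleanEigens():
 * the counter is declared once, outside the loop, so it accumulates.
 */
public class CleanEigenLimitSketch {

    // Returns the eigenvector ids that would be written, in order.
    static List<Integer> eigensToWrite(List<Integer> prunedEigenIds, int maxEigensToKeep) {
        List<Integer> written = new ArrayList<>();
        int numEigensWritten = 0; // moved before the loop
        for (int id : prunedEigenIds) {
            written.add(id); // stands in for writing the vector to the output file
            numEigensWritten++;
            if (numEigensWritten == maxEigensToKeep) {
                System.out.printf("%d of the %d total eigens have been written%n",
                        maxEigensToKeep, prunedEigenIds.size());
                break;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        List<Integer> clean = new ArrayList<>();
        for (int id = 0; id < 8; id++) {
            clean.add(id);
        }
        System.out.println(eigensToWrite(clean, 5).size()); // 5 (limit respected)
        System.out.println(eigensToWrite(clean, 0).size()); // 8 (write all)
    }
}
```

With the declaration inside the loop, numEigensWritten is always 1 at the comparison, so a limit of 5 never triggers the break and all 8 eigenvectors get written, which matches the transpose failure described above.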

Derek

On 12/10/10 04:21, Jeff Eastman wrote:
 +user@

+1 Any helpers out there want to earn a patch kudo?

On 10/11/10 6:59 PM, Shannon Quinn wrote:
OK, this machine learning homework assignment is really brutal and due Wednesday morning, so I may not get to this before then. Unless anyone would like to help :)

Shannon

On Mon, Oct 11, 2010 at 10:35 AM, Shannon Quinn wrote:

I'll have a chance to look at this later today; hopefully I'll have something for you once you get back tonight.


On Mon, Oct 11, 2010 at 10:33 AM, Jeff Eastman wrote:
Sorry, my bad. I neglected to commit the TestClusterDumper changes. They're in now, and all tests pass. The DisplaySpectralKMeans example still fails when you run it, but it is not run by any of the build processes. It points to a potential problem in SpectralKMeans that I'd like to get fixed if we can.

I'm starting work at Narus today, so I won't be able to pay attention to this until later this evening.


On 10/10/10 11:48 PM, Sean Owen wrote:

I trust this is all much more bug fix than anything else; I'm mindful of the purported "code freeze" now in effect. This leaves us with a broken build at the moment. I know the point is to get it sorted straight away. I'm just wondering whether we're pretty sure this isn't opening up a new line of issues at a time when we're going to bless a state of the code for another 6-8 months.

On Mon, Oct 11, 2010 at 3:35 AM, Jeff Eastman wrote:

Hi Shannon,

I've committed a new display example that attempts to push the standard mixture-of-models data set through spectral k-means. After some tweaking of configuration arguments it gets remarkably far, finally failing on W.transpose() after the eigen cleanup. I can't imagine this is all pilot error, so I wonder if you'd have a look at it to see where it's going south?




