FYI, I moved the numEigensWritten initialisation before the for loop in EigenVerificationJob.saveCleanEigens(), and that gets past the problem. The Frame appears and everything works except rendering the ellipses; I get the following exception:

Exception in thread "AWT-EventQueue-0" org.apache.mahout.math.CardinalityException: Required cardinality 5 but got 2
    at org.apache.mahout.math.AbstractVector.times(AbstractVector.java:416)
    at org.apache.mahout.clustering.display.DisplayClustering.plotEllipse(DisplayClustering.java:192)
    at org.apache.mahout.clustering.display.DisplayClustering.plotClusters(DisplayClustering.java:124)
    at org.apache.mahout.clustering.display.DisplaySpectralKMeans.paint(DisplaySpectralKMeans.java:79)

I imagine this is more likely to be a problem in the display code. I might get a chance to look into it later...
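For anyone picking this up: the message comes from a size check on element-wise vector operations, where both operands must have the same cardinality. It suggests plotEllipse() is combining a 5-element vector with a 2-element one; an unverified guess is that the spectral k-means clusters live in the 5-dimensional eigenvector space while the display works in 2-d. The sketch below reproduces the check with plain double[] arrays. The class and exception names are hypothetical; this is not Mahout's Vector API.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch of a cardinality-checked element-wise product.
// It uses plain double[] arrays and is NOT Mahout's Vector/AbstractVector API.
public class CardinalityCheckSketch {

    // Hypothetical exception whose message mimics the wording seen in the trace.
    static class CardinalityMismatch extends RuntimeException {
        CardinalityMismatch(int expected, int actual) {
            super("Required cardinality " + expected + " but got " + actual);
        }
    }

    // Element-wise product: both operands must have the same length (cardinality).
    static double[] times(double[] self, double[] other) {
        if (self.length != other.length) {
            throw new CardinalityMismatch(self.length, other.length);
        }
        double[] result = new double[self.length];
        for (int i = 0; i < self.length; i++) {
            result[i] = self[i] * other[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] fiveDim = new double[5];  // e.g. a cluster vector in 5-d eigen space
        double[] twoDim = {1.0, 1.0};      // e.g. a 2-d display coordinate
        try {
            times(fiveDim, twoDim);
        } catch (CardinalityMismatch e) {
            System.out.println(e.getMessage()); // Required cardinality 5 but got 2
        }
        // Matching sizes work as expected.
        System.out.println(Arrays.toString(times(new double[] {1, 2}, new double[] {3, 4}))); // [3.0, 8.0]
    }
}
```

If that guess holds, the fix would be on the display side: project or slice the cluster vectors down to 2-d before building the ellipse, rather than relaxing the size check.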

On 12/10/10 11:16, Derek O'Callaghan wrote:
Hi Jeff, Shannon,

I took a quick look at this just now. It seems that 8 clean eigenvectors are being written by the call to

verifier.runJob(conf, lanczosSeqFiles, L.getRowPath(), verifiedEigensPath, true, 1.0, 0.0, clusters);

in SpectralKMeansDriver.run(). The matrix W is created with numRows = 5 (clusters), and the subsequent transpose() fails with the call to:

SequentialAccessSparseVector outVector = new SequentialAccessSparseVector(tmp);

in TransposeReducer.reduce(). This fails because the RandomAccessSparseVector tmp has been created with size newNumCols = 5 (clusters from SpectralKMeansDriver.run()), but it appears to contain the 8 clean eigenvectors, which then generates an IndexException in AbstractVector.set().
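To make the failure mode concrete, here is a simplified, in-memory analogue of the transpose. It is a hypothetical sketch: the real job runs as a MapReduce reducer over SequenceFiles, and the names below are invented for illustration. Each input row i becomes column i of the output, so every output row has cardinality equal to the declared row count. With W declared as 5 rows but holding 8 eigenvectors, rows 5 to 7 have nowhere to go.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory illustration of the transpose failure; NOT Mahout's TransposeJob.
public class TransposeSketch {

    // Transposes a row-major matrix whose declared row count is declaredRows.
    // Each output row has cardinality declaredRows, so any input row index
    // >= declaredRows cannot be stored and triggers an out-of-bounds error.
    static double[][] transpose(Map<Integer, double[]> rows, int declaredRows, int numCols) {
        double[][] out = new double[numCols][declaredRows];
        for (Map.Entry<Integer, double[]> row : rows.entrySet()) {
            int i = row.getKey();
            if (i >= declaredRows) {
                throw new IndexOutOfBoundsException(
                        "row index " + i + " does not fit in cardinality " + declaredRows);
            }
            double[] values = row.getValue();
            for (int j = 0; j < numCols; j++) {
                out[j][i] = values[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, double[]> eigenRows = new TreeMap<>();
        for (int i = 0; i < 8; i++) {           // 8 clean eigenvectors were written...
            eigenRows.put(i, new double[] {i, -i});
        }
        try {
            transpose(eigenRows, 5, 2);        // ...but W was declared with 5 rows
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage()); // row index 5 does not fit in cardinality 5
        }
    }
}
```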

Looking further back into EigenVerificationJob.saveCleanEigens(), it looks like it will always write out all of the clean eigenvectors and ignore the 'maxEigens' value, i.e. the clusters value passed to verifier.runJob() in this case:

for (Map.Entry<MatrixSlice, EigenStatus> pruneSlice : prunedEigenMeta) {
  // ...
  int numEigensWritten = 0;
  // increment the number of eigenvectors written and see if we've
  // reached our specified limit, or if we wish to write all eigenvectors
  // (latter is built-in, since numEigensWritten will always be > 0)
  numEigensWritten++;
  if (numEigensWritten == maxEigensToKeep) {
    log.info("{} of the {} total eigens have been written", maxEigensToKeep, prunedEigenMeta.size());
    break;
  }
}

I'm assuming the "int numEigensWritten = 0;" should appear before this for loop?
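A minimal sketch of why the placement matters, contrasting the loop as quoted with the counter hoisted out of it. This is a hypothetical simplification: eigenvectors are plain String labels instead of MatrixSlice/EigenStatus entries, and "writing" one just appends it to a list. With the counter reset on every pass it is always 1 when compared, so the limit only takes effect when maxEigensToKeep == 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simplification of the loop in EigenVerificationJob.saveCleanEigens():
// eigenvectors are String labels, and "writing" one means adding it to the result.
public class CleanEigensSketch {

    // Proposed fix: the counter is declared once, before the loop, so it accumulates.
    // A maxEigensToKeep <= 0 never matches, so every eigenvector is written.
    static List<String> selectEigens(List<String> prunedEigens, int maxEigensToKeep) {
        List<String> written = new ArrayList<>();
        int numEigensWritten = 0;
        for (String eigen : prunedEigens) {
            written.add(eigen);
            numEigensWritten++;
            if (numEigensWritten == maxEigensToKeep) {
                break;
            }
        }
        return written;
    }

    // Shape of the quoted loop: the counter is reset on every iteration, so it is
    // always 1 when compared and the limit is only honored when maxEigensToKeep == 1.
    static List<String> selectEigensBuggy(List<String> prunedEigens, int maxEigensToKeep) {
        List<String> written = new ArrayList<>();
        for (String eigen : prunedEigens) {
            written.add(eigen);
            int numEigensWritten = 0;
            numEigensWritten++;
            if (numEigensWritten == maxEigensToKeep) {
                break;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> eigens = Arrays.asList("e0", "e1", "e2", "e3", "e4", "e5", "e6", "e7");
        System.out.println(selectEigensBuggy(eigens, 5).size()); // 8
        System.out.println(selectEigens(eigens, 5).size());      // 5
        System.out.println(selectEigens(eigens, 0).size());      // 8
    }
}
```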

Derek

On 12/10/10 04:21, Jeff Eastman wrote:
 +user@

+1 Any helpers out there want to earn a patch kudo?

On 10/11/10 6:59 PM, Shannon Quinn wrote:
OK, this machine learning homework assignment is really brutal (due
Wednesday morning), so I may not get to this before then. Unless anyone would
like to help :)

Shannon

On Mon, Oct 11, 2010 at 10:35 AM, Shannon Quinn wrote:

I'll have a chance to look at this later today; hopefully I'll have
something for you once you get back tonight.


On Mon, Oct 11, 2010 at 10:33 AM, Jeff Eastman wrote:
Sorry, my bad. I neglected to commit the TestClusterDumper changes. It's in now and all tests run. The DisplaySpectralKMeans example still fails when you run it but it is not run by any of the build processes. It's pointing to a potential problem in SpectralKMeans which I'd like to get fixed if we can.

I'm starting work at Narus today so I won't be able to pay attention to
this until later this evening.


On 10/10/10 11:48 PM, Sean Owen wrote:

I trust this is all much more bug fix than anything else; I'm just
mindful of the purported "code freeze" now in effect. This leaves us
with a broken build at the moment. I know the point is to get it
sorted straight away. I'm just wondering whether we're pretty sure this isn't
opening up a new line of issues at a time when we're going to bless a state
of the code for another 6-8 months.

On Mon, Oct 11, 2010 at 3:35 AM, Jeff Eastman wrote:

  Hi Shannon,

I've committed a new display example that attempts to push the standard
mixture-of-models data set through spectral k-means. After some tweaking of
configuration arguments it gets remarkably far, finally failing on
W.transpose() after the eigen cleanup. I can't imagine this is all
pilot error, so I wonder if you'd have a look at it to see where it's
going south?



