In cooccurrence, for the case of B'A, the real-world dimensionality of the 
matrices can be compatible even though the data read in from tuples leaves 
some rows or columns blank (no non-zero elements). At least this is what I 
suspect after trying to run cooccurrence on the epinions data 
(ratings_data.txt, trust_data.txt).

I get:

    Exception in thread "main" java.lang.AssertionError: assertion failed: Incompatible operand geometry
        at scala.Predef$.assert(Predef.scala:179)
        at org.apache.mahout.math.drm.logical.OpAB.<init>(OpAB.scala:29)
        ...

from

    val drmBtA = drmB.t %*% drmA

Regardless of what is causing this problem, there _will_ be cases where the 
auto-calculated dimensions of A and B (calculated from the non-blank rows when 
the DRM is read in from a text file) are not compatible but the data actually 
is. This happens when the union of all user IDs is larger than the set of 
user IDs found in one or both of the DRMs.

To do this correctly for all cases, the row IDs for all unique row keys would 
have to be created across all DRMs used in cooccurrence. This implies using a 
single Map for the row space of all DRMs read in, with a single incrementing 
integer for the DRM row key. The length of this Map would be the row dimension 
for all DRMs. After the row dimension is calculated the Map could be thrown 
away, since only columns (input items) need to have application-specific IDs 
applied at output.
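
To make the idea concrete, here is a minimal sketch in plain Scala, not Mahout code. The names SharedRowSpace, Tuples and buildRowIndex are made up for illustration, and it assumes each input file has already been parsed into (userID, itemID) pairs:

```scala
import scala.collection.mutable

object SharedRowSpace {
  // Hypothetical input shape: (userID, itemID) pairs parsed from one text file.
  type Tuples = Seq[(String, String)]

  // Assign one incrementing integer to each unique user ID seen across ALL
  // inputs, so that every DRM built from them shares the same row space.
  def buildRowIndex(inputs: Tuples*): Map[String, Int] = {
    val index = mutable.LinkedHashMap.empty[String, Int]
    for (tuples <- inputs; (user, _) <- tuples)
      index.getOrElseUpdate(user, index.size) // next free row key
    index.toMap
  }
}
```

With A containing users u1, u2 and B containing users u2, u3, the Map has three entries, so both DRMs would be given a row dimension of 3 even though each input only mentions two users. Rows a given input never mentions would simply stay empty.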

Does this sound like the right way to handle this case? Will drmB.t %*% 
drmA do the right thing for non-existent rows/columns, which I think means 
treating a non-existent vector as if it were all 0s? I believe this worked in 
the hadoop version.

