I'm having some trouble getting the rowLabelBindings from a String-keyed 
(Checkpointed...Spark)Drm read in from HDFS.  I'm reading in a sequence 
file of the form <Text,VectorWritable>, which is output from seq2sparse.  The 
Drm has 7598 rows and the vectors seem to be read in properly.  When I try to 
get a Map using getRowLabelBindings(), I get back a Map of size 1.

The single key/value pair in that map is consistent with what I would expect:   
      k = /talk.religion.misc/84570, v = 7597 
(the last row).  I don't know why I'm not getting entries for the rest of the 
rows.  I've been looking through drmWrap() and drmFromHdfs() and don't see 
where/if rowLabelBindings is being set except in collect() (I'm very likely 
missing something because of my limited understanding of Scala). 

Is there a different way to get a map of key/rowIndex?
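
To make the expected result concrete, here is a minimal sketch in plain Scala 
(no Mahout or Spark calls) of the key/rowIndex map described above: one entry 
per row, each key mapped to its row position.  The object name RowLabelSketch, 
the helper bindings(), and the sample keys are all hypothetical and exist only 
for illustration; this is not how Mahout builds its bindings internally.

```scala
import java.util.{HashMap => JHashMap}

object RowLabelSketch {
  // Hypothetical helper: build a label -> row-index map from row keys
  // given in row order. Each key is assumed to be unique.
  def bindings(keys: Seq[String]): JHashMap[String, Integer] = {
    val m = new JHashMap[String, Integer]()
    keys.zipWithIndex.foreach { case (k, i) => m.put(k, Integer.valueOf(i)) }
    m
  }

  // Made-up keys for illustration only (not taken from the real data set).
  val sampleKeys: Seq[String] = Seq("doc-a", "doc-b", "doc-c")

  // Three distinct keys give three entries: doc-a -> 0, doc-b -> 1, doc-c -> 2.
  val sampleBindings: JHashMap[String, Integer] = bindings(sampleKeys)
}
```

With 7598 distinct row keys, a map built this way would have 7598 entries, 
which is what I would expect getRowLabelBindings() to return rather than 1.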

Ultimately I'm working from the math-scala package, so I can't do anything 
specific with RDDs.

Below is where I'm hitting trouble, shown in the Spark shell.

Any input is appreciated.  Thanks, 


mahout> val drmTFIDF= drmFromHDFS( path = 
drmTFIDF: org.apache.mahout.math.drm.CheckpointedDrm[_] = 
mahout> drmTFIDF.nrow
res0: Long = 7598
mahout> val drmRowLabelBindings:java.util.HashMap[String,Integer] = new 
drmRowLabelBindings: java.util.HashMap[String,Integer] = 

mahout> val incoreTFIDF=drmTFIDF.collect
incoreTFIDF: org.apache.mahout.math.Matrix = 
  2770  =>    

mahout> val incoreRowLabelBindings = new java.util.HashMap(incoreTFIDF.getRowLabelBindings)
incoreRowLabelBindings: java.util.HashMap[String,Integer] = 

mahout> incoreTFIDF.nrow
res1: Int = 7598

mahout> incoreRowLabelBindings.size
res3: Int = 1

mahout> drmTFIDF.nrow
res5: Long = 7598

mahout> drmRowLabelBindings.size
res4: Int = 1
