Actually, as it stands, collect doesn't support labels (either as keys or Named Vectors).
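To make concrete what supporting labels in collect would involve, here is a minimal sketch of translating row keys of arbitrary type K into a label -> row-index map via toString(). It assumes keys arrive in the same order as the collected in-core rows. `RowKeyBindings` and `toLabelBindings` are hypothetical names, not Mahout API.

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

// Hypothetical helper, not part of Mahout: turns the row keys of a DRM into a
// label -> row-index map shaped like the one getRowLabelBindings returns.
object RowKeyBindings {

  // Assumption: keys(i) is the key of in-core row i after collect.
  def toLabelBindings[K](keys: Seq[K]): JMap[String, Integer] = {
    val bindings = new JHashMap[String, Integer]()
    for ((key, row) <- keys.zipWithIndex) {
      // The whole translation rests on K having a meaningful, injective toString.
      val label = key.toString
      // Two keys rendering to the same label would silently overwrite a binding,
      // so fail instead of producing an ambiguous map.
      require(!bindings.containsKey(label),
        s"row $row: label '$label' is already bound to row ${bindings.get(label)}")
      bindings.put(label, Integer.valueOf(row))
    }
    bindings
  }
}
```

For String or Int keys this works as expected. For a key type such as a JVM array, toString() yields an identity-based string like `[I@1b6d3586`, which is a useless label. That is the kind of key type for which the translation cannot be made correct in general.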
There are three considerations. (1) I chose to ignore any use of NamedVectors in DRM, since DRM already has row keys, and having two different sources of row identity has been creating ambiguity of interpretation. So I tailored all the algorithms on Spark to use keys, not named vectors. (2) Row keys are not translated into row label bindings because row keys are not necessarily strings, so translating them would require correct support of the toString() transformation for any possible key type. (3) In all the DRM collect cases I have ever encountered, I never had any use for keys. Aside from (2) and (3), there's not much reason not to do the toString() translation described in (2).

-d

On Fri, Sep 12, 2014 at 10:36 AM, Andrew Palumbo <[email protected]> wrote:

> I'm having some trouble getting the rowLabelBindings from a String-keyed
> (Checkpointed...Spark)Drm read in from HDFS. I'm reading in a sequence
> file of the form <Text,VectorWritable>, which is output from seq2sparse. The
> Drm has 7598 rows and the vectors seem to be read in properly. When I try
> to get a Map using getRowLabelBindings(), I get back a Map of size 1.
>
> The single key/value pair in that map is consistent with what I would
> expect:
>
> k = /talk.religion.misc/84570, v = 7597
>
> (the last row..) I don't know why I'm not getting entries for the rest of
> the rows. I've been looking through drmWrap() and drmFromHdfs() and don't
> see where/if rowLabelBindings is being set except in collect() (I'm very
> likely missing something because of my bad Scala understanding).
>
> Is there a different way to get a map of key/rowIndex?
>
> Ultimately I'm working from the math-scala package, so I can't do anything
> specific with RDDs.
>
> Below is where I'm hitting trouble, as shown in the Spark shell.
>
> Any input is appreciated.
>
> Thanks,
>
> Andy
>
>
> mahout> val drmTFIDF= drmFromHDFS( path =
> "/tmp/mahout-work-andy/20news-test-vectors/part-r-00000")
> drmTFIDF: org.apache.mahout.math.drm.CheckpointedDrm[_] =
> org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark@6e200e2d
>
> mahout> drmTFIDF.nrow
> res0: Long = 7598
>
> mahout> val drmRowLabelBindings:java.util.HashMap[String,Integer] = new
> java.util.HashMap(drmTFIDF.getRowLabelBindings)
> drmRowLabelBindings: java.util.HashMap[String,Integer] =
> {/talk.religion.misc/84570=7597}
>
> mahout> val incoreTFIDF=drmTFIDF.collect
> incoreTFIDF: org.apache.mahout.math.Matrix =
> {
> 2770 =>
> {40894:3.706777572631836,25040:5.326602935791016,63527:7.625180244445801,30072:9.138689994812012,75991:2.7672300338745117,91042:1.964722752571106,85483:5.487469673156738,45764:4.326903343200684,83215:2.904540777206421,25284:4.734808444976807,90958:3.0112483501434326,29565:7.410068988800049,60779:6.3667192459106445,91156:1.6616008281707764,92814:2.255286693572998,23763:4.0394415855407715,7067:12.395035743713379,61058:6.993908405303955,55483:9.745443344116211,43286:3.622220039367676,65462:4.295836925506592,43535:1.5242335796356201,34898:6.624548435211182,66572:8.541470527648926,64323:2.1623659133911133,58008:3.128486394882202,33351:3.3363659381866455,36587:4.08017110824585,74747:2.935668706893921,38...
>
> mahout> val incoreRowLabelBindings:java.util.HashMap[String,Integer]
> = new java.util.HashMap(incoreTFIDF.getRowLabelBindings)
> incoreRowLabelBindings: java.util.HashMap[String,Integer] =
> {/talk.religion.misc/84570=7597}
>
> mahout> incoreTFIDF.nrow
> res1: Int = 7598
>
> mahout> incoreRowLabelBindings.size
> res3: Int = 1
>
> mahout> drmTFIDF.nrow
> res5: Long = 7598
>
> mahout> drmRowLabelBindings.size
> res4: Int = 1
>
