I didn't realize that that is what you were referring to earlier, Anand. I was looking at that too. I tried changing it around a bit, but like I said, my Scala sucks.
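For reference, here is a minimal sketch of what that rowBindings line does, using plain Scala collections only. Nothing here touches Mahout or Spark: `RowBindingsSketch`, `rowBindings`, and the sample keys are made up for illustration, and `keys` stands in for data that has already been collected to the driver. It shows that zipWithIndex followed by toMap yields one entry per distinct key string, and that the map collapses to a single entry holding the last index when every key renders to the same string. Whether that is what actually happens inside collect() is only a guess on my part.

```scala
// Plain Scala collections only; no Mahout or Spark types are involved.
object RowBindingsSketch {

  // Hypothetical helper mirroring the shape of the line quoted below:
  // pair each key with its position, then build a label -> row index map.
  def rowBindings[K](keys: Seq[K]): Map[String, java.lang.Integer] =
    keys.zipWithIndex
      .map { case (k, i) => (k.toString, i: java.lang.Integer) }
      .toMap

  def main(args: Array[String]): Unit = {
    // Distinct key strings: one binding per row.
    val distinct = Seq("/alt.atheism/1", "/sci.space/2", "/talk.religion.misc/3")
    println(rowBindings(distinct).size) // 3

    // Identical key strings: toMap keeps only the last pair it sees,
    // so the map has one entry, bound to the last index.
    val repeated = Seq.fill(3)("/talk.religion.misc/3")
    println(rowBindings(repeated).size) // 1
  }
}
```

So the line itself looks fine on an already-collected array; a size-1 map pointing at the last row would be consistent with all of the keys comparing equal by the time toString is called, but that still needs to be checked against the real data.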
> Date: Fri, 12 Sep 2014 12:00:47 -0700
> Subject: Re: drmFromHDFS rowLabelBindings question
> From: [email protected]
> To: [email protected]
>
> On Fri, Sep 12, 2014 at 11:57 AM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > but if you are really compelled that it is something that might be needed,
> > the best way probably would be indeed to create an optional parameter to
> > collect (something like drmLike.collect(extractLabels:Boolean=false)) which
> > you can flip to true if needed, and the thing does toString on keys and
> > assigns them to the in-core matrix's row labels. (requires a patch of course)
> >
>
> As I mentioned in the other mail, this is already the case. The code seems
> to assume .toMap internally does collect. My (somewhat wild) suspicion is
> that this line is somehow fooling the eye:
>
>   val rowBindings = d.map(t => (t._1._1.toString, t._2: java.lang.Integer)).toMap
>
> Thanks
>
> > On Fri, Sep 12, 2014 at 11:55 AM, Dmitriy Lyubimov <[email protected]>
> > wrote:
> >
> > > if you bail out into pure Spark out of algebraic DSL, yes.
> > >
> > > something like drmA.rdd.map(_._1).collect
> > >
> > > On Fri, Sep 12, 2014 at 11:46 AM, Andrew Palumbo <[email protected]>
> > > wrote:
> > >
> > >> Ok thanks- All that I need is a Vector of the String keys of the Drm
> > >> (they contain the category labels that I need)- I think I was just going
> > >> the long (wrong) way to get them. Is there an easy way to extract these?
> > >>
> > >> > Date: Fri, 12 Sep 2014 11:30:37 -0700
> > >> > Subject: Re: drmFromHDFS rowLabelBindings question
> > >> > From: [email protected]
> > >> > To: [email protected]
> > >> >
> > >> > Actually, as it stands, collect doesn't support labels (either as keys
> > >> > or Named Vectors).
> > >> >
> > >> > There are 3 considerations:
> > >> > (1) I chose to ignore any use of NamedVectors in DRM since DRM already
> > >> > has row keys, and two different sources have been creating ambiguity of
> > >> > interpretation, so I tailored all the algorithms on Spark to use keys
> > >> > (not named vectors).
> > >> >
> > >> > (2) the reason why row keys are not translated to row bindings is
> > >> > because row keys are not necessarily strings, so translating them would
> > >> > require correct support of toString() transformation on any possible key.
> > >> >
> > >> > (3) In all drm collect cases I ever encountered, I never had any use
> > >> > for keys.
> > >> >
> > >> > Aside from (2) and (3) there's not much reason not to do it as
> > >> > described in (2).
> > >> >
> > >> > -d
> > >> >
> > >> >
> > >> > On Fri, Sep 12, 2014 at 10:36 AM, Andrew Palumbo <[email protected]>
> > >> > wrote:
> > >> >
> > >> > > I'm having some trouble getting the rowLabelBindings from a
> > >> > > String-keyed (Checkpointed...Spark)Drm read in from HDFS. I'm reading
> > >> > > in a sequence file of form <Text,VectorWritable> which is output from
> > >> > > seq2sparse. The Drm has 7598 rows and the vectors seem to be read in
> > >> > > properly. When I try to get a Map using getRowLabelBindings(), I get
> > >> > > back a Map of size 1.
> > >> > >
> > >> > > The single key/value pair in that map is consistent with what I would
> > >> > > expect:
> > >> > >
> > >> > > k = /talk.religion.misc/84570, v = 7597
> > >> > >
> > >> > > (the last row..) I don't know why I'm not getting entries for the
> > >> > > rest of the rows. I've been looking through drmWrap() and
> > >> > > drmFromHdfs() and don't see where/if rowLabelBindings is being set
> > >> > > except in collect() (I'm very likely missing something because of my
> > >> > > bad Scala understanding).
> > >> > >
> > >> > > Is there a different way to get a map of key/rowIndex?
> > >> > >
> > >> > > Ultimately I'm working from the math-scala package, so I can't do
> > >> > > anything specific with RDDs.
> > >> > >
> > >> > > Below is where I'm hitting trouble, shown in the Spark-Shell.
> > >> > >
> > >> > > Any input is appreciated. Thanks,
> > >> > >
> > >> > > Andy
> > >> > >
> > >> > >
> > >> > > mahout> val drmTFIDF= drmFromHDFS( path = "/tmp/mahout-work-andy/20news-test-vectors/part-r-00000")
> > >> > > drmTFIDF: org.apache.mahout.math.drm.CheckpointedDrm[_] = org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark@6e200e2d
> > >> > >
> > >> > > mahout> drmTFIDF.nrow
> > >> > > res0: Long = 7598
> > >> > >
> > >> > > mahout> val drmRowLabelBindings:java.util.HashMap[String,Integer] = new java.util.HashMap(drmTFIDF.getRowLabelBindings)
> > >> > > drmRowLabelBindings: java.util.HashMap[String,Integer] = {/talk.religion.misc/84570=7597}
> > >> > >
> > >> > > mahout> val incoreTFIDF=drmTFIDF.collect
> > >> > > incoreTFIDF: org.apache.mahout.math.Matrix =
> > >> > > {
> > >> > > 2770 =>
> > >> > > {40894:3.706777572631836,25040:5.326602935791016,63527:7.625180244445801,30072:9.138689994812012,75991:2.7672300338745117,91042:1.964722752571106,85483:5.487469673156738,45764:4.326903343200684,83215:2.904540777206421,25284:4.734808444976807,90958:3.0112483501434326,29565:7.410068988800049,60779:6.3667192459106445,91156:1.6616008281707764,92814:2.255286693572998,23763:4.0394415855407715,7067:12.395035743713379,61058:6.993908405303955,55483:9.745443344116211,43286:3.622220039367676,65462:4.295836925506592,43535:1.5242335796356201,34898:6.624548435211182,66572:8.541470527648926,64323:2.1623659133911133,58008:3.128486394882202,33351:3.3363659381866455,36587:4.08017110824585,74747:2.935668706893921,38val
> > >> > >
> > >> > > mahout> val incoreRowLabelBindings:java.util.HashMap[String,Integer] = new java.util.HashMap(incoreTFIDF.getRowLabelBindings)
> > >> > > incoreRowLabelBindings: java.util.HashMap[String,Integer] = {/talk.religion.misc/84570=7597}
> > >> > >
> > >> > > mahout> incoreTFIDF.nrow
> > >> > > res1: Int = 7598
> > >> > >
> > >> > > mahout> incoreRowLabelBindings.size
> > >> > > res3: Int = 1
> > >> > >
> > >> > > mahout> drmTFIDF.nrow
> > >> > > res5: Long = 7598
> > >> > >
> > >> > > mahout> drmRowLabelBindings.size
> > >> > > res4: Int = 1
