On Fri, Sep 12, 2014 at 11:57 AM, Dmitriy Lyubimov <[email protected]> wrote:
> but if you are really compelled that it is something that might be needed,
> the best way probably would be indeed to create an optional parameter to
> collect (something like drmLike.collect(extractLabels:Boolean=false)) which
> you can flip to true if needed and the thing does toString on keys and
> assigns them to the in-core matrix's row labels. (requires a patch of course)
>

As I mentioned in the other mail, this is already the case. The code seems
to assume .toMap internally does collect. My (somewhat wild) suspicion is
that this line is somehow fooling the eye:

    val rowBindings = d.map(t => (t._1._1.toString, t._2: java.lang.Integer)).toMap

Thanks

> On Fri, Sep 12, 2014 at 11:55 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
> > if you bail out into pure Spark out of algebraic DSL, yes.
> >
> > something like drmA.rdd.map(_._1).collect
> >
> > On Fri, Sep 12, 2014 at 11:46 AM, Andrew Palumbo <[email protected]> wrote:
> >
> >> Ok thanks- All that I need is a Vector of the String keys of the Drm
> >> (they contain the category labels that I need)- I think i was just going
> >> the long (wrong) way to get them. Is there an easy way to extract these?
> >>
> >>
> >> > Date: Fri, 12 Sep 2014 11:30:37 -0700
> >> > Subject: Re: drmFromHDFS rowLabelBindings question
> >> > From: [email protected]
> >> > To: [email protected]
> >> >
> >> > Actually, as it stands, collect doesn't support labels (either as keys or
> >> > Named Vectors).
> >> >
> >> > There are 3 considerations:
> >> > (1) I chose to ignore any use of NamedVectors in DRM since DRM already has
> >> > row keys, and two different sources have been creating ambiguity of
> >> > interpretation, so i tailored all the algorithms on Spark to use keys (not
> >> > named vectors).
> >> >
> >> > (2) the reason why row keys are not translated to row bindings is because
> >> > row keys are not necessarily strings, so translating them would require
> >> > correct support of toString() transformation on any possible key.
> >> >
> >> > (3) In all drm collect cases i ever encountered, i never had any use for
> >> > keys.
> >> >
> >> > Aside from (2) and (3) there's not much reason not to do it as described in
> >> > (2).
> >> >
> >> > -d
> >> >
> >> >
> >> > On Fri, Sep 12, 2014 at 10:36 AM, Andrew Palumbo <[email protected]> wrote:
> >> >
> >> > > I'm having some trouble getting the rowLabelBindings from a String-keyed
> >> > > (Checkpointed...Spark)Drm read in from HDFS. I'm reading in a sequence
> >> > > file of form <Text,VectorWritable> which is output from seq2sparse. The
> >> > > Drm has 7598 rows and the vectors seem to be read in properly. When I try
> >> > > to get a Map using getRowLabelBindings(), I get back a Map of size 1.
> >> > >
> >> > > The single key/value pair in that map is consistent with what I would
> >> > > expect:
> >> > >
> >> > > k = /talk.religion.misc/84570, v = 7597
> >> > >
> >> > > (the last row..) I don't know why I'm not getting entries for the rest of
> >> > > the rows. I've been looking through drmWrap() and drmFromHDFS() and don't
> >> > > see where/if rowLabelBindings is being set except in collect() (I'm very
> >> > > likely missing something because of my bad scala understanding).
> >> > >
> >> > > Is there a different way to get a map of key/rowIndex?
> >> > >
> >> > > Ultimately I'm working from the math-scala package, so I can't do anything
> >> > > specific with RDDs.
> >> > >
> >> > > Below is where I'm hitting trouble shown in the Spark-Shell.
> >> > >
> >> > > Any Input is appreciated.
> >> > > Thanks,
> >> > >
> >> > > Andy
> >> > >
> >> > >
> >> > > mahout> val drmTFIDF= drmFromHDFS( path = "/tmp/mahout-work-andy/20news-test-vectors/part-r-00000")
> >> > > drmTFIDF: org.apache.mahout.math.drm.CheckpointedDrm[_] =
> >> > > org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark@6e200e2d
> >> > >
> >> > > mahout> drmTFIDF.nrow
> >> > > res0: Long = 7598
> >> > >
> >> > > mahout> val drmRowLabelBindings:java.util.HashMap[String,Integer] = new java.util.HashMap(drmTFIDF.getRowLabelBindings)
> >> > > drmRowLabelBindings: java.util.HashMap[String,Integer] =
> >> > > {/talk.religion.misc/84570=7597}
> >> > >
> >> > > mahout> val incoreTFIDF=drmTFIDF.collect
> >> > > incoreTFIDF: org.apache.mahout.math.Matrix =
> >> > > {
> >> > > 2770 =>
> >> > > {40894:3.706777572631836,25040:5.326602935791016,63527:7.625180244445801,30072:9.138689994812012,75991:2.7672300338745117,91042:1.964722752571106,85483:5.487469673156738,45764:4.326903343200684,83215:2.904540777206421,25284:4.734808444976807,90958:3.0112483501434326,29565:7.410068988800049,60779:6.3667192459106445,91156:1.6616008281707764,92814:2.255286693572998,23763:4.0394415855407715,7067:12.395035743713379,61058:6.993908405303955,55483:9.745443344116211,43286:3.622220039367676,65462:4.295836925506592,43535:1.5242335796356201,34898:6.624548435211182,66572:8.541470527648926,64323:2.1623659133911133,58008:3.128486394882202,33351:3.3363659381866455,36587:4.08017110824585,74747:2.935668706893921,38...
> >> > >
> >> > > mahout> val incoreRowLabelBindings:java.util.HashMap[String,Integer] = new java.util.HashMap(incoreTFIDF.getRowLabelBindings)
> >> > > incoreRowLabelBindings: java.util.HashMap[String,Integer] =
> >> > > {/talk.religion.misc/84570=7597}
> >> > >
> >> > > mahout> incoreTFIDF.nrow
> >> > > res1: Int = 7598
> >> > >
> >> > > mahout> incoreRowLabelBindings.size
> >> > > res3: Int = 1
> >> > >
> >> > > mahout> drmTFIDF.nrow
> >> > > res5: Long = 7598
> >> > >
> >> > > mahout> drmRowLabelBindings.size
> >> > > res4: Int = 1
> >> > >
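
A minimal sketch of how the .toMap line quoted at the top of the thread behaves on plain Scala collections, assuming `d` is an already-collected sequence of keys zipped with their row indices. The object name `RowBindingsSketch`, the helper `bindings`, and the keys "a", "b", "c" are hypothetical and not taken from Mahout. It shows only that `toMap` keeps one entry per distinct key string, so a one-entry map pointing at the last row index is what would come out if every key stringified to the same value. Whether that is what happens with the HDFS-loaded DRM is not established by the thread.

```scala
// Plain Scala collections only; no Spark or Mahout involved.
object RowBindingsSketch {

  // Same shape as the suspected line: pair each key's string form with its
  // row index (boxed to java.lang.Integer), then build an immutable Map.
  def bindings[K](keys: Seq[K]): Map[String, Integer] =
    keys.zipWithIndex
      .map { case (k, i) => (k.toString, i: java.lang.Integer) }
      .toMap

  def main(args: Array[String]): Unit = {
    // Distinct keys: one binding per row.
    println(bindings(Seq("a", "b", "c")).size)

    // Identical key strings: later pairs overwrite earlier ones, so only the
    // last row index survives.
    println(bindings(Seq.fill(3)("c")))
  }
}
```

If the keys are distinct, the map has as many entries as rows. If they all render as the same string, the map collapses to a single entry whose value is the index of the last row, which has the same shape as the `{/talk.religion.misc/84570=7597}` result reported above.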
