I should say a Vector of keys corresponding to the Drm rows.
From: [email protected]
To: [email protected]
Subject: RE: drmFromHDFS rowLabelBindings question
Date: Fri, 12 Sep 2014 14:46:23 -0400




Ok, thanks. All that I need is a Vector of the String keys of the Drm (they
contain the category labels that I need). I think I was just going the long
(wrong) way to get them. Is there an easy way to extract these?
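
To make concrete what I'm after, here is a minimal sketch using plain Scala
collections as a stand-in for the Drm rows. It does not use or imply any
Mahout or Spark API; the object name, the helper names and the sample rows
are hypothetical, and it assumes keys of the form "/<category>/<docId>" like
the one shown in the shell output below.

```scala
object RowKeysSketch {
  // Hypothetical stand-in for one row of a String-keyed Drm: (row key, row values).
  type Row = (String, Array[Double])

  // Made-up sample rows in row order; only the key format mirrors the real data.
  val sample: Seq[Row] = Seq(
    "/alt.atheism/1" -> Array(0.0, 1.5),
    "/sci.space/2" -> Array(2.0, 0.0),
    "/talk.religion.misc/3" -> Array(0.5, 0.5)
  )

  // The row keys as a Vector, in the same order as the rows.
  def rowKeys(rows: Seq[Row]): Vector[String] =
    rows.map(_._1).toVector

  // Map of key -> row index, i.e. the shape expected from the row label bindings.
  def keyToRowIndex(rows: Seq[Row]): Map[String, Int] =
    rowKeys(rows).zipWithIndex.toMap

  // Category label of a key, assuming the form "/<category>/<docId>".
  // Splitting on "/" yields an empty first element, so the category is at index 1.
  def category(key: String): String =
    key.split("/")(1)
}
```

With something like this, RowKeysSketch.rowKeys(rows).map(RowKeysSketch.category)
would give the category label for each row, which is all I need.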


> Date: Fri, 12 Sep 2014 11:30:37 -0700
> Subject: Re: drmFromHDFS rowLabelBindings question
> From: [email protected]
> To: [email protected]
> 
> Actually, as it stands, collect doesn't support labels (either as keys or
> Named Vectors).
> 
> There are 3 considerations:
> (1) I chose to ignore any use of NamedVectors in DRM since DRM already has
> row keys, and having two different sources of labels was creating ambiguity
> of interpretation, so I tailored all the algorithms on Spark to use keys (not
> named vectors).
> 
> (2) the reason why row keys are not translated to row bindings is because
> row keys are not necessarily strings, so translating them would require
> correct support of toString() transformation on any possible key.
> 
> (3) In all drm collect cases I ever encountered, I never had any use for
> keys.
> 
> Aside from (2) and (3) there's not much reason not to do it as described in
> (2).
> 
> -d
> 
> 
> On Fri, Sep 12, 2014 at 10:36 AM, Andrew Palumbo <[email protected]> wrote:
> 
> > I'm having some trouble getting the rowLabelBindings from a String-keyed
> > (Checkpointed...Spark)Drm read in from HDFS.  I'm reading in a sequence
> > file of form <Text,VectorWritable> which is output from  seq2sparse.  The
> > Drm has 7598 rows and the vectors seem to be read in properly.  When I try
> > to get a Map using getRowLabelBindings(), I get back a Map of size 1.
> >
> > The single key/value pair in that map is consistent with what I would
> > expect:
> >
> >       k = /talk.religion.misc/84570, v = 7597
> >
> > (the last row..)  I don't know why I'm not getting entries for the rest of
> > the rows.  I've been looking through drmWrap() and drmFromHdfs() and don't
> > see where/if rowLabelBindings is being set except in collect() (I'm very
> > likely missing something because of my bad scala understanding).
> >
> > Is there a different way to get a map of key/rowIndex?
> >
> > Ultimately I'm working from the math-scala package, so I can't do anything
> > specific with RDDs.
> >
> > Below is where I'm hitting trouble shown in the Spark-Shell.
> >
> > Any Input is appreciated.  Thanks,
> >
> > Andy
> >
> >
> > mahout> val drmTFIDF= drmFromHDFS( path =
> > "/tmp/mahout-work-andy/20news-test-vectors/part-r-00000")
> > drmTFIDF: org.apache.mahout.math.drm.CheckpointedDrm[_] =
> > org.apache.mahout.sparkbindings.drm.CheckpointedDrmSpark@6e200e2d
> >
> > mahout> drmTFIDF.nrow
> > res0: Long = 7598
> >
> > mahout> val drmRowLabelBindings:java.util.HashMap[String,Integer] = new
> > java.util.HashMap(drmTFIDF.getRowLabelBindings)
> > drmRowLabelBindings: java.util.HashMap[String,Integer] =
> > {/talk.religion.misc/84570=7597}
> >
> > mahout> val incoreTFIDF=drmTFIDF.collect
> > incoreTFIDF: org.apache.mahout.math.Matrix =
> > {
> >   2770  =>
> > {40894:3.706777572631836,25040:5.326602935791016,63527:7.625180244445801,30072:9.138689994812012,75991:2.7672300338745117,91042:1.964722752571106,85483:5.487469673156738,45764:4.326903343200684,83215:2.904540777206421,25284:4.734808444976807,90958:3.0112483501434326,29565:7.410068988800049,60779:6.3667192459106445,91156:1.6616008281707764,92814:2.255286693572998,23763:4.0394415855407715,7067:12.395035743713379,61058:6.993908405303955,55483:9.745443344116211,43286:3.622220039367676,65462:4.295836925506592,43535:1.5242335796356201,34898:6.624548435211182,66572:8.541470527648926,64323:2.1623659133911133,58008:3.128486394882202,33351:3.3363659381866455,36587:4.08017110824585,74747:2.935668706893921,38val
> >
> > mahout> val
> > incoreRowLabelBindings:java.util.HashMap[String,Integer]
> > = new java.util.HashMap(incoreTFIDF.getRowLabelBindings)
> > incoreRowLabelBindings: java.util.HashMap[String,Integer] =
> > {/talk.religion.misc/84570=7597}
> >
> > mahout> incoreTFIDF.nrow
> > res1: Int = 7598
> >
> > mahout> incoreRowLabelBindings.size
> > res3: Int = 1
> >
> > mahout> drmTFIDF.nrow
> > res5: Long = 7598
> >
> > mahout> drmRowLabelBindings.size
> > res4: Int = 1
> >
                                                                                
  
