Not sure if this helps, but we (Sebastian and I) created an IndexedDataset,
which maintains row and column HashBiMaps that map the Int keys to and from
Strings. There are Reader and Writer traits for file IO (text files for now).
The flow is to read an IndexedDataset using the Reader trait. Inside the
IndexedDataset you have a CheckpointedDrm and two label BiMaps, one for rows
and one for columns. This approach is used in the row and item similarity
jobs, where you do math like B.t %*% A. After you do the math using the drm
contained in the IndexedDataset, you assign the correct dictionaries to the
resulting IndexedDataset to maintain your labels for writing or further math.
It might make sense to implement some of the math ops that would work with
this simple approach, but in any case you can do it explicitly, as those jobs
do. The idea was to support other file formats, like sequence files, as the
need comes up.
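
To make the dictionary bookkeeping concrete, here is a rough sketch in plain
Scala. It is not the actual IndexedDataset code: BiDict, Labeled and
transposeTimes are hypothetical names made up for illustration, and a dense
array stands in for the drm. The only point is that for B.t %*% A the result's
rows take B's column labels and its columns take A's column labels.

```scala
object LabeledMatrixSketch {
  // Hypothetical two-way dictionary: Int index <-> String label.
  final case class BiDict(toLabel: Map[Int, String]) {
    val toIndex: Map[String, Int] = toLabel.map(_.swap)
  }

  // Hypothetical stand-in for an IndexedDataset: a dense matrix plus
  // row and column dictionaries. Dictionary sizes are assumed to match
  // the matrix dimensions.
  final case class Labeled(m: Array[Array[Double]], rows: BiDict, cols: BiDict)

  // Computes B.t %*% A on plain arrays, then attaches the right labels.
  def transposeTimes(b: Labeled, a: Labeled): Labeled = {
    require(b.m.length == a.m.length, "A and B must share the row dimension")
    val nRows = b.cols.toLabel.size
    val nCols = a.cols.toLabel.size
    val out = Array.ofDim[Double](nRows, nCols)
    for (k <- b.m.indices; i <- 0 until nRows; j <- 0 until nCols)
      out(i)(j) += b.m(k)(i) * a.m(k)(j)
    // The result's rows are B's columns; its columns are A's columns.
    Labeled(out, b.cols, a.cols)
  }
}
```

The original row dictionaries (shared by A and B) drop out of the result,
which is why the labels have to be reassigned explicitly after the math.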

On Sep 12, 2014, at 1:14 PM, Andrew Palumbo <[email protected]> wrote:

It doesn't look like it has anything to do with the conversion.  

after:

   val rowBindings = d.map(t => (t._1._1.toString, t._2: java.lang.Integer)).toMap

rowBindings.size  is one
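
For reference, a small plain-Scala sketch of how a map built with toMap can
end up with a single entry: toMap keeps one entry per distinct key, so if
every key stringifies to the same value, the pairs collapse into one. The
data below is made up, and this does not say why the keys would repeat in the
real job; it only shows the symptom and a quick check for it.

```scala
object ToMapCollapse {
  // Made-up (label, rowIndex) pairs standing in for the collected data.
  val distinctLabels: Seq[(String, Int)] = Seq(("a", 0), ("b", 1), ("c", 2))
  val repeatedLabels: Seq[(String, Int)] = Seq(("a", 0), ("a", 1), ("a", 2))

  // toMap keeps one entry per distinct key; a later pair overwrites an
  // earlier pair with the same key.
  val distinctBindings: Map[String, Int] = distinctLabels.toMap
  val repeatedBindings: Map[String, Int] = repeatedLabels.toMap

  // Counting distinct keys before calling toMap shows whether the pairs
  // are about to collapse.
  def distinctKeyCount(pairs: Seq[(String, Int)]): Int =
    pairs.map(_._1).distinct.size
}
```

Comparing d.size with the number of distinct keys in d would tell whether the
keys themselves are the problem.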

From: [email protected]
To: [email protected]
Subject: RE: drmFromHDFS rowLabelBindings question
Date: Fri, 12 Sep 2014 15:53:48 -0400




Thanks guys, I was wondering about the java.util.Map conversion too. I'll try
copying everything into a java.util.HashMap and passing that to
setRowLabelBindings. I'll play around with it, and if I can't get it to work,
I'll file a JIRA.

I'm just using it in the NB implementation, so it's not a pressing issue.

Appreciate it.
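
A minimal sketch of the explicit copy, assuming the bindings are a Scala
Map[String, Int]. RowBindingsCopy and toJavaBindings are hypothetical names
used only for illustration; the result would then be handed to
setRowLabelBindings, which is not called here. Copying by hand avoids relying
on whichever implicit conversion happens to be in scope.

```scala
import java.util.{HashMap => JHashMap, Map => JMap}

object RowBindingsCopy {
  // Copy a Scala map of label -> row index into a java.util.HashMap,
  // boxing each Int into a java.lang.Integer explicitly.
  def toJavaBindings(bindings: Map[String, Int]): JMap[String, Integer] = {
    val out = new JHashMap[String, Integer]()
    bindings.foreach { case (label, row) => out.put(label, Integer.valueOf(row)) }
    out
  }
}
```

The copy preserves the entry count, so if the resulting map still has size
one, the collapse happened before the conversion.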

> Date: Fri, 12 Sep 2014 12:35:21 -0700
> Subject: Re: drmFromHDFS rowLabelBindings question
> From: [email protected]
> To: [email protected]
> 
> On Fri, Sep 12, 2014 at 12:17 PM, Anand Avati <[email protected]> wrote:
> 
>> 
>> 
>> On Fri, Sep 12, 2014 at 12:00 PM, Anand Avati <[email protected]> wrote:
>> 
>>> 
>>> 
>>> On Fri, Sep 12, 2014 at 11:57 AM, Dmitriy Lyubimov <[email protected]>
>>> wrote:
>>> 
>>>> But if you are really convinced that it is something that might be
>>>> needed, the best way would probably be to add an optional parameter to
>>>> collect (something like drmLike.collect(extractLabels: Boolean = false))
>>>> which you can flip to true if needed, so that it calls toString on the
>>>> keys and assigns them to the in-core matrix's row labels. (Requires a
>>>> patch, of course.)
>>>> 
>>>> 
>>> As I mentioned in the other mail, this is already the case. The code
>>> seems to assume .toMap internally does collect. My (somewhat wild)
>>> suspicion is that this line is somehow fooling the eye:
>>> 
>>> val rowBindings = d.map(t => (t._1._1.toString, t._2: java.lang.Integer)).toMap
>>> 
>>> 
>>> 
>> Argh, for a moment I was thinking `d` is still an rdd. It is actually all
>> in-core, as the entirety of the rdd is collected up front into `data`. In
>> any case I suspect the non-int key collecting code might be doing something
>> funny.
>> 
> 
> One problem I see is that toMap returns a scala.collection.immutable.Map,
> whereas the next line, m.setRowLabelBindings, accepts a java.util.Map. Since
> the code compiles fine, there is probably an implicit conversion happening
> somewhere, and I don't know if the conversion is doing the right thing.
> Other than this, the rest of the code seems to look fine.
                                                                                
  
