On Wed, Jun 18, 2014 at 5:35 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Also, note that the row keys in Mahout are not actually stored in the
> matrices that we manipulate.

They are. I am not sure about the DistributedRowMatrix class for mapreduce,
but in sparkbindings they are. They are intimately relevant to all algebra,
and especially to transposition rewrites. Even in-core matrices support
column/row labels, although nobody seems to be using them.

> If the keys can be handled separately,
> outside of the flow for the data in a drm, then you should be pretty much
> good to go.
>
> On Wed, Jun 18, 2014 at 5:34 PM, Ted Dunning <ted.dunn...@gmail.com>
> wrote:
>
> > On Wed, Jun 18, 2014 at 12:03 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> > wrote:
> >
> >> > How important are the String row keys for the algorithms itself?
> >> > Would it grossly mess up a workflow if Strings are silently
> >> > discarded by the backend?
> >>
> >> like i said, seq2sparse produces them, and postprocessing for stuff like
> >> LSA pipelines would not work.
> >
> > Something as coarse as translating to a dictionary index would probably
> > work. Creating the dictionary in parallel while reading the data should
> > be quite doable.
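
For illustration, the dictionary-index idea quoted above could be sketched roughly as follows. This is a minimal single-process sketch in Python, not Mahout or sparkbindings code: the names build_dictionary, encode_rows and decode_rows are hypothetical, rows are assumed to be plain (key, vector) pairs, and the parallel dictionary construction mentioned in the thread is not attempted.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, List[float]]          # (string row key, vector), assumed layout
IndexedRow = Tuple[int, List[float]]   # (dictionary index, vector)


def build_dictionary(keys: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct string key a dense integer index, in first-seen order."""
    dictionary: Dict[str, int] = {}
    for key in keys:
        if key not in dictionary:
            dictionary[key] = len(dictionary)
    return dictionary


def encode_rows(rows: Iterable[Row], dictionary: Dict[str, int]) -> List[IndexedRow]:
    """Replace string keys with their integer indices before handing rows to a backend."""
    return [(dictionary[key], vector) for key, vector in rows]


def decode_rows(rows: Iterable[IndexedRow], dictionary: Dict[str, int]) -> List[Row]:
    """Restore the string keys afterwards, e.g. for LSA-style postprocessing."""
    inverse = {index: key for key, index in dictionary.items()}
    return [(inverse[index], vector) for index, vector in rows]


# Hypothetical document ids of the kind seq2sparse might emit as row keys.
rows = [("doc-b", [1.0, 0.0]), ("doc-a", [0.0, 2.0]), ("doc-c", [3.0, 1.0])]
dictionary = build_dictionary(key for key, _ in rows)
encoded = encode_rows(rows, dictionary)
restored = decode_rows(encoded, dictionary)
```

The dictionary lives outside the matrix data, so a backend that only understands integer keys never sees the strings, and they are not lost as long as the dictionary is kept alongside the result.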