Sorry, I just mixed up those two modules in my head: math is the Hadoop-free one, not core. So yes, you'd probably end up with these bits in core.

On Thu, Jul 29, 2010 at 10:46 PM, Shannon Quinn <[email protected]> wrote:
> I more or less reused the basic operations of
> DistributedRowMatrix.timesSquared() for writing Vectors to and reading them
> from the cache; these method calls could basically replace the code that's
> currently there.
>
> I'm more than willing to keep this code local to my own packages, or to move
> it to the mahout.math.hadoop package or the mahout-math project. I'll keep
> it local until decided otherwise.
>
> Though here's another random question I just came across: in the
> timesSquared() Reducer, the output is repeated pairs of (NullWritable,
> VectorWritable). Does this create a single (merged) VectorWritable under
> the same key, or a list of VectorWritables?
>
> Thanks again!
>
> Shannon
>
> On Thu, Jul 29, 2010 at 3:40 PM, Sean Owen <[email protected]> wrote:
>
>> core has to be Hadoop-free as it does not have Hadoop as a dependency,
>> and that is important.
>>
>> It sounds like it belongs in utils. But then I wonder why you have
>> code in core that also depends on Hadoop (indirectly)?
>>
>> math seems to be the home of Hadoop-based math stuff. I think that's
>> the home of all your code.
>>
>> I might suggest not putting things into utils until it's clear
>> something else can use them, and the code has been written to be
>> generalizable. I fear utils and other "common" areas turn into a grab
>> bag of code that something uses, or that something may use someday,
>> but that isn't reused yet. That creates problems.
>>
>> Sean
>>
>> On Thu, Jul 29, 2010 at 10:12 PM, Shannon Quinn <[email protected]> wrote:
>> > Hi all,
>> >
>> > A technical question regarding some utility methods I'm trying to
>> > implement: a lot of my M/R tasks require ancillary vectors that I've been
>> > saving to the cache and that are retrieved in the Mapper and/or Reducer
>> > for performing the computations. Since this is done so often, I wrote
>> > utility load() and save() methods for accomplishing this. Isabel
>> > suggested I make them more generally available, e.g. in some common or
>> > utils package in Mahout.
>> >
>> > 1) Where would be the best place to put this load()/save()-to-cache
>> > functionality? I've tried it in the mahout-core o.a.m.common package and
>> > the mahout-utils o.a.m.vectors package. I know someone had mentioned that
>> > core should be kept as Hadoop-free as possible, so since this explicitly
>> > calls Hadoop classes (DistributedCache, FileSystem, HadoopUtil, etc.) I
>> > figured core may not be the best place...
>> >
>> > 2) ...but as the Eclipse projects are currently set up, mahout-utils
>> > depends on mahout-core and mahout-examples, and mahout-core depends only
>> > on mahout-math. As such, with my utility functions in mahout-utils and
>> > the code that needs them in mahout-core, I can't specify a correct import
>> > unless I modify the properties of mahout-core to "see" mahout-utils.
>> > While this does fix the import problem, it introduces a cyclical
>> > dependency, which is obviously not ideal either.
>> >
>> > Thank you in advance for your help!
>> >
>> > Regards,
>> > Shannon
