Sorry, I just crossed those two modules mentally: math is the Hadoop-free
one, not core. So yeah, you'd probably end up with these bits in core.

On Thu, Jul 29, 2010 at 10:46 PM, Shannon Quinn <[email protected]> wrote:
> I more or less used the basic operations of the
> DistributedRowMatrix.timesSquared() operation in terms of the
> writing/reading Vectors to the cache; these method calls could basically
> replace the code that's currently there.
>
> I'm more than willing to keep this code local to my own packages, or to the
> mahout.math.hadoop package or mahout-math project. I'll keep it local until
> decided otherwise.
>
> Though here's another random question I just came across: in the
> timesSquared() Reducer, the output is repeated pairs of (NullWritable,
> VectorWritable) - does this create a single (merged) VectorWritable under
> the same key, or a list of VectorWritables?
>
> Thanks again!
>
> Shannon
>
> On Thu, Jul 29, 2010 at 3:40 PM, Sean Owen <[email protected]> wrote:
>
>> core has to be Hadoop-free as it does not have Hadoop as a dependency,
>> and that is important.
>>
>> It sounds like it belongs in utils. But then I wonder why you have
>> code in core that also depends on Hadoop (indirectly)?
>>
>> math seems to be the home of Hadoop-based math stuff. I think that's
>> the home of all your code.
>>
>> I might suggest not putting things into utils until it's clear
>> something else can use them, and the code has been written to be
>> generalizable. I fear utils and other "common" areas turn into a grab
>> bag of code that something uses, and that something may use someday,
>> but isn't reused yet. That creates problems.
>>
>> Sean
>>
>> On Thu, Jul 29, 2010 at 10:12 PM, Shannon Quinn <[email protected]> wrote:
>> > Hi all,
>> >
>> > A technical question regarding some utility methods I'm trying to
>> > implement: a lot of my M/R tasks require ancillary vectors that I've
>> > been saving to the cache and which are retrieved in the Mapper and/or
>> > Reducer for performing
>> > the computations. Since this is done so often, I wrote utility load() and
>> > save() methods for accomplishing this. Isabel suggested I make them more
>> > generally available, e.g. in some common or utils package in Mahout.
>> >
>> > 1) Where would be the best place to put this load() and save() -to-cache
>> > functionality? I've tried it in the mahout-core o.a.m.common package, and
>> > mahout-utils o.a.m.vectors package. I know someone had mentioned that the
>> > core should be kept as Hadoop-free as possible, so since this explicitly
>> > calls Hadoop functions (DistributedCache, FileSystem, HadoopUtil, etc) I
>> > figured the core may not be the best place...
>> >
>> > 2) ...but as the eclipse projects are currently set up, mahout-utils
>> > depends on mahout-core and mahout-examples, and mahout-core depends
>> > only on mahout-math. As such, with my utility functions in
>> > mahout-utils, and the code in need of them in mahout-core, I can't
>> > specify a correct import unless I modify the properties of
>> > mahout-core to "see" mahout-utils. While this
>> > does indeed fix the import problem, it introduces a cyclical dependency
>> > which is obviously not ideal, either.
>> >
>> > Thank you in advance for your help!
>> >
>> > Regards,
>> > Shannon
>> >
>>
>
