Hi all,

A technical question about some utility methods I'm trying to implement:
many of my M/R tasks need ancillary vectors, which I save to the cache and
then retrieve in the Mapper and/or Reducer to perform the computations.
Since I do this so often, I wrote utility load() and save() methods to
handle it. Isabel suggested I make them more generally available, e.g. in a
common or utils package in Mahout.

1) Where would be the best place to put this load()/save()-to-cache
functionality? I've tried both the o.a.m.common package in mahout-core and
the o.a.m.vectors package in mahout-utils. I know someone mentioned that
core should be kept as Hadoop-free as possible, and since this code
explicitly calls Hadoop functions (DistributedCache, FileSystem, HadoopUtil,
etc.), I figured core may not be the best place...

2) ...but as the Eclipse projects are currently set up, mahout-utils depends
on mahout-core and mahout-examples, while mahout-core depends only on
mahout-math. So with my utility functions in mahout-utils and the code that
needs them in mahout-core, I can't write a valid import unless I change
mahout-core's project properties so that it can "see" mahout-utils. That
does fix the import problem, but it introduces a cyclic dependency
(mahout-core -> mahout-utils -> mahout-core), which obviously isn't ideal
either.
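
To make the cycle concrete, here is a small sketch that treats the modules
as a directed graph and runs a standard depth-first cycle check. The edges
are only the ones described above (mahout-examples' own dependencies are
omitted), and the class name ModuleCycleCheck is hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical DFS cycle check over a "module -> dependencies" graph. */
public final class ModuleCycleCheck {

  /** Returns true if the dependency graph contains a cycle. */
  public static boolean hasCycle(Map<String, List<String>> deps) {
    Set<String> done = new HashSet<>();
    Set<String> onStack = new HashSet<>();
    for (String module : deps.keySet()) {
      if (visit(module, deps, done, onStack)) {
        return true;
      }
    }
    return false;
  }

  private static boolean visit(String m, Map<String, List<String>> deps,
                               Set<String> done, Set<String> onStack) {
    if (onStack.contains(m)) return true;  // back edge: cycle found
    if (done.contains(m)) return false;    // already fully explored
    onStack.add(m);
    for (String d : deps.getOrDefault(m, List.of())) {
      if (visit(d, deps, done, onStack)) return true;
    }
    onStack.remove(m);
    done.add(m);
    return false;
  }

  public static void main(String[] args) {
    Map<String, List<String>> deps = new HashMap<>();
    deps.put("mahout-utils", List.of("mahout-core", "mahout-examples"));
    deps.put("mahout-core", List.of("mahout-math"));
    System.out.println("current layout has cycle: " + hasCycle(deps));  // false
    // Letting mahout-core "see" mahout-utils adds the edge core -> utils.
    deps.put("mahout-core", List.of("mahout-math", "mahout-utils"));
    System.out.println("after change has cycle: " + hasCycle(deps));    // true
  }
}
```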

Thank you in advance for your help!

Regards,
Shannon
