Hi all,

A technical question regarding some utility methods I'm trying to implement: a lot of my M/R tasks require ancillary vectors, which I save to the cache and then retrieve in the Mapper and/or Reducer to perform the computations. Since I do this so often, I wrote utility load() and save() methods to handle it. Isabel suggested I make them more generally available, e.g. in some common or utils package in Mahout.
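
To make the question concrete, here is a minimal sketch of the shape such a load()/save() pair might take. It is an illustration only, not the actual code: the class name VectorCacheSketch is hypothetical, a plain double[] stands in for a Mahout vector, and a local file stands in for the Hadoop path that the real helpers would write through FileSystem and register with DistributedCache.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical stand-in for the helpers discussed above. It uses only the
// JDK: a local file replaces the Hadoop FileSystem / DistributedCache path,
// and double[] replaces a Mahout vector.
final class VectorCacheSketch {
  private VectorCacheSketch() {}

  // Writes the vector length followed by each element.
  static void save(double[] vector, Path path) throws IOException {
    try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(path))) {
      out.writeInt(vector.length);
      for (double value : vector) {
        out.writeDouble(value);
      }
    }
  }

  // Reads back a vector written by save().
  static double[] load(Path path) throws IOException {
    try (DataInputStream in = new DataInputStream(Files.newInputStream(path))) {
      int length = in.readInt();
      double[] vector = new double[length];
      for (int i = 0; i < length; i++) {
        vector[i] = in.readDouble();
      }
      return vector;
    }
  }

  public static void main(String[] args) throws IOException {
    Path path = Files.createTempFile("ancillary-vector", ".bin");
    try {
      double[] original = {1.0, -2.5, 3.25};
      save(original, path);
      // Round trip: the loaded vector should equal the saved one.
      System.out.println(Arrays.equals(original, load(path)));
    } finally {
      Files.deleteIfExists(path);
    }
  }
}
```

In the real helpers, save() would be called from the job driver and load() from the Mapper's or Reducer's setup code; it is those Hadoop-specific calls that raise the packaging question.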

1) Where would be the best place to put this load() and save() to-cache functionality? I've tried it in the mahout-core o.a.m.common package and in the mahout-utils o.a.m.vectors package. I know someone had mentioned that the core should be kept as Hadoop-free as possible, so since this code explicitly calls Hadoop classes (DistributedCache, FileSystem, HadoopUtil, etc.), I figured the core may not be the best place...

2) ...but as the Eclipse projects are currently set up, mahout-utils depends on mahout-core and mahout-examples, and mahout-core depends only on mahout-math. As a result, with my utility functions in mahout-utils and the code that needs them in mahout-core, I can't write a valid import unless I modify the properties of mahout-core so that it "sees" mahout-utils. While that does fix the import problem, it introduces a cyclic dependency (mahout-core depending on mahout-utils, which already depends on mahout-core), which is obviously not ideal either.

Thank you in advance for your help!

Regards,
Shannon
