(Also a separate topic here)

On Wed, Oct 26, 2011 at 5:19 PM, Dan Brickley <[email protected]> wrote:
>
> Also I've been thinking in very fuzzy terms about how to compose
> larger tasks from smaller pieces, and wondering what might be a more
> principled way of doing this than running each bin/mahout job by hand.
> Obviously coding it up is one way, but also little shell scripts or
> makefiles or (if forced at gunpoint) maybe Ant ...?

Well, there certainly seem to be a number of options out there. Don't forget the FlumeJava items like Ted's work on Plume or Cloudera Crunch. Is Oozie an option for this as well?

When I was looking at the clustering code recently and saw the various methods starting with the run* prefix, I really wondered if there was a standard way we could package these chunks of code (steps) that would allow them to be easily decomposed and recombined in different ways. There's some talk about beanifying our workflow steps in https://issues.apache.org/jira/browse/MAHOUT-612, but I can't say I understand how this would allow us to reach the composable workflow goal.
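
To make the "steps that can be decomposed and recombined" idea a bit more concrete, here is a minimal sketch in Java. It is not Mahout code and not what MAHOUT-612 proposes: the Step interface, the Pipeline class, and the named() helper are hypothetical names invented for illustration, and the placeholder steps only rewrite a path string where a real step would launch a job. The one design point it shows is that a pipeline is itself a step, so small chains can be reused inside larger ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WorkflowSketch {

    /** Hypothetical unit of work: takes an input location, returns its output location. */
    interface Step {
        String run(String inputPath);
    }

    /** A pipeline is itself a Step, so pipelines can be nested and recombined. */
    static final class Pipeline implements Step {
        private final List<Step> steps;

        Pipeline(Step... steps) {
            this.steps = new ArrayList<>(Arrays.asList(steps));
        }

        @Override
        public String run(String inputPath) {
            String current = inputPath;
            // Feed each step's output location to the next step.
            for (Step step : steps) {
                current = step.run(current);
            }
            return current;
        }
    }

    /** Placeholder step (assumption): appends a suffix instead of running a real job. */
    static Step named(String suffix) {
        return input -> input + "/" + suffix;
    }

    public static void main(String[] args) {
        Step vectorize = named("vectors");
        Step cluster = named("clusters");
        Step dump = named("dump");

        // Reuse a smaller chain as one piece of a larger chain.
        Step clusterOnly = new Pipeline(vectorize, cluster);
        Step full = new Pipeline(clusterOnly, dump);

        System.out.println(full.run("input"));
    }
}
```

Under this shape, each of the existing run* methods would be a candidate for wrapping as one Step, though whether their inputs and outputs line up that neatly is an open question.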
