(Also a separate topic here)

On Wed, Oct 26, 2011 at 5:19 PM, Dan Brickley <[email protected]> wrote:
>
> Also I've been thinking in very fuzzy terms about how to compose
> larger tasks from smaller pieces, and wondering what might be a more
> principled way of doing this than running each bin/mahout job by hand.
> Obviously coding it up is one way, but also little shell scripts or
> makefiles or (if forced at gunpoint) maybe Ant ...?

Well, there certainly seem to be a number of options out there. Don't
forget the FlumeJava items like Ted's work on Plume or Cloudera
Crunch. Is Oozie an option for this as well? When I was looking at
the clustering code recently and saw the various methods starting
with the run* prefix, I really wondered if there was a standard way
that we could package these chunks of code (steps) that would allow
them to be easily decomposed and re-combined in different ways.
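
As a rough sketch of what I mean by packaging steps: the Step and
Pipeline types below are made up for illustration and are not
existing Mahout API, and the step names are only labels. Each step
takes an input and an output path, and a pipeline is itself a step,
so pipelines can be nested and re-combined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WorkflowSketch {

  /** Hypothetical: one unit of work, roughly what a run* method does today. */
  interface Step {
    String name();

    /** Reads from input and writes to output; both are plain path strings here. */
    void run(String input, String output) throws Exception;
  }

  /** Hypothetical: chains steps so that each step's output is the next step's input. */
  static final class Pipeline implements Step {
    private final String name;
    private final List<Step> steps;

    Pipeline(String name, Step... steps) {
      this.name = name;
      this.steps = Arrays.asList(steps);
    }

    @Override
    public String name() {
      return name;
    }

    @Override
    public void run(String input, String output) throws Exception {
      String current = input;
      for (int i = 0; i < steps.size(); i++) {
        Step step = steps.get(i);
        boolean last = (i == steps.size() - 1);
        // Intermediate results go to a sibling path; the last step writes the real output.
        String next = last ? output : output + ".tmp-" + i + "-" + step.name();
        step.run(current, next);
        current = next;
      }
    }
  }

  /** Stand-in step that only records what it was asked to do. */
  static Step logging(String name, List<String> log) {
    return new Step() {
      @Override
      public String name() {
        return name;
      }

      @Override
      public void run(String input, String output) {
        log.add(name + ": " + input + " -> " + output);
      }
    };
  }

  public static void main(String[] args) throws Exception {
    List<String> log = new ArrayList<>();
    Step vectorize = logging("seq2sparse", log);
    // A pipeline is a Step, so it can be dropped into a larger pipeline.
    Step cluster = new Pipeline("cluster", logging("canopy", log), logging("kmeans", log));
    new Pipeline("all", vectorize, cluster).run("in", "out");
    for (String line : log) {
      System.out.println(line);
    }
  }
}
```

A real version would presumably pass Hadoop paths and a configuration
rather than strings, but the point is only that a composite is also a
step.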

There's some talk about beanifying our workflow steps in
https://issues.apache.org/jira/browse/MAHOUT-612, but I can't say I
understand how this would allow us to reach the composable workflow
goal.
