On 27 October 2011 12:54, Frank Scholten <[email protected]> wrote:
> On Thu, Oct 27, 2011 at 9:27 AM, Ted Dunning <[email protected]> wrote:
>
>>> There's some talk about beanifying our workflow steps in
>>> https://issues.apache.org/jira/browse/MAHOUT-612, but I can't say I
>>> understand how this would allow us to reach the composable workflow
>>> goal.
>>
>> I don't think it does. It just passes data around in files like we do now.
>
> Yes, MAHOUT-612 has beans for configuring kmeans, canopy, and
> seq2sparse, and the configuration is serialized and deserialized at the
> mappers and reducers. But it does not have a workflow engine or
> anything like that: you have to connect the inputs of one job to the
> outputs of the other yourself.
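[Editor's note: for concreteness, a minimal sketch of what "connecting
the inputs of one job to the outputs of the other" looks like with the
plain Hadoop MapReduce Job API, assuming the era's Hadoop 0.20-style
API. The paths and job names are hypothetical, and no mapper/reducer
classes are set, so each job would run with the identity mapper and
reducer; this is not MAHOUT-612's actual code.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ManualChaining {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path vectors = new Path("/tmp/vectors");    // hypothetical intermediate dir
    Path clusters = new Path("/tmp/clusters");  // hypothetical final output dir

    // Step 1: produce vectors (stand-in for seq2sparse). Its output is
    // written to HDFS like any other job output.
    Job vectorize = new Job(conf, "vectorize");
    FileInputFormat.setInputPaths(vectorize, new Path("/tmp/raw-text"));
    FileOutputFormat.setOutputPath(vectorize, vectors);
    if (!vectorize.waitForCompletion(true)) {
      System.exit(1);
    }

    // Step 2: cluster (stand-in for kmeans). The "workflow" is simply
    // that this job's input path is the previous job's output path.
    Job cluster = new Job(conf, "kmeans");
    FileInputFormat.setInputPaths(cluster, vectors);
    FileOutputFormat.setOutputPath(cluster, clusters);
    System.exit(cluster.waitForCompletion(true) ? 0 : 1);
  }
}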
Any sense of whether having all the inputs/outputs written to HDFS is a
big problem, versus trying to plug things together in code so that it
doesn't all get serialized? I mean, how much is it worth trying to do
the latter instead of using the filesystem for integration?

Dan
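[Editor's note: by way of contrast, a minimal sketch of the in-code
alternative Dan is asking about, where one step's output object is
handed directly to the next step with no HDFS round trip. The Step
interface and compose helper below are entirely hypothetical; nothing
like them exists in Mahout.]

// Hypothetical composable-step interface; not Mahout API.
public interface Step<I, O> {
  O run(I input) throws Exception;
}

public final class Pipelines {

  // Compose two steps in memory: the second step's input is the first
  // step's output object, never serialized to the filesystem.
  public static <A, B, C> Step<A, C> compose(final Step<A, B> first,
                                             final Step<B, C> second) {
    return new Step<A, C>() {
      public C run(A input) throws Exception {
        return second.run(first.run(input));
      }
    };
  }

  private Pipelines() {}
}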
