On 27 October 2011 12:54, Frank Scholten <[email protected]> wrote:
> On Thu, Oct 27, 2011 at 9:27 AM, Ted Dunning <[email protected]> wrote:
>
>>> There's some talk about beanifying our workflow steps in
>>> https://issues.apache.org/jira/browse/MAHOUT-612, but I can't say I
>>> understand how this would allow us to reach the composable workflow
>>> goal.
>>
>> I don't think it does.  It just passes data around in files like we do now.
>>
>
> Yes, MAHOUT-612 has beans for configuring kmeans, canopy and
> seq2sparse, and the configuration is serialized and deserialized in
> the mappers and reducers. But it does not include a workflow engine
> or anything like that: you still have to connect the inputs of one
> job to the outputs of the other yourself.
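
To make that manual wiring concrete, here is a minimal sketch of the
pattern using the plain Hadoop Job API (not Mahout's actual driver
signatures; the step and path names are made up). Each step writes its
output to an HDFS directory, and that same Path is handed to the next
step as its input:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Chains two MapReduce steps by hand: step 1's HDFS output directory
// is wired directly into step 2's input. Both steps use the default
// identity map/reduce here; Mahout's drivers compose the same way,
// just with real logic and more configuration per step.
public class ManualChain {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path("raw-input");
    Path intermediate = new Path("step1-output"); // serialized to HDFS
    Path output = new Path("step2-output");

    Job step1 = new Job(conf, "step1");
    step1.setJarByClass(ManualChain.class);
    FileInputFormat.addInputPath(step1, input);
    FileOutputFormat.setOutputPath(step1, intermediate);
    if (!step1.waitForCompletion(true)) {
      System.exit(1);
    }

    // The "workflow" is nothing more than reusing the path above.
    Job step2 = new Job(conf, "step2");
    step2.setJarByClass(ManualChain.class);
    FileInputFormat.addInputPath(step2, intermediate); // manual wiring
    FileOutputFormat.setOutputPath(step2, output);
    System.exit(step2.waitForCompletion(true) ? 0 : 1);
  }
}

Every edge in the workflow is a directory on HDFS, so each hand-off
costs a full serialization round trip through the filesystem.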

Any sense of whether having all the inputs/outputs written to HDFS is
a big problem, versus trying to plug things together in code so that
it doesn't all get serialized? I mean, how much is it worth trying to
do the latter instead of using the filesystem for integration?

Dan
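
For contrast, a purely hypothetical sketch of the in-code alternative
raised above; no such composition API exists in Mahout. The idea is
just that steps would hand objects directly to each other instead of
meeting at HDFS paths:

// Hypothetical composition interface (not an existing Mahout API):
// each step consumes and produces in-memory objects, so chaining
// steps never writes intermediate data to the filesystem.
interface Step<I, O> {
  O run(I input);
}

final class Steps {
  // Compose two steps by handing the first one's output object
  // straight to the second, with no serialization in between.
  static <A, B, C> Step<A, C> compose(final Step<A, B> first,
                                      final Step<B, C> second) {
    return new Step<A, C>() {
      @Override
      public C run(A input) {
        return second.run(first.run(input));
      }
    };
  }
}

The usual trade-off applies: in-memory hand-offs avoid the
serialization round trip, but the intermediate files on HDFS give you
checkpointing and the ability to restart a long workflow partway
through for free.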
