On 9/30/10 9:11 AM, Oleksandr Petrov (JIRA) wrote:
Although, the ideology of Mahout is not clear and is somewhat inconsistent: LDA is
implemented that way. K-means does include names of source vector/file. DirichletCluster
is implemented another way: it's generic and is not derived (at least in 0.3) from
MapReduceBase. That kind of inconsistency is a potential source of big problems. Every
driver should share the exact same top-level ideology, even if "under the hood"
there are a lot of different things.
We agreed and this has all been changed in trunk. Now all clustering
drivers share common arguments where possible and inherit from
AbstractJob for uniformity in CLI invocation.
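
To illustrate the idea of drivers sharing a common base for argument handling, here is a rough standalone sketch. It does not use Mahout's real AbstractJob or its API; the class names (SketchJob, SketchKMeansDriver, SketchDirichletDriver) and the option names (--input, --output, --numClusters, --alpha) are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical base class: NOT Mahout's AbstractJob, only the same idea.
// Every driver gets identical handling of the shared options.
abstract class SketchJob {
    // Parse "--name value" pairs and enforce the options all drivers share.
    protected Map<String, String> parseArguments(String[] args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("Options must come in name/value pairs");
        }
        Map<String, String> parsed = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException("Expected an option name, got: " + args[i]);
            }
            parsed.put(args[i].substring(2), args[i + 1]);
        }
        for (String required : new String[] {"input", "output"}) {
            if (!parsed.containsKey(required)) {
                throw new IllegalArgumentException("Missing required option --" + required);
            }
        }
        return parsed;
    }

    // Each driver only describes what is specific to its algorithm.
    public abstract String run(String[] args);
}

// Hypothetical driver with one algorithm-specific option (--numClusters).
class SketchKMeansDriver extends SketchJob {
    @Override
    public String run(String[] args) {
        Map<String, String> opts = parseArguments(args);
        return "kmeans " + opts.get("input") + " -> " + opts.get("output")
                + " k=" + opts.getOrDefault("numClusters", "10");
    }
}

// Hypothetical driver with a different algorithm-specific option (--alpha).
class SketchDirichletDriver extends SketchJob {
    @Override
    public String run(String[] args) {
        Map<String, String> opts = parseArguments(args);
        return "dirichlet " + opts.get("input") + " -> " + opts.get("output")
                + " alpha=" + opts.getOrDefault("alpha", "1.0");
    }
}
```

Both drivers accept the same --input and --output options and fail the same way when one is missing, which is the kind of uniformity in CLI invocation described above.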