Re: Crunch, Mahout, and HCatalog

Matthias Friedrich Sun, 24 Mar 2013 10:00:07 -0700

On Friday, 2013-03-22, Josh Wills wrote:
> I'm working on some tools for doing data integration and building machine
> learning models w/Crunch, Mahout, and (soon!) HCatalog, and I wrote about
> what I'm up to here:
> 
> http://blog.cloudera.com/blog/2013/03/cloudera_ml_data_science_tools/
> 
> and the code is here: https://github.com/cloudera/ml


Cool thing, thanks for open sourcing it!

[...]
> Q: Why not do this as part of the Crunch or Mahout projects?
> A: Dependency management. Crunch doesn't depend on Mahout, and Mahout
> doesn't depend on Crunch, and I think that for the sanity of the developers
> of both projects, it should stay that way. Dependency management is already
> enough of a nightmare for Hadoop projects that I didn't want to do anything
> to make it worse. I will contribute anything from the toolkit back to
> Crunch that is deemed useful by the community (e.g., the reservoir sampling
> stuff in CRUNCH-178) and doesn't introduce any new dependencies.

This is really sad - but most probably the best decision for now. Do
you happen to know if there is any work planned on the Hadoop side to
clean up this situation?

Regards,
  Matthias
