Hivemall, currently in the Apache Incubator, has a fairly impressive set of features for doing machine learning directly from Hive.
http://hivemall.incubator.apache.org/overview.html
https://github.com/myui/hivemall/wiki/Logistic-regression-dataset-generation

While we cannot put the cart before the horse, I can imagine that upon graduation Hivemall would be a natural fit to become part of Hive (maybe as a subproject). I could imagine we set this up like we did for HCatalog: make a subtree and give commit rights to that tree, eventually converting those interested in other parts of Hive into Hive committers as well.

In any case, Hivemall devs, amazing work!

Thanks,
Edward