Edward,
Thank you for your suggestion. It's certainly an option for our
project graduation.
I'm discussing it with the other Hivemall PPMC members.
My concerns are:
1) Hivemall targets not only Hive but also Spark and Pig as runtimes.
- It has some Spark DataFrame-related features:
http://hivemall.incubator.apache.org/userguide/spark/misc/topk_join.html
2) Project management (e.g., the release process) for a subproject.
- Our artifacts would be better kept separate from Hive's (separation of concerns).
- It seems that the Apache DB subprojects are distinct projects:
https://db.apache.org/newproject.html
We are now at a very early stage of Apache incubation and are planning
our first release in early Q2.
It might be too early to discuss this, but we welcome your suggestion.
Thanks,
Makoto
2017-03-03 15:15 GMT+09:00 Edward Capriolo:
> Hivemall in the incubator has a fairly impressive set of features that do
> machine learning directly from Hive.
>
> http://hivemall.incubator.apache.org/overview.html
> https://github.com/myui/hivemall/wiki/Logistic-regression-dataset-generation
>
> While we cannot put the cart before the horse, I can imagine that upon
> graduation Hivemall would be a natural fit to become part of Hive (maybe as
> a subproject).
>
> I could imagine we set it up like we did for HCat, where we make a subtree
> and give commit rights to the tree, eventually converting those interested
> in other parts of Hive to Hive committers as well.
>
> In any case, Hivemall devs, amazing work!
>
> Thanks,
> Edward