[ 
https://issues.apache.org/jira/browse/HIVE-672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-672:
----------------------------

    Attachment: weka.jar
                HIVE-672.2.not.to.be.included.patch

HIVE-672.2.not.to.be.included.patch is the patch.
weka.jar should be put into contrib/lib.

There are test cases in the patch to show how to use the new functions.


> Integrate weka with Hive
> ------------------------
>
>                 Key: HIVE-672
>                 URL: https://issues.apache.org/jira/browse/HIVE-672
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>         Attachments: HIVE-672.1.not.to.be.included.patch, 
> HIVE-672.2.not.to.be.included.patch, weka.jar
>
>
> Weka is one of the most popular data mining packages. It's used by numerous 
> people around the world. Since Weka is written in Java, it should be pretty 
> straightforward to integrate Weka with Hive.
> We just need to create some GenericUDAF functions that map to the Weka 
> classifier training process. The output of a GenericUDAF can simply be the 
> serialized form of the trained classifier.
> We should also add a GenericUDF that loads the classifier to classify new 
> instances.
> The Hive syntax can be as simple as the example below. (Note: In the example, 
> most of the "table." prefixes can be omitted. I kept them to make the query 
> semantics easier to follow.)
> The queries build a model (logistic regression) for predicting the CTR of 
> each link on each page, based on user information, and evaluate the model on 
> some data.
> {code}
> -- Train one logistic regression model per (pageid, linkid).
> SELECT logdata.pageid, logdata.linkid,
>   LogisticRegression(logdata.clicked, userinfo.age, userinfo.gender,
>     userinfo.country, userinfo.interests) AS model
> FROM logdata JOIN userinfo
> ON logdata.userid = userinfo.userid
> GROUP BY logdata.pageid, logdata.linkid;
> -- Evaluate the models on some data. This assumes the result of the first
> -- query has been stored in a table named classifiers.
> SELECT logdata.pageid, logdata.linkid, logdata.clicked,
>   LogisticRegressionEvaluate(classifiers.model, userinfo.age, userinfo.gender,
>     userinfo.country, userinfo.interests) AS predicted
> FROM logdata JOIN userinfo
> ON logdata.userid = userinfo.userid
> JOIN classifiers
> ON logdata.pageid = classifiers.pageid AND logdata.linkid = classifiers.linkid;
> {code}
> References:
> Use Weka in your Java Code: 
> http://weka.wiki.sourceforge.net/Use+Weka+in+your+Java+code
> Note:
> Weka is licensed under the GPL. We won't be able to include the code directly 
> in Hive, but we can keep the discussion here.
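
The description above proposes that the GenericUDAF emit the serialized form of
the trained classifier and that a GenericUDF load it again to classify new
instances. A minimal sketch of that round trip, using only standard Java
serialization, might look like the following. The names ModelSerde and ToyModel
are hypothetical and are not taken from the patch or from Weka; ToyModel only
stands in for a trained classifier, and the Hive UDAF/UDF wiring is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper: not part of Hive, Weka, or the attached patch.
class ModelSerde {

    /** Hypothetical stand-in for a trained classifier (assumption, not a Weka class). */
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        private final double[] weights;
        private final double bias;

        ToyModel(double[] weights, double bias) {
            this.weights = weights.clone();
            this.bias = bias;
        }

        /** Linear score: bias + sum(weights[i] * features[i]). */
        double score(double[] features) {
            double sum = bias;
            for (int i = 0; i < weights.length; i++) {
                sum += weights[i] * features[i];
            }
            return sum;
        }
    }

    /** What the aggregation side could emit: the model as a byte array. */
    static byte[] serialize(Serializable model) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** What the evaluation side could do: rebuild the model from the bytes. */
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Model class not on the classpath", e);
        }
    }

    public static void main(String[] args) {
        ToyModel trained = new ToyModel(new double[] {0.5, -1.0}, 0.1);
        byte[] stored = serialize(trained);              // "training" output
        ToyModel loaded = (ToyModel) deserialize(stored); // "evaluation" input
        double[] features = {2.0, 1.0};
        System.out.println(loaded.score(features) == trained.score(features));
    }
}
```

Weka classifiers implement java.io.Serializable, so the same two helper methods
should in principle apply to them, provided weka.jar is on the classpath
wherever the model is deserialized.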

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
