Repository: incubator-hivemall Updated Branches: refs/heads/master 31932fd7c -> a823e2b17
[HIVEMALL-214][DOC] Update userguide for General Classifier/Regressor example ## What changes were proposed in this pull request? Refine user guide for generic classifier/regressor and so on. ## What type of PR is it? Documentation ## What is the Jira issue? https://issues.apache.org/jira/browse/HIVEMALL-214 ## How to use this feature? See user guide. Author: Makoto Yui <[email protected]> Closes #159 from myui/HIVEMALL-214. Project: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/commit/a823e2b1 Tree: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/tree/a823e2b1 Diff: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/diff/a823e2b1 Branch: refs/heads/master Commit: a823e2b17536dbf4418286fa508c93e912a75e8d Parents: 31932fd Author: Makoto Yui <[email protected]> Authored: Wed Dec 26 19:15:43 2018 +0900 Committer: Makoto Yui <[email protected]> Committed: Wed Dec 26 19:15:43 2018 +0900 ---------------------------------------------------------------------- docs/gitbook/SUMMARY.md | 57 +++--- docs/gitbook/binaryclass/a9a_generic.md | 109 ++++++++++++ docs/gitbook/binaryclass/a9a_lr.md | 16 +- docs/gitbook/binaryclass/a9a_minibatch.md | 9 +- docs/gitbook/binaryclass/general.md | 51 +++--- docs/gitbook/binaryclass/news20_adagrad.md | 45 +++-- docs/gitbook/binaryclass/news20_dataset.md | 20 +-- docs/gitbook/binaryclass/news20_generic.md | 83 +++++++++ docs/gitbook/binaryclass/news20_pa.md | 87 +++------- docs/gitbook/binaryclass/news20_rf.md | 6 +- docs/gitbook/binaryclass/news20_scw.md | 121 ++++--------- docs/gitbook/regression/e2006_arow.md | 183 +++++++++----------- docs/gitbook/regression/e2006_generic.md | 90 ++++++++++ docs/gitbook/supervised_learning/prediction.md | 78 ++++++++- docs/gitbook/supervised_learning/tutorial.md | 48 ++--- 15 files changed, 636 insertions(+), 367 deletions(-) ---------------------------------------------------------------------- 
http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/SUMMARY.md ---------------------------------------------------------------------- diff --git a/docs/gitbook/SUMMARY.md b/docs/gitbook/SUMMARY.md index 31a0311..7ead819 100644 --- a/docs/gitbook/SUMMARY.md +++ b/docs/gitbook/SUMMARY.md @@ -89,44 +89,46 @@ * [Binary Classification](binaryclass/general.md) * [a9a Tutorial](binaryclass/a9a.md) - * [Data preparation](binaryclass/a9a_dataset.md) + * [Data Preparation](binaryclass/a9a_dataset.md) + * [General Binary Classifier](binaryclass/a9a_generic.md) * [Logistic Regression](binaryclass/a9a_lr.md) - * [Mini-batch gradient descent](binaryclass/a9a_minibatch.md) + * [Mini-batch Gradient Descent](binaryclass/a9a_minibatch.md) * [News20 Tutorial](binaryclass/news20.md) - * [Data preparation](binaryclass/news20_dataset.md) + * [Data Preparation](binaryclass/news20_dataset.md) * [Perceptron, Passive Aggressive](binaryclass/news20_pa.md) * [CW, AROW, SCW](binaryclass/news20_scw.md) + * [General Binary Classifier](binaryclass/news20_generic.md) * [AdaGradRDA, AdaGrad, AdaDelta](binaryclass/news20_adagrad.md) * [Random Forest](binaryclass/news20_rf.md) * [KDD2010a Tutorial](binaryclass/kdd2010a.md) - * [Data preparation](binaryclass/kdd2010a_dataset.md) + * [Data Preparation](binaryclass/kdd2010a_dataset.md) * [PA, CW, AROW, SCW](binaryclass/kdd2010a_scw.md) * [KDD2010b Tutorial](binaryclass/kdd2010b.md) - * [Data preparation](binaryclass/kdd2010b_dataset.md) + * [Data Preparation](binaryclass/kdd2010b_dataset.md) * [AROW](binaryclass/kdd2010b_arow.md) * [Webspam Tutorial](binaryclass/webspam.md) - * [Data pareparation](binaryclass/webspam_dataset.md) + * [Data Preparation](binaryclass/webspam_dataset.md) * [PA1, AROW, SCW](binaryclass/webspam_scw.md) * [Kaggle Titanic Tutorial](binaryclass/titanic_rf.md) * [Criteo Tutorial](binaryclass/criteo.md) - * [Data preparation](binaryclass/criteo_dataset.md) + * [Data 
Preparation](binaryclass/criteo_dataset.md) * [Field-Aware Factorization Machines](binaryclass/criteo_ffm.md) ## Part VII - Multiclass Classification * [News20 Multiclass Tutorial](multiclass/news20.md) - * [Data preparation](multiclass/news20_dataset.md) - * [Data preparation for one-vs-the-rest classifiers](multiclass/news20_one-vs-the-rest_dataset.md) + * [Data Preparation](multiclass/news20_dataset.md) + * [Data Preparation for one-vs-the-rest classifiers](multiclass/news20_one-vs-the-rest_dataset.md) * [PA](multiclass/news20_pa.md) * [CW, AROW, SCW](multiclass/news20_scw.md) * [Ensemble learning](multiclass/news20_ensemble.md) - * [one-vs-the-rest classifier](multiclass/news20_one-vs-the-rest.md) + * [one-vs-the-rest Classifier](multiclass/news20_one-vs-the-rest.md) * [Iris Tutorial](multiclass/iris.md) * [Data preparation](multiclass/iris_dataset.md) @@ -138,11 +140,12 @@ * [Regression](regression/general.md) * [E2006-tfidf Regression Tutorial](regression/e2006.md) - * [Data preparation](regression/e2006_dataset.md) + * [Data Preparation](regression/e2006_dataset.md) + * [General Regressor](regression/e2006_generic.md) * [Passive Aggressive, AROW](regression/e2006_arow.md) * [KDDCup 2012 Track 2 CTR Prediction Tutorial](regression/kddcup12tr2.md) - * [Data preparation](regression/kddcup12tr2_dataset.md) + * [Data Preparation](regression/kddcup12tr2_dataset.md) * [Logistic Regression, Passive Aggressive](regression/kddcup12tr2_lr.md) * [Logistic Regression with amplifier](regression/kddcup12tr2_lr_amplify.md) * [AdaGrad, AdaDelta](regression/kddcup12tr2_adagrad.md) @@ -150,21 +153,21 @@ ## Part IX - Recommendation * [Collaborative Filtering](recommend/cf.md) - * [Item-based collaborative filtering](recommend/item_based_cf.md) + * [Item-based Collaborative Filtering](recommend/item_based_cf.md) * [News20 Related Article Recommendation Tutorial](recommend/news20.md) - * [Data preparation](multiclass/news20_dataset.md) - * [LSH/MinHash and Jaccard 
similarity](recommend/news20_jaccard.md) - * [LSH/MinHash and brute-force search](recommend/news20_knn.md) + * [Data Preparation](multiclass/news20_dataset.md) + * [LSH/MinHash and Jaccard Similarity](recommend/news20_jaccard.md) + * [LSH/MinHash and Brute-force Search](recommend/news20_knn.md) * [kNN search using b-Bits MinHash](recommend/news20_bbit_minhash.md) * [MovieLens Movie Recommendation Tutorial](recommend/movielens.md) - * [Data preparation](recommend/movielens_dataset.md) - * [Item-based collaborative filtering](recommend/movielens_cf.md) + * [Data Preparation](recommend/movielens_dataset.md) + * [Item-based Collaborative Filtering](recommend/movielens_cf.md) * [Matrix Factorization](recommend/movielens_mf.md) * [Factorization Machine](recommend/movielens_fm.md) - * [SLIM for fast top-k recommendation](recommend/movielens_slim.md) - * [10-fold cross validation (Matrix Factorization)](recommend/movielens_cv.md) + * [SLIM for fast top-k Recommendation](recommend/movielens_slim.md) + * [10-fold Cross Validation (Matrix Factorization)](recommend/movielens_cv.md) ## Part X - Anomaly Detection @@ -187,16 +190,16 @@ * [Installation](spark/getting_started/installation.md) * [Binary Classification](spark/binaryclass/index.md) - * [a9a tutorial for DataFrame](spark/binaryclass/a9a_df.md) - * [a9a tutorial for SQL](spark/binaryclass/a9a_sql.md) + * [a9a Tutorial for DataFrame](spark/binaryclass/a9a_df.md) + * [a9a Tutorial for SQL](spark/binaryclass/a9a_sql.md) * [Regression](spark/binaryclass/index.md) - * [E2006-tfidf regression tutorial for DataFrame](spark/regression/e2006_df.md) - * [E2006-tfidf regression tutorial for SQL](spark/regression/e2006_sql.md) + * [E2006-tfidf Regression Tutorial for DataFrame](spark/regression/e2006_df.md) + * [E2006-tfidf Regression Tutorial for SQL](spark/regression/e2006_sql.md) -* [Generic features](spark/misc/misc.md) - * [Top-k join processing](spark/misc/topk_join.md) - * [Other utility functions](spark/misc/functions.md) 
+* [Generic Features](spark/misc/misc.md) + * [Top-k Join Processing](spark/misc/topk_join.md) + * [Other Utility Functions](spark/misc/functions.md) ## Part XIV - Hivemall on Docker http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/a9a_generic.md ---------------------------------------------------------------------- diff --git a/docs/gitbook/binaryclass/a9a_generic.md b/docs/gitbook/binaryclass/a9a_generic.md new file mode 100644 index 0000000..8a482ca --- /dev/null +++ b/docs/gitbook/binaryclass/a9a_generic.md @@ -0,0 +1,109 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements.  See the NOTICE file + distributed with this work for additional information + regarding copyright ownership.  The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License.  You may obtain a copy of the License at + +   http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied.  See the License for the + specific language governing permissions and limitations + under the License. +--> + +This page shows the usage of the General Binary Classifier on the a9a dataset. 
+ +<!-- toc --> + +# Training + +```sql +create table model +as +select + feature, + avg(weight) as weight +from ( + select + train_classifier( + add_bias(features), label, + "-loss logistic -iter 30" + ) as (feature,weight) + from + a9a_train + ) t +group by feature; +``` + +# Prediction + +```sql +create table predict +as +WITH exploded as ( +select + rowid, + label, + extract_feature(feature) as feature, + extract_weight(feature) as value +from + a9a_test LATERAL VIEW explode(add_bias(features)) t AS feature +) +select + t.rowid, + sigmoid(sum(m.weight * t.value)) as prob, + (case when sigmoid(sum(m.weight * t.value)) >= 0.5 then 1.0 else 0.0 end) as label +from + exploded t LEFT OUTER JOIN + model m ON (t.feature = m.feature) +group by + t.rowid; +``` + +# Evaluation + +```sql +create or replace view submit as +select + t.label as actual, + p.label as predicted, + p.prob as probability +from + a9a_test t + JOIN predict p on (t.rowid = p.rowid); + +select + sum(if(actual == predicted, 1, 0)) / count(1) as accuracy +from + submit; +``` + +> 0.8462625145875561 + +The following table shows the accuracy when changing the optimizer via the `-loss logistic -opt XXXXXX -reg l1 -iter 30` option: + +| Optimizer | Accuracy | +|:--:|:--:| +| Default (Adagrad+RDA) | 0.8462625145875561 | +| SGD | 0.8462010932989374 | +| Momentum | 0.8254406977458387 | +| Nesterov | 0.8286346047540077 | +| AdaGrad | 0.850991953811191 | +| RMSprop | 0.8463239358761747 | +| RMSpropGraves | 0.825563540323076 | +| AdaDelta | 0.8492721577298692 | +| Adam | 0.8341625207296849 | +| Nadam | 0.8349609974817271 | +| Eve | 0.8348381549044899 | +| AdamHD | 0.8447269823720902 | + +> #### Note +> Optimizers using momentum need their decay rate tuned well. +> Default (Adagrad+RDA), AdaDelta, Adam, and AdamHD are worth trying in my experience. 
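The prediction query above scores each test row by a dot product between the matched model weights and the feature values, squashes the sum with a sigmoid, and thresholds the probability at 0.5. The same scoring logic can be sketched in plain Python (an illustration only — the `model` and feature dicts below are made-up toy values, not Hivemall output):

```python
import math

def predict_prob(model, features):
    # Dot product over features present in the model (missing features
    # contribute 0) followed by a sigmoid; this mirrors
    # sigmoid(sum(m.weight * t.value)) in the query above.
    total = sum(model.get(f, 0.0) * v for f, v in features.items())
    return 1.0 / (1.0 + math.exp(-total))

def predict_label(model, features, threshold=0.5):
    # Mirrors: case when prob >= 0.5 then 1.0 else 0.0 end
    return 1.0 if predict_prob(model, features) >= threshold else 0.0

# Toy model; "0" stands for the bias feature appended by add_bias()
model = {"age": 0.8, "edu": -0.4, "0": -0.2}
x = {"age": 1.0, "edu": 1.0, "0": 1.0}
prob = predict_prob(model, x)  # sigmoid(0.8 - 0.4 - 0.2) = sigmoid(0.2)
```

A row whose features never match the model falls back to the bias weight alone, which is why the query keeps unmatched rows with a LEFT OUTER JOIN.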
+ http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/a9a_lr.md ---------------------------------------------------------------------- diff --git a/docs/gitbook/binaryclass/a9a_lr.md b/docs/gitbook/binaryclass/a9a_lr.md index 247d5a2..5a73a2a 100644 --- a/docs/gitbook/binaryclass/a9a_lr.md +++ b/docs/gitbook/binaryclass/a9a_lr.md @@ -17,6 +17,12 @@ under the License. --> +This page shows an example of applying logistic regression to the a9a binary classification task. + +> #### Caution +> +> `logress()` is deprecated since the v0.5.0 release. Use the smarter [general classifier](./a9a_generic.md) instead. + <!-- toc --> # UDF preparation @@ -31,6 +37,7 @@ set hivevar:num_test_instances=16281; ``` # training + ```sql create table a9a_model1 as @@ -45,10 +52,13 @@ from ) t group by feature; ``` -_"-total_steps" option is optional for logress() function._ -_I recommend you NOT to use options (e.g., total_steps and eta0) if you are not familiar with those options. Hivemall then uses an autonomic ETA (learning rate) estimator._ + +> #### Note +> +> The `-total_steps` option is optional for the `logress()` function. We recommend NOT using options (e.g., `total_steps` and `eta0`) unless you are familiar with them. Hivemall then uses an autonomic ETA (learning rate) estimator. 
# prediction + ```sql create or replace view a9a_predict1 as @@ -73,6 +83,7 @@ group by ``` # evaluation + ```sql create or replace view a9a_submit1 as select @@ -88,4 +99,5 @@ from select count(1) / ${num_test_instances} from a9a_submit1 where actual == predicted; ``` + > 0.8430071862907684 http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/a9a_minibatch.md ---------------------------------------------------------------------- diff --git a/docs/gitbook/binaryclass/a9a_minibatch.md b/docs/gitbook/binaryclass/a9a_minibatch.md index 3fc5945..029b447 100644 --- a/docs/gitbook/binaryclass/a9a_minibatch.md +++ b/docs/gitbook/binaryclass/a9a_minibatch.md @@ -17,8 +17,13 @@ under the License. --> -This page explains how to apply [Mini-Batch Gradient Descent](https://class.coursera.org/ml-003/lecture/106) for the training of logistic regression explained in [this example](./a9a_lr.html). -So, refer [this page](./a9a_lr.html) first. This content depends on it. +This page explains how to apply [Mini-Batch Gradient Descent](https://class.coursera.org/ml-003/lecture/106) for the training of logistic regression explained in [this example](./a9a_lr.html). Refer to [this page](./a9a_lr.html) first, as this content depends on it. + +> #### Caution +> +> `logress()` is deprecated since the v0.5.0 release. Use the smarter [general classifier](./a9a_generic.md) instead. You can use the `-mini_batch` option in the general classifier as well. + +<!-- toc --> # Training http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/general.md ---------------------------------------------------------------------- diff --git a/docs/gitbook/binaryclass/general.md b/docs/gitbook/binaryclass/general.md index a14130c..a436cb4 100644 --- a/docs/gitbook/binaryclass/general.md +++ b/docs/gitbook/binaryclass/general.md @@ -17,7 +17,7 @@ under the License. --> -Hivemall has a generic function for classification: `train_classifier`. 
Compared to the other functions we will see in the later chapters, `train_classifier` provides simpler and configureable generic interface which can be utilized to build binary classification models in a variety of settings. +Hivemall has a generic function for classification: `train_classifier`. Compared to the other functions we will see in the later chapters, `train_classifier` provides a simpler and configurable generic interface which can be utilized to build binary classification models in a variety of settings. Here, we briefly introduce usage of the function. Before trying sample queries, you first need to prepare [a9a data](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#a9a). See [our a9a tutorial page](a9a_dataset.md) for further instructions. @@ -26,19 +26,6 @@ Here, we briefly introduce usage of the function. Before trying sample queries, > #### Note > This feature is supported from Hivemall v0.5-rc.1 or later. -# Preparation - -- Set `total_steps` ideally be `count(1) / {# of map tasks}`: - ``` - hive> select count(1) from a9a_train; - hive> set hivevar:total_steps=32561; - ``` -- Set `n_samples` to compute accuracy of prediction: - ``` - hive> select count(1) from a9a_test; - hive> set hivevar:n_samples=16281; - ``` - # Training ```sql @@ -49,17 +36,13 @@ select from ( select - train_classifier(add_bias(features), label, '-loss logloss -opt SGD -reg no -eta simple -total_steps ${total_steps}') as (feature, weight) + train_classifier(add_bias(features), label, '-loss logloss -opt SGD -reg no') as (feature, weight) from a9a_train ) t group by feature; ``` -> #### Note -> -> `-total_steps` option is an optional parameter and training works without it. 
- # Prediction & evaluation ```sql @@ -78,24 +61,32 @@ predict as ( sigmoid(sum(m.weight * t.value)) as prob, (case when sigmoid(sum(m.weight * t.value)) >= 0.5 then 1.0 else 0.0 end)as label from - test_exploded t LEFT OUTER JOIN - classification_model m ON (t.feature = m.feature) + test_exploded t + LEFT OUTER JOIN classification_model m + ON (t.feature = m.feature) group by t.rowid ), submit as ( select t.label as actual, - pd.label as predicted, - pd.prob as probability + p.label as predicted, + p.prob as probability from - a9a_test t JOIN predict pd - on (t.rowid = pd.rowid) + a9a_test t + JOIN predict p + on (t.rowid = p.rowid) ) -select count(1) / ${n_samples} from submit -where actual = predicted; +select + sum(if(actual = predicted, 1, 0)) / count(1) as accuracy +from + submit; ``` +|accuracy| +|:-:| +| 0.8461396720103188 | + # Comparison with the other binary classifiers In the next part of this user guide, our binary classification tutorials introduce many different functions: @@ -115,20 +106,20 @@ All of them actually have the same interface, but mathematical formulation and i In particular, the above sample queries are almost same as [a9a tutorial using Logistic Regression](a9a_lr.md). The difference is only in a choice of training function: `logress()` vs. `train_classifier()`. -However, at the same time, the options `-loss logloss -opt SGD -reg no -eta simple -total_steps ${total_steps}` for `train_classifier` indicates that Hivemall uses the generic classifier as Logistic Regressor (`logress`). Hence, the accuracy of prediction based on either `logress` and `train_classifier` should be same under the configuration. +However, at the same time, the options `-loss logloss -opt SGD -reg no` for `train_classifier` indicate that Hivemall uses the generic classifier as `logress`. Hence, the accuracy of prediction based on either `logress` or `train_classifier` would be (almost) the same under this configuration. 
In addition, `train_classifier` supports the `-mini_batch` option in a similar manner to [what `logress` does](a9a_minibatch.md). Thus, the following two training queries produce the same results: ```sql select - logress(add_bias(features), label, '-total_steps ${total_steps} -mini_batch 10') as (feature, weight) + logress(add_bias(features), label, '-mini_batch 10') as (feature, weight) from a9a_train ``` ```sql select - train_classifier(add_bias(features), label, '-loss logloss -opt SGD -reg no -eta simple -total_steps ${total_steps} -mini_batch 10') as (feature, weight) + train_classifier(add_bias(features), label, '-loss logloss -opt SGD -reg no -mini_batch 10') as (feature, weight) from a9a_train ``` http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/news20_adagrad.md ---------------------------------------------------------------------- diff --git a/docs/gitbook/binaryclass/news20_adagrad.md b/docs/gitbook/binaryclass/news20_adagrad.md index e3dfb47..063ecb0 100644 --- a/docs/gitbook/binaryclass/news20_adagrad.md +++ b/docs/gitbook/binaryclass/news20_adagrad.md @@ -19,25 +19,21 @@ <!-- toc --> -> #### Note -> This feature is supported since Hivemall `v0.3-beta2` or later. - -## UDF preparation - -``` -add jar ./tmp/hivemall-with-dependencies.jar; -source ./tmp/define-all.hive; - -use news20; -``` +> #### Caution +> +> `train_adagrad()` is deprecated since the v0.5.0 release. Use the smarter [general classifier](./a9a_generic.md) instead. -#[AdaGradRDA] +# AdaGradRDA > #### Note +> > The current AdaGradRDA implementation can only be applied to classification, > not to regression, because it uses hinge loss for the loss function. 
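The `-mini_batch` option that appears in the queries above accumulates the logistic-loss gradient over a small batch of rows and then applies a single averaged update. A rough Python sketch of that idea (a simplified illustration with plain SGD and a fixed learning rate — not Hivemall's actual implementation):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_minibatch(rows, batch_size=10, eta=0.1, iters=50, seed=42):
    """rows: list of (features: dict, label: 0/1). The gradient is summed
    over each mini-batch, then one averaged update is applied."""
    rng = random.Random(seed)
    w = {}
    for _ in range(iters):
        rng.shuffle(rows)
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            grad = {}
            for x, y in batch:
                p = sigmoid(sum(w.get(f, 0.0) * v for f, v in x.items()))
                for f, v in x.items():
                    grad[f] = grad.get(f, 0.0) + (y - p) * v
            for f, g in grad.items():
                # one update per batch, using the averaged gradient
                w[f] = w.get(f, 0.0) + eta * g / len(batch)
    return w

# Tiny linearly separable toy set, with a bias feature "0" (as add_bias adds)
data = [({"x": 2.0, "0": 1.0}, 1), ({"x": 1.5, "0": 1.0}, 1),
        ({"x": -2.0, "0": 1.0}, 0), ({"x": -1.0, "0": 1.0}, 0)]
w = train_logistic_minibatch(list(data))
```

With `batch_size=1` this degenerates to plain stochastic gradient descent; larger batches trade update frequency for lower gradient variance.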
## model building + ```sql +use news20; + drop table news20b_adagrad_rda_model1; create table news20b_adagrad_rda_model1 as select @@ -45,7 +41,7 @@ select voted_avg(weight) as weight from (select - train_adagrad_rda(addBias(features),label) as (feature,weight) + train_adagrad_rda(add_bias(features),label) as (feature,weight) from news20b_train_x3 ) t @@ -53,6 +49,7 @@ group by feature; ``` ## prediction + ```sql create or replace view news20b_adagrad_rda_predict1 as @@ -68,6 +65,7 @@ group by ``` ## evaluation + ```sql create or replace view news20b_adagrad_rda_submit1 as select @@ -82,15 +80,19 @@ from select count(1)/4996 from news20b_adagrad_rda_submit1 where actual == predicted; ``` + > SCW1 0.9661729383506805 > ADAGRAD+RDA 0.9677742193755005 -#[AdaGrad] +# AdaGrad -_Note that AdaGrad is better suited for a regression problem because the current implementation only support logistic loss._ +> #### Note +> +> AdaGrad is better suited for a binary classification problem because the current implementation only supports logistic loss. ## model building + ```sql drop table news20b_adagrad_model1; create table news20b_adagrad_model1 as @@ -99,7 +101,7 @@ select voted_avg(weight) as weight from (select - adagrad(addBias(features),convert_label(label)) as (feature,weight) + train_adagrad_regr(add_bias(features),convert_label(label)) as (feature,weight) from news20b_train_x3 ) t @@ -110,6 +112,7 @@ group by feature; > `adagrad` takes 0/1 for a label value and `convert_label(label)` converts a > label value from -1/+1 to 0/1. 
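The note above concerns label encoding: the news20 labels are -1/+1, while the logistic-loss trainer expects 0/1. The direction of conversion described in the note can be sketched as follows (this covers only the -1/+1 to 0/1 mapping mentioned in the note):

```python
def convert_label(label):
    # Map a -1/+1 label to 0.0/1.0, as described in the note above.
    if label == -1:
        return 0.0
    if label == 1:
        return 1.0
    raise ValueError("expected a -1 or +1 label, got %r" % (label,))
```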
## prediction + ```sql create or replace view news20b_adagrad_predict1 as @@ -124,6 +127,7 @@ group by ``` ## evaluation + ```sql create or replace view news20b_adagrad_submit1 as select @@ -138,14 +142,16 @@ from select count(1)/4996 from news20b_adagrad_submit1 where actual == predicted; ``` + > 0.9549639711769415 (adagrad) -#[AdaDelta] +# AdaDelta > #### Caution > AdaDelta can only be applied for regression problem because the current > implementation only support logistic loss. ## model building + ```sql drop table news20b_adadelta_model1; create table news20b_adadelta_model1 as @@ -154,7 +160,7 @@ select voted_avg(weight) as weight from (select - adadelta(addBias(features),convert_label(label)) as (feature,weight) + adadelta(add_bias(features),convert_label(label)) as (feature,weight) from news20b_train_x3 ) t @@ -162,6 +168,7 @@ group by feature; ``` ## prediction + ```sql create or replace view news20b_adadelta_predict1 as @@ -176,6 +183,7 @@ group by ``` ## evaluation + ```sql create or replace view news20b_adadelta_submit1 as select @@ -187,7 +195,6 @@ from ``` - ```sql select count(1)/4996 from news20b_adadelta_submit1 where actual == predicted; http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/news20_dataset.md ---------------------------------------------------------------------- diff --git a/docs/gitbook/binaryclass/news20_dataset.md b/docs/gitbook/binaryclass/news20_dataset.md index 2edd3f7..be48473 100644 --- a/docs/gitbook/binaryclass/news20_dataset.md +++ b/docs/gitbook/binaryclass/news20_dataset.md @@ -59,11 +59,6 @@ hadoop fs -copyFromLocal news20.test.t /dataset/news20-binary/test create database news20; use news20; -delete jar /home/myui/tmp/hivemall.jar; -add jar /home/myui/tmp/hivemall.jar; - -source /home/myui/tmp/define-all.hive; - Create external table news20b_train ( rowid int, label int, @@ -82,10 +77,10 @@ as select * from ( -select - amplify(3, *) as (rowid, label, features) -from - 
news20b_train + select + amplify(3, *) as (rowid, label, features) + from + news20b_train ) t CLUSTER BY rand(${seed}); @@ -93,11 +88,8 @@ create table news20b_test_exploded as select rowid, label, - cast(split(feature,":")[0] as int) as feature, - cast(split(feature,":")[1] as float) as value - -- hivemall v0.3.1 or later - -- extract_feature(feature) as feature, - -- extract_weight(feature) as value + extract_feature(feature) as feature, + extract_weight(feature) as value from news20b_test LATERAL VIEW explode(add_bias(features)) t AS feature; ``` http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/news20_generic.md ---------------------------------------------------------------------- diff --git a/docs/gitbook/binaryclass/news20_generic.md b/docs/gitbook/binaryclass/news20_generic.md new file mode 100644 index 0000000..23a0363 --- /dev/null +++ b/docs/gitbook/binaryclass/news20_generic.md @@ -0,0 +1,83 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements.  See the NOTICE file + distributed with this work for additional information + regarding copyright ownership.  The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License.  You may obtain a copy of the License at + +   http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied.  See the License for the + specific language governing permissions and limitations + under the License. +--> + +In this tutorial, we build a binary classification model using the general classifier. 
+ +<!-- toc --> + +## Training + +```sql +-- set mapred.reduce.tasks=3; -- explicitly use 3 reducers + +drop table news20b_generic_model; +create table news20b_generic_model as +select + feature, + voted_avg(weight) as weight +from + (select + train_classifier( + add_bias(features), label, + '-loss logistic -opt AdamHD -reg l1 -iters 20' + ) as (feature,weight) + from + news20b_train_x3 + ) t +group by feature; +``` + +> #### Note +> Default (Adagrad+RDA), AdaDelta, Adam, and AdamHD are worth trying in my experience. + +## prediction + +```sql +create or replace view news20b_generic_predict +as +select + t.rowid, + sum(m.weight * t.value) as total_weight, + case when sum(m.weight * t.value) > 0.0 then 1 else -1 end as label +from + news20b_test_exploded t LEFT OUTER JOIN + news20b_generic_model m ON (t.feature = m.feature) +group by + t.rowid; +``` + +## evaluation + +```sql +WITH submit as ( +select + t.label as actual, + p.label as predicted +from + news20b_test t + JOIN news20b_generic_predict p + on (t.rowid = p.rowid) +) +select + sum(if(actual = predicted, 1, 0)) / count(1) as accuracy +from + submit; +``` + +> 0.967173738991193 (`-opt AdamHD -reg l1`) \ No newline at end of file http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/news20_pa.md ---------------------------------------------------------------------- diff --git a/docs/gitbook/binaryclass/news20_pa.md b/docs/gitbook/binaryclass/news20_pa.md index d40b433..12e459a 100644 --- a/docs/gitbook/binaryclass/news20_pa.md +++ b/docs/gitbook/binaryclass/news20_pa.md @@ -17,15 +17,6 @@ under the License. 
--> -## UDF preparation -``` -delete jar /home/myui/tmp/hivemall.jar; -add jar /home/myui/tmp/hivemall.jar; - -source /home/myui/tmp/define-all.hive; -``` - ---- #[Perceptron] ## model building @@ -37,7 +28,7 @@ select voted_avg(weight) as weight from (select - perceptron(add_bias(features),label) as (feature,weight) + train_perceptron(add_bias(features),label) as (feature,weight) from news20b_train_x3 ) t @@ -64,27 +55,20 @@ group by create or replace view news20b_perceptron_submit1 as select t.label as actual, - pd.label as predicted + p.label as predicted from - news20b_test t JOIN news20b_perceptron_predict1 pd - on (t.rowid = pd.rowid); + news20b_test t JOIN news20b_perceptron_predict1 p + on (t.rowid = p.rowid); ``` ```sql -select count(1)/4996 from news20b_perceptron_submit1 -where actual == predicted; +select + sum(if(actual = predicted, 1, 0)) / count(1) as accuracy +from + news20b_perceptron_submit1; ``` > 0.9459567654123299 -## Cleaning - -```sql -drop table news20b_perceptron_model1; -drop view news20b_perceptron_predict1; -drop view news20b_perceptron_submit1; -``` - ---- #[Passive Aggressive] ## model building @@ -130,20 +114,13 @@ from ``` ```sql -select count(1)/4996 from news20b_pa_submit1 -where actual == predicted; +select + sum(if(actual = predicted, 1, 0)) / count(1) as accuracy +from + news20b_pa_submit1; ``` > 0.9603682946357086 -## Cleaning - -```sql -drop table news20b_pa_model1; -drop view news20b_pa_predict1; -drop view news20b_pa_submit1; -``` - ---- #[Passive Aggressive (PA1)] ## model building @@ -171,8 +148,9 @@ select sum(m.weight * t.value) as total_weight, case when sum(m.weight * t.value) > 0.0 then 1 else -1 end as label from - news20b_test_exploded t LEFT OUTER JOIN - news20b_pa1_model1 m ON (t.feature = m.feature) + news20b_test_exploded t + LEFT OUTER JOIN news20b_pa1_model1 m + ON (t.feature = m.feature) group by t.rowid; ``` @@ -182,27 +160,21 @@ group by create or replace view news20b_pa1_submit1 as select t.label as 
actual, - pd.label as predicted + p.label as predicted from - news20b_test t JOIN news20b_pa1_predict1 pd - on (t.rowid = pd.rowid); + news20b_test t + JOIN news20b_pa1_predict1 p + on (t.rowid = p.rowid); ``` ```sql -select count(1)/4996 from news20b_pa1_submit1 -where actual == predicted; +select + sum(if(actual = predicted, 1, 0)) / count(1) as accuracy +from + news20b_pa1_submit1; ``` > 0.9601681345076061 -## Cleaning - -```sql -drop table news20b_pa1_model1; -drop view news20b_pa1_predict1; -drop view news20b_pa1_submit1; -``` - ---- #[Passive Aggressive (PA2)] ## model building @@ -248,15 +220,10 @@ from ``` ```sql -select count(1)/4996 from news20b_pa2_submit1 -where actual == predicted; +select + sum(if(actual = predicted, 1, 0)) / count(1) as accuracy +from + news20b_pa2_submit1; ``` > 0.9597678142514011 -## Cleaning - -```sql -drop table news20b_pa2_model1; -drop view news20b_pa2_predict1; -drop view news20b_pa2_submit1; -``` http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/news20_rf.md ---------------------------------------------------------------------- diff --git a/docs/gitbook/binaryclass/news20_rf.md b/docs/gitbook/binaryclass/news20_rf.md index 659536a..9a0d1f8 100644 --- a/docs/gitbook/binaryclass/news20_rf.md +++ b/docs/gitbook/binaryclass/news20_rf.md @@ -17,11 +17,11 @@ under the License. --> -Hivemall Random Forest supports libsvm-like sparse inputs. +Hivemall Random Forest supports libsvm-like sparse inputs. This page shows a classification example on 20-newsgroup dataset. > #### Note -> This feature, i.e., Sparse input support in Random Forest, is supported since Hivemall v0.5.0 or later._ -> [`feature_hashing`](https://hivemall.incubator.apache.org/userguide/ft_engineering/hashing.html#featurehashing-function) function is useful to prepare feature vectors for Random Forest. +> This feature, i.e., Sparse input support in Random Forest, is supported since Hivemall v0.5.0 or later. 
+> [`feature_hashing`](http://hivemall.incubator.apache.org/userguide/ft_engineering/hashing.html#featurehashing-function) function is useful to prepare feature vectors for Random Forest. <!-- toc --> http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/news20_scw.md ---------------------------------------------------------------------- diff --git a/docs/gitbook/binaryclass/news20_scw.md b/docs/gitbook/binaryclass/news20_scw.md index f364c12..340ca8f 100644 --- a/docs/gitbook/binaryclass/news20_scw.md +++ b/docs/gitbook/binaryclass/news20_scw.md @@ -17,16 +17,6 @@ under the License. --> -## UDF preparation -``` -use news20; - -delete jar /home/myui/tmp/hivemall.jar; -add jar /home/myui/tmp/hivemall.jar; -source /home/myui/tmp/define-all.hive; -``` - ---- # Confidence Weighted (CW) ## training @@ -64,32 +54,21 @@ group by ## evaluation ```sql -create or replace view news20b_cw_submit1 -as +WITH submit as ( select t.rowid, t.label as actual, - pd.label as predicted + p.label as predicted from - news20b_test t JOIN news20b_cw_predict1 pd - on (t.rowid = pd.rowid); -``` - -```sql -select count(1)/4996 from news20b_cw_submit1 -where actual = predicted; + news20b_test t + JOIN news20b_cw_predict1 p + on (t.rowid = p.rowid) +) +select sum(if(actual = predicted, 1, 0)) / count(1) as accuracy +from submit; ``` > 0.9655724579663731 -## Cleaning - -```sql -drop table news20b_cw_model1; -drop view news20b_cw_predict1; -drop view news20b_cw_submit1; -``` - ---- # Adaptive Regularization of Weight Vectors (AROW) ## training @@ -127,31 +106,21 @@ group by ## evaluation ```sql -create or replace view news20b_arow_submit1 as -select +WITH submit as ( +select t.rowid, t.label as actual, - pd.label as predicted + p.label as predicted from - news20b_test t JOIN news20b_arow_predict1 pd - on (t.rowid = pd.rowid); -``` - -```sql -select count(1)/4996 from news20b_arow_submit1 -where actual = predicted; + news20b_test t + JOIN 
news20b_arow_predict1 p + on (t.rowid = p.rowid) +) +select sum(if(actual = predicted, 1, 0)) / count(1) as accuracy +from submit; ``` > 0.9659727782225781 -## Cleaning - -```sql -drop table news20b_arow_model1; -drop view news20b_arow_predict1; -drop view news20b_arow_submit1; -``` - ---- # Soft Confidence-Weighted (SCW1) ## training @@ -189,31 +158,20 @@ group by ## evaluation ```sql -create or replace view news20b_scw_submit1 as -select - t.rowid, - t.label as actual, - pd.label as predicted -from - news20b_test t JOIN news20b_scw_predict1 pd - on (t.rowid = pd.rowid); -``` - -```sql -select count(1)/4996 from news20b_scw_submit1 -where actual = predicted; +WITH submit as ( + select + t.rowid, + t.label as actual, + p.label as predicted + from + news20b_test t JOIN news20b_scw_predict1 p + on (t.rowid = p.rowid) +) +select sum(if(actual = predicted, 1, 0)) / count(1) as accuracy +from submit ``` > 0.9661729383506805 -## Cleaning - -```sql -drop table news20b_scw_model1; -drop view news20b_scw_predict1; -drop view news20b_scw_submit1; -``` - ---- # Soft Confidence-Weighted (SCW2) ## training @@ -251,30 +209,21 @@ group by ## evaluation ```sql -create or replace view news20b_scw2_submit1 as +WITH submit as ( select t.rowid, t.label as actual, pd.label as predicted from - news20b_test t JOIN news20b_scw2_predict1 pd - on (t.rowid = pd.rowid); -``` - -```sql -select count(1)/4996 from news20b_scw2_submit1 -where actual = predicted; + news20b_test t + JOIN news20b_scw2_predict1 pd + on (t.rowid = pd.rowid) +) +select sum(if(actual = predicted, 1, 0)) / count(1) as accuracy +from submit; ``` > 0.9579663730984788 -## Cleaning - -```sql -drop table news20b_scw2_model1; -drop view news20b_scw2_predict1; -drop view news20b_scw2_submit1; -``` - -- | Algorithm | Accuracy | http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/regression/e2006_arow.md ---------------------------------------------------------------------- diff --git 
a/docs/gitbook/regression/e2006_arow.md b/docs/gitbook/regression/e2006_arow.md index 169a7dc..1342f8f 100644 --- a/docs/gitbook/regression/e2006_arow.md +++ b/docs/gitbook/regression/e2006_arow.md @@ -16,15 +16,15 @@ specific language governing permissions and limitations under the License. --> - -https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#E2006-tfidf ---- -#[PA1a] +<!-- toc --> -##Training +# PA1a + +## Training ```sql set mapred.reduce.tasks=64; + drop table e2006tfidf_pa1a_model ; create table e2006tfidf_pa1a_model as select @@ -37,9 +37,13 @@ from e2006tfidf_train_x3 ) t group by feature; + +-- reset to the default setting set mapred.reduce.tasks=-1; ``` -_Caution: Do not use voted_avg() for regression. voted_avg() is for classification._ + +> #### Caution +> Do not use `voted_avg()` for regression. `voted_avg()` is for classification. ## prediction ```sql @@ -57,36 +61,31 @@ group by ## evaluation ```sql -drop table e2006tfidf_pa1a_submit; -create table e2006tfidf_pa1a_submit as +WITH submit as ( + select + t.target as actual, + p.predicted as predicted + from + e2006tfidf_test t + JOIN e2006tfidf_pa1a_predict p + on (t.rowid = p.rowid) +) select - t.target as actual, - p.predicted as predicted + rmse(predicted, actual) as RMSE, + mse(predicted, actual) as MSE, + mae(predicted, actual) as MAE, + r2(predicted, actual) as R2 from - e2006tfidf_test t JOIN e2006tfidf_pa1a_predict p - on (t.rowid = p.rowid); - -select avg(actual), avg(predicted) from e2006tfidf_pa1a_submit; + submit; ``` -> -3.8200363760415414 -3.8869923258589476 - -```sql -set hivevar:mean_actual=-3.8200363760415414; -select - sqrt(sum(pow(predicted - actual,2.0))/count(1)) as RMSE, - sum(pow(predicted - actual,2.0))/count(1) as MSE, - sum(abs(predicted - actual))/count(1) as MAE, - 1 - sum(pow(actual - predicted,2.0)) / sum(pow(actual - ${mean_actual},2.0)) as R2 -from - e2006tfidf_pa1a_submit; -``` -> 0.3797959864675519 0.14424499133686086 0.23846059576113587 
0.5010367946980386 +| rmse | mse | mae | r2 | +|:-:|:-:|:-:|:-:| +| 0.3797959864675519 | 0.14424499133686086 | 0.23846059576113587 |0.5010367946980386 | ---- -#[PA2a] +# PA2a -##Training +## Training ```sql set mapred.reduce.tasks=64; drop table e2006tfidf_pa2a_model; @@ -120,36 +119,31 @@ group by ## evaluation ```sql -drop table e2006tfidf_pa2a_submit; -create table e2006tfidf_pa2a_submit as +WITH submit as ( + select + t.target as actual, + p.predicted as predicted + from + e2006tfidf_test t + JOIN e2006tfidf_pa2a_predict p + on (t.rowid = p.rowid) +) select - t.target as actual, - pd.predicted as predicted + rmse(predicted, actual) as RMSE, + mse(predicted, actual) as MSE, + mae(predicted, actual) as MAE, + r2(predicted, actual) as R2 from - e2006tfidf_test t JOIN e2006tfidf_pa2a_predict pd - on (t.rowid = pd.rowid); - -select avg(actual), avg(predicted) from e2006tfidf_pa2a_submit; + submit; ``` -> -3.8200363760415414 -3.9124877451612488 - -```sql -set hivevar:mean_actual=-3.8200363760415414; -select - sqrt(sum(pow(predicted - actual,2.0))/count(1)) as RMSE, - sum(pow(predicted - actual,2.0))/count(1) as MSE, - sum(abs(predicted - actual))/count(1) as MAE, - 1 - sum(pow(actual - predicted,2.0)) / sum(pow(actual - ${mean_actual},2.0)) as R2 -from - e2006tfidf_pa2a_submit; -``` -> 0.38538660838804495 0.14852283792484033 0.2466732002711477 0.48623913673053565 +| rmse | mse | mae | r2 | +|:-:|:-:|:-:|:-:| +| 0.38538660838804495 | 0.14852283792484033 | 0.2466732002711477 |0.48623913673053565 | ---- -#[AROW] +# AROW -##Training +## Training ```sql set mapred.reduce.tasks=64; drop table e2006tfidf_arow_model ; @@ -185,37 +179,32 @@ group by ## evaluation ```sql -drop table e2006tfidf_arow_submit; -create table e2006tfidf_arow_submit as +WITH submit as ( + select + t.target as actual, + p.predicted as predicted + from + e2006tfidf_test t + JOIN e2006tfidf_arow_predict p + on (t.rowid = p.rowid) +) select - t.target as actual, - p.predicted as predicted + 
rmse(predicted, actual) as RMSE, + mse(predicted, actual) as MSE, + mae(predicted, actual) as MAE, + r2(predicted, actual) as R2 from - e2006tfidf_test t JOIN e2006tfidf_arow_predict p - on (t.rowid = p.rowid); - -select avg(actual), avg(predicted) from e2006tfidf_arow_submit; + submit; ``` -> -3.8200363760415414 -3.8692518911517433 -```sql -set hivevar:mean_actual=-3.8200363760415414; - -select - sqrt(sum(pow(predicted - actual,2.0))/count(1)) as RMSE, - sum(pow(predicted - actual,2.0))/count(1) as MSE, - sum(abs(predicted - actual))/count(1) as MAE, - 1 - sum(pow(actual - predicted,2.0)) / sum(pow(actual - ${mean_actual},2.0)) as R2 -from - e2006tfidf_arow_submit; -``` -> 0.37862513029019407 0.14335698928726642 0.2368787001269389 0.5041085155590119 +| rmse | mse | mae | r2 | +|:-:|:-:|:-:|:-:| +| 0.37862513029019407 | 0.14335698928726642 | 0.2368787001269389 | 0.5041085155590119 | ---- -#[AROWe] +# AROWe AROWe is a modified version of AROW that uses Hinge loss (epsilon = 0.1) -##Training +## Training ```sql set mapred.reduce.tasks=64; drop table e2006tfidf_arowe_model ; @@ -251,28 +240,24 @@ group by ## evaluation ```sql -drop table e2006tfidf_arowe_submit; -create table e2006tfidf_arowe_submit as +WITH submit as ( + select + t.target as actual, + p.predicted as predicted + from + e2006tfidf_test t + JOIN e2006tfidf_arowe_predict p + on (t.rowid = p.rowid) +) select - t.target as actual, - p.predicted as predicted + rmse(predicted, actual) as RMSE, + mse(predicted, actual) as MSE, + mae(predicted, actual) as MAE, + r2(predicted, actual) as R2 from - e2006tfidf_test t JOIN e2006tfidf_arowe_predict p - on (t.rowid = p.rowid); - -select avg(actual), avg(predicted) from e2006tfidf_arowe_submit; + submit; ``` -> -3.8200363760415414 -3.86494905688414 - -```sql -set hivevar:mean_actual=-3.8200363760415414; -select - sqrt(sum(pow(predicted - actual,2.0))/count(1)) as RMSE, - sum(pow(predicted - actual,2.0))/count(1) as MSE, - sum(abs(predicted - actual))/count(1) as 
MAE, - 1 - sum(pow(actual - predicted,2.0)) / sum(pow(actual - ${mean_actual},2.0)) as R2 -from - e2006tfidf_arowe_submit; -``` -> 0.37789148212861856 0.14280197226536404 0.2357339155291536 0.5060283955470721 +| rmse | mse | mae | r2 | +|:-:|:-:|:-:|:-:| +| 0.37789148212861856 | 0.14280197226536404 | 0.2357339155291536 |0.5060283955470721 | http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/regression/e2006_generic.md ---------------------------------------------------------------------- diff --git a/docs/gitbook/regression/e2006_generic.md b/docs/gitbook/regression/e2006_generic.md new file mode 100644 index 0000000..11e280c --- /dev/null +++ b/docs/gitbook/regression/e2006_generic.md @@ -0,0 +1,90 @@ +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> + +This tutorial shows how to apply General Regressor for a regression problem of e2006 dataset. 
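The evaluation steps in these pages rely on Hivemall's `rmse`, `mse`, `mae`, and `r2` UDAFs. For readers who want to sanity-check reported numbers offline, here is a plain-Python sketch of the same standard formulas (illustrative code, not part of Hivemall):

```python
import math

def mse(predicted, actual):
    # mean squared error
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    # root mean squared error
    return math.sqrt(mse(predicted, actual))

def mae(predicted, actual):
    # mean absolute error
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def r2(predicted, actual):
    # coefficient of determination: 1 - SS_res / SS_tot
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for p, a in zip(predicted, actual))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

Note that `r2` folds the mean of the actual values into a single computation, which is what makes the `r2` UDAF more convenient than the older two-step queries that first set `hivevar:mean_actual` by hand.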
+ +<!-- toc --> + +## Training +```sql +set mapred.reduce.tasks=32; + +drop table e2006tfidf_generic_model; +create table e2006tfidf_generic_model as +select + feature, + avg(weight) as weight +from + (select + train_regressor( + add_bias(features), target, + '-loss squaredloss -opt AdamHD -reg No -iters 20' + ) as (feature, weight) + from + e2006tfidf_train_x3 + ) t +group by feature; + +-- reset to the default setting +set mapred.reduce.tasks=-1; +``` + +> #### Caution +> Regularization may not work well for this regression problem. In that case, try providing the `-reg No` option as seen in the above query. +> Also, do not use `voted_avg()` for regression. `voted_avg()` is for classification. + +## prediction +```sql +create or replace view e2006tfidf_generic_predict +as +select + t.rowid, + sum(m.weight * t.value) as predicted +from + e2006tfidf_test_exploded t LEFT OUTER JOIN + e2006tfidf_generic_model m ON (t.feature = m.feature) +group by + t.rowid; +``` + +## evaluation +```sql +WITH submit as ( + select + t.target as actual, + p.predicted as predicted + from + e2006tfidf_test t + JOIN e2006tfidf_generic_predict p + on (t.rowid = p.rowid) +) +select + rmse(predicted, actual) as RMSE, + mse(predicted, actual) as MSE, + mae(predicted, actual) as MAE, + r2(predicted, actual) as R2 +from + submit; +``` + +| rmse | mse | mae | r2 | +|:-:|:-:|:-:|:-:| +| 0.37125069279938866 | 0.13782707690402607 | 0.2270351090214029 | 0.5232372408076887 | + + http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/supervised_learning/prediction.md ---------------------------------------------------------------------- diff --git a/docs/gitbook/supervised_learning/prediction.md b/docs/gitbook/supervised_learning/prediction.md index 65aad27..e3d26c8 100644 --- a/docs/gitbook/supervised_learning/prediction.md +++ b/docs/gitbook/supervised_learning/prediction.md @@ -121,7 +121,7 @@ Below we list possible options for `train_regressor` and `train_classifier`, and - 
SquaredLoss (synonym: squared) - QuantileLoss (synonym: quantile) - EpsilonInsensitiveLoss (synonym: epsilon_insensitive) - - SquaredEpsilonInsensitiveLoss (synonym: squared_epsilon_insensitive) + - SquaredEpsilonInsensitiveLoss (synonym: squared\_epsilon_insensitive) - HuberLoss (synonym: huber) - Regularization function: `-reg`, `-regularization` @@ -134,9 +134,74 @@ Additionally, there are several variants of the SGD technique, and it is also co - Optimizer: `-opt`, `-optimizer` - SGD - - AdaGrad + - Momentum + - Hyperparameters + - `-alpha 1.0` Learning rate. + - `-momentum 0.9` Exponential decay rate of the first order moment. + - Nesterov + - See: [https://arxiv.org/abs/1212.0901](https://arxiv.org/abs/1212.0901) + - Hyperparameters + - same as Momentum + - AdaGrad (default) + - See: [http://jmlr.org/papers/v12/duchi11a.html](http://jmlr.org/papers/v12/duchi11a.html) + - Hyperparameters + - `-eps 1.0` Constant for the numerical stability. + - RMSprop + - Description: RMSprop optimizer introducing weight decay to AdaGrad. + - See: [http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf) + - Hyperparameters + - `-decay 0.95` Weight decay rate + - `-eps 1.0` Constant for numerical stability + - RMSpropGraves + - Description: Alex Graves's RMSprop introducing weight decay and momentum. + - See: [https://arxiv.org/abs/1308.0850](https://arxiv.org/abs/1308.0850) + - Hyperparameters + - `-alpha 1.0` Learning rate. + - `-decay 0.95` Weight decay rate + - `-momentum 0.9` Exponential decay rate of the first order moment. 
+ - `-eps 1.0` Constant for numerical stability - AdaDelta + - See: [https://arxiv.org/abs/1212.5701](https://arxiv.org/abs/1212.5701) + - Hyperparameters + - `-decay 0.95` Weight decay rate + - `-eps 1e-6f` Constant for numerical stability - Adam + - See: + - [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980v8) + - [Fixing Weight Decay Regularization in Adam](https://openreview.net/forum?id=rk6qdGgCZ) + - [On the Convergence of Adam and Beyond](https://openreview.net/forum?id=ryQu7f-RZ) + - Hyperparameters + - `-alpha 1.0` Learning rate. + - `-beta1 0.9` Exponential decay rate of the first order moment. + - `-beta2 0.999` Exponential decay rate of the second order moment. + - `-eps 1e-8f` Constant for numerical stability + - `-decay 0.0` Weight decay rate + - Nadam + - Description: Nadam is Adam optimizer with Nesterov momentum. + - See: + - [Incorporating Nesterov Momentum into Adam](https://openreview.net/pdf?id=OM0jvwB8jIp57ZJjtNEZ) + - [Adam report](http://cs229.stanford.edu/proj2015/054_report.pdf) + - [On the importance of initialization and momentum in deep learning](http://www.cs.toronto.edu/~fritz/absps/momentum.pdf) + - Hyperparameters + - same as Adam except ... + - `-scheduleDecay 0.004` Scheduled decay rate (for each 250 steps by the default; 1/250=0.004) + - Eve + - See: [https://openreview.net/forum?id=r1WUqIceg](https://openreview.net/forum?id=r1WUqIceg) + - Hyperparameters + - same as Adam except ... + - `-beta3 0.999` Decay rate for Eve coefficient. + - `-c 10` Constant used for gradient clipping `clip(val, 1/c, c)` + - AdamHD + - Description: Adam optimizer with Hypergradient Descent. Learning rate `-alpha` is automatically tuned. 
+ - See: + - [Online Learning Rate Adaptation with Hypergradient Descent](https://openreview.net/forum?id=BkrsAzWAb) + - [Convergence Analysis of an Adaptive Method of Gradient Descent](https://damaru2.github.io/convergence_analysis_hypergradient_descent/dissertation_hypergradients.pdf) + - Hyperparameters + - same as Adam except ... + - `-alpha 0.02` Learning rate. + - `-beta -1e-6` Constant used for tuning learning rate. + +Default (Adagrad+RDA), AdaDelta, Adam, and AdamHD are worth trying in my experience. > #### Note > @@ -156,8 +221,13 @@ Furthermore, optimizer offers to set auxiliary options such as: For details of available options, the following queries might be helpful to list all of them: ```sql -select train_regressor(array(), 0, '-help'); +select train_regressor('-help'); +-- v0.5.0 or before +-- select train_regressor(array(), 0, '-help'); + +select train_classifier('-help'); +-- v0.5.0 or before +-- select train_classifier(array(), 0, '-help'); ``` In practice, you can try different combinations of the options in order to achieve higher prediction accuracy. 
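For intuition about the optimizer hyperparameters listed above, the default AdaGrad update for a single weight can be sketched in plain Python as follows. This is an illustrative approximation using the `-alpha 1.0` and `-eps 1.0` defaults, not Hivemall's exact implementation:

```python
def adagrad_step(weight, grad, accum, alpha=1.0, eps=1.0):
    """One AdaGrad update for a single feature weight.

    accum holds the running sum of squared gradients; eps is the
    numerical-stability constant (cf. `-eps 1.0` above). Illustrative
    sketch only -- Hivemall's exact update may differ in detail.
    """
    accum += grad * grad
    new_weight = weight - alpha * grad / (accum + eps) ** 0.5
    return new_weight, accum
```

Because the accumulated squared gradient only grows, per-feature step sizes shrink over time, which suits the sparse text features used in these examples: rarely seen features keep relatively large effective learning rates.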
http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/supervised_learning/tutorial.md ---------------------------------------------------------------------- diff --git a/docs/gitbook/supervised_learning/tutorial.md b/docs/gitbook/supervised_learning/tutorial.md index 5f96a2a..11dc1c8 100644 --- a/docs/gitbook/supervised_learning/tutorial.md +++ b/docs/gitbook/supervised_learning/tutorial.md @@ -37,15 +37,6 @@ FROM ; ``` - -Hivemall function [`hivemall_version()`](../misc/funcs.html#others) shows current Hivemall version, for example: - -```sql -select hivemall_version(); -``` - -> "0.5.1-incubating-SNAPSHOT" - Below we list ML and relevant problems that Hivemall can solve: - [Binary and multi-class classification](../binaryclass/general.html) @@ -199,7 +190,9 @@ Notice that weight is learned for each possible value in a categorical feature, Of course, you can optimize hyper-parameters to build more accurate prediction model. Check the output of the following query to see all available options, including learning rate, number of iterations and regularization parameters, and their default values: ```sql -select train_classifier(array(), 0, '-help'); +select train_classifier('-help'); +-- Hivemall 0.5.2 and before +-- select train_classifier(array(), 0, '-help'); ``` ### Step 3. 
Prediction @@ -230,14 +223,17 @@ with features_exploded as ( -- to join with a model table extract_feature(fv) as feature, extract_weight(fv) as value - from unforeseen_samples t1 LATERAL VIEW explode(features) t2 as fv + from + unforeseen_samples t1 + LATERAL VIEW explode(features) t2 as fv ) select t1.id, sigmoid( sum(p1.weight * t1.value) ) as probability from features_exploded t1 - LEFT OUTER JOIN classifier p1 ON (t1.feature = p1.feature) + LEFT OUTER JOIN classifier p1 + ON (t1.feature = p1.feature) group by t1.id ; @@ -265,7 +261,9 @@ with features_exploded as ( id, extract_feature(fv) as feature, extract_weight(fv) as value - from training t1 LATERAL VIEW explode(features) t2 as fv + from + training t1 + LATERAL VIEW explode(features) t2 as fv ), predictions as ( select @@ -273,7 +271,8 @@ predictions as ( sigmoid( sum(p1.weight * t1.value) ) as probability from features_exploded t1 - LEFT OUTER JOIN classifier p1 ON (t1.feature = p1.feature) + LEFT OUTER JOIN classifier p1 + ON (t1.feature = p1.feature) group by t1.id ) @@ -281,10 +280,13 @@ select auc(probability, label) as auc, logloss(probability, label) as logloss from ( - select t1.probability, t2.label - from predictions t1 - join training t2 on (t1.id = t2.id) - ORDER BY probability DESC + select + t1.probability, t2.label + from + predictions t1 + join training t2 on (t1.id = t2.id) + ORDER BY + probability DESC ) t ; ``` @@ -371,7 +373,9 @@ from Run the function with `-help` option to list available options: ```sql -select train_regressor(array(), 0, '-help'); +select train_regressor('-help'); +-- Hivemall 0.5.2 and before +-- select train_regressor(array(), 0, '-help'); ``` ### Step 3. 
Prediction @@ -411,7 +415,7 @@ group by Output is like: -|id| predicted_num_purchases| +|id| predicted\_num_purchases| |---:|---:| | 1| 3.645142912864685| @@ -425,7 +429,9 @@ with features_exploded as ( id, extract_feature(fv) as feature, extract_weight(fv) as value - from training t1 LATERAL VIEW explode(features) t2 as fv + from + training t1 + LATERAL VIEW explode(features) t2 as fv ), predictions as ( select
