Repository: incubator-hivemall
Updated Branches:
  refs/heads/master 31932fd7c -> a823e2b17


[HIVEMALL-214][DOC] Update userguide for General Classifier/Regressor example

## What changes were proposed in this pull request?

Refine the user guide for the generic classifier/regressor and related pages.

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-214

## How to use this feature?

See user guide.

Author: Makoto Yui <[email protected]>

Closes #159 from myui/HIVEMALL-214.


Project: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/commit/a823e2b1
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/tree/a823e2b1
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/diff/a823e2b1

Branch: refs/heads/master
Commit: a823e2b17536dbf4418286fa508c93e912a75e8d
Parents: 31932fd
Author: Makoto Yui <[email protected]>
Authored: Wed Dec 26 19:15:43 2018 +0900
Committer: Makoto Yui <[email protected]>
Committed: Wed Dec 26 19:15:43 2018 +0900

----------------------------------------------------------------------
 docs/gitbook/SUMMARY.md                        |  57 +++---
 docs/gitbook/binaryclass/a9a_generic.md        | 109 ++++++++++++
 docs/gitbook/binaryclass/a9a_lr.md             |  16 +-
 docs/gitbook/binaryclass/a9a_minibatch.md      |   9 +-
 docs/gitbook/binaryclass/general.md            |  51 +++---
 docs/gitbook/binaryclass/news20_adagrad.md     |  45 +++--
 docs/gitbook/binaryclass/news20_dataset.md     |  20 +--
 docs/gitbook/binaryclass/news20_generic.md     |  83 +++++++++
 docs/gitbook/binaryclass/news20_pa.md          |  87 +++-------
 docs/gitbook/binaryclass/news20_rf.md          |   6 +-
 docs/gitbook/binaryclass/news20_scw.md         | 121 ++++---------
 docs/gitbook/regression/e2006_arow.md          | 183 +++++++++-----------
 docs/gitbook/regression/e2006_generic.md       |  90 ++++++++++
 docs/gitbook/supervised_learning/prediction.md |  78 ++++++++-
 docs/gitbook/supervised_learning/tutorial.md   |  48 ++---
 15 files changed, 636 insertions(+), 367 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/SUMMARY.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/SUMMARY.md b/docs/gitbook/SUMMARY.md
index 31a0311..7ead819 100644
--- a/docs/gitbook/SUMMARY.md
+++ b/docs/gitbook/SUMMARY.md
@@ -89,44 +89,46 @@
 * [Binary Classification](binaryclass/general.md)
 
 * [a9a Tutorial](binaryclass/a9a.md)
-    * [Data preparation](binaryclass/a9a_dataset.md)
+    * [Data Preparation](binaryclass/a9a_dataset.md)
+    * [General Binary Classifier](binaryclass/a9a_generic.md)
     * [Logistic Regression](binaryclass/a9a_lr.md)
-    * [Mini-batch gradient descent](binaryclass/a9a_minibatch.md)
+    * [Mini-batch Gradient Descent](binaryclass/a9a_minibatch.md)
 
 * [News20 Tutorial](binaryclass/news20.md)
-    * [Data preparation](binaryclass/news20_dataset.md)
+    * [Data Preparation](binaryclass/news20_dataset.md)
     * [Perceptron, Passive Aggressive](binaryclass/news20_pa.md)
     * [CW, AROW, SCW](binaryclass/news20_scw.md)
+    * [General Binary Classifier](binaryclass/news20_generic.md)
     * [AdaGradRDA, AdaGrad, AdaDelta](binaryclass/news20_adagrad.md)
     * [Random Forest](binaryclass/news20_rf.md)
 
 * [KDD2010a Tutorial](binaryclass/kdd2010a.md)
-    * [Data preparation](binaryclass/kdd2010a_dataset.md)
+    * [Data Preparation](binaryclass/kdd2010a_dataset.md)
     * [PA, CW, AROW, SCW](binaryclass/kdd2010a_scw.md)
 
 * [KDD2010b Tutorial](binaryclass/kdd2010b.md)
-    * [Data preparation](binaryclass/kdd2010b_dataset.md)
+    * [Data Preparation](binaryclass/kdd2010b_dataset.md)
     * [AROW](binaryclass/kdd2010b_arow.md)
 
 * [Webspam Tutorial](binaryclass/webspam.md)
-    * [Data pareparation](binaryclass/webspam_dataset.md)
+    * [Data Preparation](binaryclass/webspam_dataset.md)
     * [PA1, AROW, SCW](binaryclass/webspam_scw.md)
 
 * [Kaggle Titanic Tutorial](binaryclass/titanic_rf.md)
 
 * [Criteo Tutorial](binaryclass/criteo.md)
-    * [Data preparation](binaryclass/criteo_dataset.md)
+    * [Data Preparation](binaryclass/criteo_dataset.md)
     * [Field-Aware Factorization Machines](binaryclass/criteo_ffm.md)
 
 ## Part VII - Multiclass Classification
 
 * [News20 Multiclass Tutorial](multiclass/news20.md)
-    * [Data preparation](multiclass/news20_dataset.md)
-    * [Data preparation for one-vs-the-rest 
classifiers](multiclass/news20_one-vs-the-rest_dataset.md)
+    * [Data Preparation](multiclass/news20_dataset.md)
+    * [Data Preparation for one-vs-the-rest 
Classifiers](multiclass/news20_one-vs-the-rest_dataset.md)
     * [PA](multiclass/news20_pa.md)
     * [CW, AROW, SCW](multiclass/news20_scw.md)
     * [Ensemble learning](multiclass/news20_ensemble.md)
-    * [one-vs-the-rest classifier](multiclass/news20_one-vs-the-rest.md)
+    * [one-vs-the-rest Classifier](multiclass/news20_one-vs-the-rest.md)
 
 * [Iris Tutorial](multiclass/iris.md)
     * [Data preparation](multiclass/iris_dataset.md)
@@ -138,11 +140,12 @@
 * [Regression](regression/general.md)
 
 * [E2006-tfidf Regression Tutorial](regression/e2006.md)
-    * [Data preparation](regression/e2006_dataset.md)
+    * [Data Preparation](regression/e2006_dataset.md)
+    * [General Regressor](regression/e2006_generic.md)
     * [Passive Aggressive, AROW](regression/e2006_arow.md)
 
 * [KDDCup 2012 Track 2 CTR Prediction Tutorial](regression/kddcup12tr2.md)
-    * [Data preparation](regression/kddcup12tr2_dataset.md)
+    * [Data Preparation](regression/kddcup12tr2_dataset.md)
     * [Logistic Regression, Passive Aggressive](regression/kddcup12tr2_lr.md)
     * [Logistic Regression with 
amplifier](regression/kddcup12tr2_lr_amplify.md)
     * [AdaGrad, AdaDelta](regression/kddcup12tr2_adagrad.md)
@@ -150,21 +153,21 @@
 ## Part IX - Recommendation
 
 * [Collaborative Filtering](recommend/cf.md)
-    * [Item-based collaborative filtering](recommend/item_based_cf.md)
+    * [Item-based Collaborative Filtering](recommend/item_based_cf.md)
 
 * [News20 Related Article Recommendation Tutorial](recommend/news20.md)
-    * [Data preparation](multiclass/news20_dataset.md)
-    * [LSH/MinHash and Jaccard similarity](recommend/news20_jaccard.md)
-    * [LSH/MinHash and brute-force search](recommend/news20_knn.md)
+    * [Data Preparation](multiclass/news20_dataset.md)
+    * [LSH/MinHash and Jaccard Similarity](recommend/news20_jaccard.md)
+    * [LSH/MinHash and Brute-force Search](recommend/news20_knn.md)
     * [kNN search using b-Bits MinHash](recommend/news20_bbit_minhash.md)
 
 * [MovieLens Movie Recommendation Tutorial](recommend/movielens.md)
-    * [Data preparation](recommend/movielens_dataset.md)
-    * [Item-based collaborative filtering](recommend/movielens_cf.md)
+    * [Data Preparation](recommend/movielens_dataset.md)
+    * [Item-based Collaborative Filtering](recommend/movielens_cf.md)
     * [Matrix Factorization](recommend/movielens_mf.md)
     * [Factorization Machine](recommend/movielens_fm.md)
-    * [SLIM for fast top-k recommendation](recommend/movielens_slim.md)
-    * [10-fold cross validation (Matrix 
Factorization)](recommend/movielens_cv.md)
+    * [SLIM for fast top-k Recommendation](recommend/movielens_slim.md)
+    * [10-fold Cross Validation (Matrix 
Factorization)](recommend/movielens_cv.md)
 
 ## Part X - Anomaly Detection
 
@@ -187,16 +190,16 @@
     * [Installation](spark/getting_started/installation.md)
 
 * [Binary Classification](spark/binaryclass/index.md)
-    * [a9a tutorial for DataFrame](spark/binaryclass/a9a_df.md)
-    * [a9a tutorial for SQL](spark/binaryclass/a9a_sql.md)
+    * [a9a Tutorial for DataFrame](spark/binaryclass/a9a_df.md)
+    * [a9a Tutorial for SQL](spark/binaryclass/a9a_sql.md)
 
 * [Regression](spark/binaryclass/index.md)
-    * [E2006-tfidf regression tutorial for 
DataFrame](spark/regression/e2006_df.md)
-    * [E2006-tfidf regression tutorial for SQL](spark/regression/e2006_sql.md)
+    * [E2006-tfidf Regression Tutorial for 
DataFrame](spark/regression/e2006_df.md)
+    * [E2006-tfidf Regression Tutorial for SQL](spark/regression/e2006_sql.md)
 
-* [Generic features](spark/misc/misc.md)
-    * [Top-k join processing](spark/misc/topk_join.md)
-    * [Other utility functions](spark/misc/functions.md)
+* [Generic Features](spark/misc/misc.md)
+    * [Top-k Join Processing](spark/misc/topk_join.md)
+    * [Other Utility Functions](spark/misc/functions.md)
 
 ## Part XIV - Hivemall on Docker
 

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/a9a_generic.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/binaryclass/a9a_generic.md 
b/docs/gitbook/binaryclass/a9a_generic.md
new file mode 100644
index 0000000..8a482ca
--- /dev/null
+++ b/docs/gitbook/binaryclass/a9a_generic.md
@@ -0,0 +1,109 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+This page shows the usage of the General Binary Classifier on the a9a dataset.
+
+<!-- toc -->
+
+# Training
+
+```sql
+create table model
+as
+select 
+ feature,
+ avg(weight) as weight
+from (
+  select 
+     train_classifier(
+       add_bias(features), label, 
+       "-loss logistic -iter 30"
+     ) as (feature,weight)
+  from 
+     a9a_train
+ ) t 
+group by feature;
+```
+
+# Prediction
+
+```sql
+create table predict 
+as
+WITH exploded as (
+select 
+  rowid,
+  label,
+  extract_feature(feature) as feature,
+  extract_weight(feature) as value
+from 
+  a9a_test LATERAL VIEW explode(add_bias(features)) t AS feature
+)
+select
+  t.rowid, 
+  sigmoid(sum(m.weight * t.value)) as prob,
+  (case when sigmoid(sum(m.weight * t.value)) >= 0.5 then 1.0 else 0.0 end) as 
label
+from 
+  exploded t LEFT OUTER JOIN
+  model m ON (t.feature = m.feature)
+group by
+  t.rowid;
+```
+
+# Evaluation
+
+```sql
+create or replace view submit as
+select 
+  t.label as actual, 
+  p.label as predicted, 
+  p.prob as probability
+from 
+  a9a_test t 
+  JOIN predict p on (t.rowid = p.rowid);
+
+select 
+  sum(if(actual == predicted, 1, 0)) / count(1) as accuracy
+from
+  submit;
+```
+
+> 0.8462625145875561
+
+The following table shows how accuracy changes with the optimizer, via the 
`-loss logistic -opt XXXXXX -reg l1 -iter 30` option:
+
+| Optimizer | Accuracy |
+|:--:|:--:|
+| Default (Adagrad+RDA) | 0.8462625145875561 |
+| SGD | 0.8462010932989374 |
+| Momentum | 0.8254406977458387 |
+| Nesterov | 0.8286346047540077 |
+| AdaGrad | 0.850991953811191 |
+| RMSprop | 0.8463239358761747 |
+| RMSpropGraves | 0.825563540323076 |
+| AdaDelta | 0.8492721577298692 |
+| Adam | 0.8341625207296849 |
+| Nadam | 0.8349609974817271 |
+| Eve | 0.8348381549044899 |
+| AdamHD | 0.8447269823720902 |
+
+> #### Note
+> Optimizers using momentum require careful tuning of the decay rate.
+> In our experience, the default (Adagrad+RDA), AdaDelta, Adam, and AdamHD are 
worth trying.
+
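The rows of the optimizer table above can presumably be reproduced by re-running the training step with a different `-opt` value and then re-running the same Prediction and Evaluation queries; a minimal sketch (table and option names follow the a9a example above, and the exact option spellings are assumptions):

```sql
-- Sketch: swap the -opt value to reproduce one row of the table above,
-- then re-run the same Prediction and Evaluation queries.
create table model_adam
as
select
  feature,
  avg(weight) as weight
from (
  select
    train_classifier(
      add_bias(features), label,
      '-loss logistic -opt Adam -reg l1 -iter 30'
    ) as (feature, weight)
  from
    a9a_train
) t
group by feature;
```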

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/a9a_lr.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/binaryclass/a9a_lr.md 
b/docs/gitbook/binaryclass/a9a_lr.md
index 247d5a2..5a73a2a 100644
--- a/docs/gitbook/binaryclass/a9a_lr.md
+++ b/docs/gitbook/binaryclass/a9a_lr.md
@@ -17,6 +17,12 @@
   under the License.
 -->
 
+This page shows an example of applying logistic regression to the a9a binary 
classification task.
+
+> #### Caution
+>
+> `logress()` has been deprecated since the v0.5.0 release. Use the smarter 
[general classifier](./a9a_generic.md) instead.
+
 <!-- toc -->
 
 # UDF preparation
@@ -31,6 +37,7 @@ set hivevar:num_test_instances=16281;
 ```
 
 # training
+
 ```sql
 create table a9a_model1 
 as
@@ -45,10 +52,13 @@ from
  ) t 
 group by feature;
 ```
-_"-total_steps" option is optional for logress() function._  
-_I recommend you NOT to use options (e.g., total_steps and eta0) if you are 
not familiar with those options. Hivemall then uses an autonomic ETA (learning 
rate) estimator._
+
+> #### Note
+>
+> The `-total_steps` option is optional for the `logress()` function. We 
recommend NOT using options such as `total_steps` and `eta0` unless you are 
familiar with them; Hivemall then uses an autonomic ETA (learning rate) 
estimator.
 
 # prediction
+
 ```sql
 create or replace view a9a_predict1 
 as
@@ -73,6 +83,7 @@ group by
 ```
 
 # evaluation
+
 ```sql
 create or replace view a9a_submit1 as
 select 
@@ -88,4 +99,5 @@ from
 select count(1) / ${num_test_instances} from a9a_submit1 
 where actual == predicted;
 ```
+
 > 0.8430071862907684

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/a9a_minibatch.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/binaryclass/a9a_minibatch.md 
b/docs/gitbook/binaryclass/a9a_minibatch.md
index 3fc5945..029b447 100644
--- a/docs/gitbook/binaryclass/a9a_minibatch.md
+++ b/docs/gitbook/binaryclass/a9a_minibatch.md
@@ -17,8 +17,13 @@
   under the License.
 -->
         
-This page explains how to apply [Mini-Batch Gradient 
Descent](https://class.coursera.org/ml-003/lecture/106) for the training of 
logistic regression explained in [this example](./a9a_lr.html). 
-So, refer [this page](./a9a_lr.html) first. This content depends on it.
+This page explains how to apply [Mini-Batch Gradient 
Descent](https://class.coursera.org/ml-003/lecture/106) for the training of 
logistic regression explained in [this example](./a9a_lr.html). So, refer [this 
page](./a9a_lr.html) first. This content depends on it.
+
+> #### Caution
+>
+> `logress()` has been deprecated since the v0.5.0 release. Use the smarter 
[general classifier](./a9a_generic.md) instead. The `-mini_batch` option is 
available in the general classifier as well.
+
+<!-- toc -->
 
 # Training
 

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/general.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/binaryclass/general.md 
b/docs/gitbook/binaryclass/general.md
index a14130c..a436cb4 100644
--- a/docs/gitbook/binaryclass/general.md
+++ b/docs/gitbook/binaryclass/general.md
@@ -17,7 +17,7 @@
   under the License.
 -->
 
-Hivemall has a generic function for classification: `train_classifier`. 
Compared to the other functions we will see in the later chapters, 
`train_classifier` provides simpler and configureable generic interface which 
can be utilized to build binary classification models in a variety of settings.
+Hivemall has a generic function for classification: `train_classifier`. 
Compared to the other functions we will see in the later chapters, 
`train_classifier` provides simpler and configurable generic interface which 
can be utilized to build binary classification models in a variety of settings.
 
 Here, we briefly introduce usage of the function. Before trying sample 
queries, you first need to prepare [a9a 
data](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#a9a). 
See [our a9a tutorial page](a9a_dataset.md) for further instructions.
 
@@ -26,19 +26,6 @@ Here, we briefly introduce usage of the function. Before 
trying sample queries,
 > #### Note
 > This feature is supported from Hivemall v0.5-rc.1 or later.
 
-# Preparation
-
-- Set `total_steps` ideally be `count(1) / {# of map tasks}`:
-       ```
-       hive> select count(1) from a9a_train; 
-       hive> set hivevar:total_steps=32561;
-       ```
-- Set `n_samples` to compute accuracy of prediction:
-       ```
-       hive> select count(1) from a9a_test;
-       hive> set hivevar:n_samples=16281;
-       ```
-
 # Training
 
 ```sql
@@ -49,17 +36,13 @@ select
 from
  (
   select
-    train_classifier(add_bias(features), label, '-loss logloss -opt SGD -reg 
no -eta simple -total_steps ${total_steps}') as (feature, weight)
+    train_classifier(add_bias(features), label, '-loss logloss -opt SGD -reg 
no') as (feature, weight)
   from
      a9a_train
  ) t
 group by feature;
 ```
 
-> #### Note
->
-> `-total_steps` option is an optional parameter and training works without it.
-
 # Prediction & evaluation
 
 ```sql
@@ -78,24 +61,32 @@ predict as (
     sigmoid(sum(m.weight * t.value)) as prob,
     (case when sigmoid(sum(m.weight * t.value)) >= 0.5 then 1.0 else 0.0 
end)as label
   from
-    test_exploded t LEFT OUTER JOIN
-    classification_model m ON (t.feature = m.feature)
+    test_exploded t
+    LEFT OUTER JOIN classification_model m 
+      ON (t.feature = m.feature)
   group by
     t.rowid
 ),
 submit as (
   select
     t.label as actual,
-    pd.label as predicted,
-    pd.prob as probability
+    p.label as predicted,
+    p.prob as probability
   from
-    a9a_test t JOIN predict pd
-      on (t.rowid = pd.rowid)
+    a9a_test t
+    JOIN predict p
+      on (t.rowid = p.rowid)
 )
-select count(1) / ${n_samples} from submit
-where actual = predicted;
+select 
+  sum(if(actual = predicted, 1, 0)) / count(1) as accuracy
+from
+  submit;
 ```
 
+|accuracy|
+|:-:|
+| 0.8461396720103188 |
+
 # Comparison with the other binary classifiers
 
 In the next part of this user guide, our binary classification tutorials 
introduce many different functions:
@@ -115,20 +106,20 @@ All of them actually have the same interface, but 
mathematical formulation and i
 
 In particular, the above sample queries are almost same as [a9a tutorial using 
Logistic Regression](a9a_lr.md). The difference is only in a choice of training 
function: `logress()` vs. `train_classifier()`.
 
-However, at the same time, the options `-loss logloss -opt SGD -reg no -eta 
simple -total_steps ${total_steps}` for `train_classifier` indicates that 
Hivemall uses the generic classifier as Logistic Regressor (`logress`). Hence, 
the accuracy of prediction based on either `logress` and `train_classifier` 
should be same under the configuration.
+However, at the same time, the options `-loss logloss -opt SGD -reg no` for 
`train_classifier` indicate that Hivemall uses the generic classifier as 
`logress`. Hence, the prediction accuracy of either `logress` or 
`train_classifier` would be (almost) the same under this configuration.
 
 In addition, `train_classifier` supports the `-mini_batch` option in a similar 
manner to [what `logress` does](a9a_minibatch.md). Thus, the following two 
training queries produce the same results:
 
 ```sql
 select
-       logress(add_bias(features), label, '-total_steps ${total_steps} 
-mini_batch 10') as (feature, weight)
+       logress(add_bias(features), label, '-mini_batch 10') as (feature, 
weight)
 from
        a9a_train
 ```
 
 ```sql
 select
-       train_classifier(add_bias(features), label, '-loss logloss -opt SGD 
-reg no -eta simple -total_steps ${total_steps} -mini_batch 10') as (feature, 
weight)
+       train_classifier(add_bias(features), label, '-loss logloss -opt SGD 
-reg no -mini_batch 10') as (feature, weight)
 from
        a9a_train
 ```

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/news20_adagrad.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/binaryclass/news20_adagrad.md 
b/docs/gitbook/binaryclass/news20_adagrad.md
index e3dfb47..063ecb0 100644
--- a/docs/gitbook/binaryclass/news20_adagrad.md
+++ b/docs/gitbook/binaryclass/news20_adagrad.md
@@ -19,25 +19,21 @@
 
 <!-- toc -->
 
-> #### Note
-> This feature is supported since Hivemall `v0.3-beta2` or later.
-
-## UDF preparation
-
-```
-add jar ./tmp/hivemall-with-dependencies.jar;
-source ./tmp/define-all.hive;
-
-use news20;
-```
+> #### Caution
+>
+> `train_adagrad()` has been deprecated since the v0.5.0 release. Use the 
smarter [general classifier](./news20_generic.md) instead.
 
-#[AdaGradRDA]
+# AdaGradRDA
 
 > #### Note
+>
 > The current AdaGradRDA implementation can only be applied to classification, 
 > not to regression, because it uses hinge loss as the loss function.
 
 ## model building
+
 ```sql
+use news20;
+
 drop table news20b_adagrad_rda_model1;
 create table news20b_adagrad_rda_model1 as
 select 
@@ -45,7 +41,7 @@ select
  voted_avg(weight) as weight
 from 
  (select 
-     train_adagrad_rda(addBias(features),label) as (feature,weight)
+     train_adagrad_rda(add_bias(features),label) as (feature,weight)
   from 
      news20b_train_x3
  ) t 
@@ -53,6 +49,7 @@ group by feature;
 ```
 
 ## prediction
+
 ```sql
 create or replace view news20b_adagrad_rda_predict1 
 as
@@ -68,6 +65,7 @@ group by
 ```
 
 ## evaluation
+
 ```sql
 create or replace view news20b_adagrad_rda_submit1 as
 select 
@@ -82,15 +80,19 @@ from
 select count(1)/4996 from news20b_adagrad_rda_submit1 
 where actual == predicted;
 ```
+
 > SCW1 0.9661729383506805 
 
 > ADAGRAD+RDA 0.9677742193755005
 
-#[AdaGrad]
+# AdaGrad
 
-_Note that AdaGrad is better suited for a regression problem because the 
current implementation only support logistic loss._
+> #### Note
+>
+> AdaGrad is better suited for a binary classification problem because the 
current implementation only supports logistic loss.
 
 ## model building
+
 ```sql
 drop table news20b_adagrad_model1;
 create table news20b_adagrad_model1 as
@@ -99,7 +101,7 @@ select
  voted_avg(weight) as weight
 from 
  (select 
-     adagrad(addBias(features),convert_label(label)) as (feature,weight)
+     train_adagrad_regr(add_bias(features),convert_label(label)) as 
(feature,weight)
   from 
      news20b_train_x3
  ) t 
@@ -110,6 +112,7 @@ group by feature;
 > `train_adagrad_regr` takes 0/1 for a label value and `convert_label(label)` 
 > converts a label value from -1/+1 to 0/1.
 
 ## prediction
+
 ```sql
 create or replace view news20b_adagrad_predict1 
 as
@@ -124,6 +127,7 @@ group by
 ```
 
 ## evaluation
+
 ```sql
 create or replace view news20b_adagrad_submit1 as
 select 
@@ -138,14 +142,16 @@ from
 select count(1)/4996 from news20b_adagrad_submit1 
 where actual == predicted;
 ```
+
 > 0.9549639711769415 (adagrad)
 
-#[AdaDelta]
+# AdaDelta
 
 > #### Caution
 > AdaDelta can only be applied to a binary classification problem because the 
 > current implementation only supports logistic loss.
 
 ## model building
+
 ```sql
 drop table news20b_adadelta_model1;
 create table news20b_adadelta_model1 as
@@ -154,7 +160,7 @@ select
  voted_avg(weight) as weight
 from 
  (select 
-     adadelta(addBias(features),convert_label(label)) as (feature,weight)
+     adadelta(add_bias(features),convert_label(label)) as (feature,weight)
   from 
      news20b_train_x3
  ) t 
@@ -162,6 +168,7 @@ group by feature;
 ```
 
 ## prediction
+
 ```sql
 create or replace view news20b_adadelta_predict1 
 as
@@ -176,6 +183,7 @@ group by
 ```
 
 ## evaluation
+
 ```sql
 create or replace view news20b_adadelta_submit1 as
 select 
@@ -187,7 +195,6 @@ from
 ```
 
 
-
 ```sql
 select count(1)/4996 from news20b_adadelta_submit1 
 where actual == predicted;

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/news20_dataset.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/binaryclass/news20_dataset.md 
b/docs/gitbook/binaryclass/news20_dataset.md
index 2edd3f7..be48473 100644
--- a/docs/gitbook/binaryclass/news20_dataset.md
+++ b/docs/gitbook/binaryclass/news20_dataset.md
@@ -59,11 +59,6 @@ hadoop fs -copyFromLocal news20.test.t 
/dataset/news20-binary/test
 create database news20;
 use news20;
 
-delete jar /home/myui/tmp/hivemall.jar;
-add jar /home/myui/tmp/hivemall.jar;
-
-source /home/myui/tmp/define-all.hive;
-
 Create external table news20b_train (
   rowid int,
   label int,
@@ -82,10 +77,10 @@ as
 select 
   * 
 from (
-select
-   amplify(3, *) as (rowid, label, features)
-from  
-   news20b_train 
+  select
+    amplify(3, *) as (rowid, label, features)
+  from
+    news20b_train
 ) t
 CLUSTER BY rand(${seed});
 
@@ -93,11 +88,8 @@ create table news20b_test_exploded as
 select 
   rowid,
   label,
-  cast(split(feature,":")[0] as int) as feature,
-  cast(split(feature,":")[1] as float) as value
-  -- hivemall v0.3.1 or later
-  -- extract_feature(feature) as feature,
-  -- extract_weight(feature) as value
+  extract_feature(feature) as feature,
+  extract_weight(feature) as value
 from 
   news20b_test LATERAL VIEW explode(add_bias(features)) t AS feature;
 ```
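For reference, the `extract_feature` and `extract_weight` UDFs used above split a `feature:value` string into its two parts, replacing the manual `split()`-and-cast idiom removed by this hunk; a small illustrative sketch (the literal used here is hypothetical):

```sql
-- extract_feature returns the part before the colon,
-- extract_weight returns the numeric part after it.
select
  extract_feature('27:0.16') as feature,  -- '27'
  extract_weight('27:0.16') as value;     -- 0.16
```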

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/news20_generic.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/binaryclass/news20_generic.md 
b/docs/gitbook/binaryclass/news20_generic.md
new file mode 100644
index 0000000..23a0363
--- /dev/null
+++ b/docs/gitbook/binaryclass/news20_generic.md
@@ -0,0 +1,83 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+In this tutorial, we build a binary classification model using the general 
classifier.
+
+<!-- toc -->
+
+## Training
+
+```sql
+-- set mapred.reduce.tasks=3; -- explicitly use 3 reducers
+
+drop table news20b_generic_model;
+create table news20b_generic_model as
+select 
+ feature,
+ voted_avg(weight) as weight
+from 
+ (select 
+     train_classifier(
+       add_bias(features), label, 
+       '-loss logistic -opt AdamHD -reg l1 -iters 20'
+     ) as (feature,weight)
+  from
+     news20b_train_x3
+ ) t 
+group by feature;
+```
+
+> #### Note
+> In our experience, the default (Adagrad+RDA), AdaDelta, Adam, and AdamHD are 
worth trying.
+
+## prediction
+
+```sql
+create or replace view news20b_generic_predict
+as
+select
+  t.rowid, 
+  sum(m.weight * t.value) as total_weight,
+  case when sum(m.weight * t.value) > 0.0 then 1 else -1 end as label
+from 
+  news20b_test_exploded t LEFT OUTER JOIN
+  news20b_generic_model m ON (t.feature = m.feature)
+group by
+  t.rowid;
+```
+
+## evaluation
+
+```sql
+WITH submit as (
+select 
+  t.label as actual, 
+  p.label as predicted
+from 
+  news20b_test t 
+  JOIN news20b_generic_predict p
+    on (t.rowid = p.rowid)
+)
+select 
+  sum(if(actual = predicted, 1, 0)) / count(1) as accuracy
+from
+  submit;
+```
+
+> 0.967173738991193 (`-opt AdamHD -reg l1`)
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/news20_pa.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/binaryclass/news20_pa.md 
b/docs/gitbook/binaryclass/news20_pa.md
index d40b433..12e459a 100644
--- a/docs/gitbook/binaryclass/news20_pa.md
+++ b/docs/gitbook/binaryclass/news20_pa.md
@@ -17,15 +17,6 @@
   under the License.
 -->
         
-## UDF preparation
-```
-delete jar /home/myui/tmp/hivemall.jar;
-add jar /home/myui/tmp/hivemall.jar;
-
-source /home/myui/tmp/define-all.hive;
-```
-
----
 #[Perceptron]
 
 ## model building
@@ -37,7 +28,7 @@ select
  voted_avg(weight) as weight
 from 
  (select 
-     perceptron(add_bias(features),label) as (feature,weight)
+     train_perceptron(add_bias(features),label) as (feature,weight)
   from 
      news20b_train_x3
  ) t 
@@ -64,27 +55,20 @@ group by
 create or replace view news20b_perceptron_submit1 as
 select 
   t.label as actual, 
-  pd.label as predicted
+  p.label as predicted
 from 
-  news20b_test t JOIN news20b_perceptron_predict1 pd 
-    on (t.rowid = pd.rowid);
+  news20b_test t JOIN news20b_perceptron_predict1 p
+    on (t.rowid = p.rowid);
 ```
 
 ```sql
-select count(1)/4996 from news20b_perceptron_submit1 
-where actual == predicted;
+select 
+  sum(if(actual = predicted, 1, 0)) / count(1) as accuracy
+from
+  news20b_perceptron_submit1;
 ```
 > 0.9459567654123299
 
-## Cleaning
-
-```sql
-drop table news20b_perceptron_model1;
-drop view news20b_perceptron_predict1;
-drop view news20b_perceptron_submit1;
-```
-
----
 #[Passive Aggressive]
 
 ## model building
@@ -130,20 +114,13 @@ from
 ```
 
 ```sql
-select count(1)/4996 from news20b_pa_submit1 
-where actual == predicted;
+select 
+  sum(if(actual = predicted, 1, 0)) / count(1) as accuracy
+from
+  news20b_pa_submit1;
 ```
 > 0.9603682946357086
 
-## Cleaning
-
-```sql
-drop table news20b_pa_model1;
-drop view news20b_pa_predict1;
-drop view news20b_pa_submit1;
-```
-
----
 #[Passive Aggressive (PA1)]
 
 ## model building
@@ -171,8 +148,9 @@ select
   sum(m.weight * t.value) as total_weight,
   case when sum(m.weight * t.value) > 0.0 then 1 else -1 end as label
 from 
-  news20b_test_exploded t LEFT OUTER JOIN
-  news20b_pa1_model1 m ON (t.feature = m.feature)
+  news20b_test_exploded t 
+  LEFT OUTER JOIN news20b_pa1_model1 m 
+    ON (t.feature = m.feature)
 group by
   t.rowid;
 ```
@@ -182,27 +160,21 @@ group by
 create or replace view news20b_pa1_submit1 as
 select 
   t.label as actual, 
-  pd.label as predicted
+  p.label as predicted
 from 
-  news20b_test t JOIN news20b_pa1_predict1 pd 
-    on (t.rowid = pd.rowid);
+  news20b_test t 
+  JOIN news20b_pa1_predict1 p 
+    on (t.rowid = p.rowid);
 ```
 
 ```sql
-select count(1)/4996 from news20b_pa1_submit1 
-where actual == predicted;
+select 
+  sum(if(actual = predicted, 1, 0)) / count(1) as accuracy
+from 
+  news20b_pa1_submit1;
 ```
 > 0.9601681345076061
 
-## Cleaning
-
-```sql
-drop table news20b_pa1_model1;
-drop view news20b_pa1_predict1;
-drop view news20b_pa1_submit1;
-```
-
----
 #[Passive Aggressive (PA2)]
 
 ## model building
@@ -248,15 +220,10 @@ from
 ```
 
 ```sql
-select count(1)/4996 from news20b_pa2_submit1 
-where actual == predicted;
+select 
+  sum(if(actual = predicted, 1, 0)) / count(1) as accuracy
+from 
+  news20b_pa2_submit1;
 ```
 > 0.9597678142514011
 
-## Cleaning
-
-```sql
-drop table news20b_pa2_model1;
-drop view news20b_pa2_predict1;
-drop view news20b_pa2_submit1;
-```

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/news20_rf.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/binaryclass/news20_rf.md 
b/docs/gitbook/binaryclass/news20_rf.md
index 659536a..9a0d1f8 100644
--- a/docs/gitbook/binaryclass/news20_rf.md
+++ b/docs/gitbook/binaryclass/news20_rf.md
@@ -17,11 +17,11 @@
   under the License.
 -->
 
-Hivemall Random Forest supports libsvm-like sparse inputs. 
+Hivemall Random Forest supports libsvm-like sparse inputs. This page shows a 
classification example on the 20-newsgroups dataset.
 
 > #### Note
-> This feature, i.e., Sparse input support in Random Forest, is supported 
since Hivemall v0.5.0 or later._
-> 
[`feature_hashing`](https://hivemall.incubator.apache.org/userguide/ft_engineering/hashing.html#featurehashing-function)
 function is useful to prepare feature vectors for Random Forest.
+> This feature, i.e., sparse input support in Random Forest, is available 
since Hivemall v0.5.0.
+> 
[`feature_hashing`](http://hivemall.incubator.apache.org/userguide/ft_engineering/hashing.html#featurehashing-function)
 function is useful to prepare feature vectors for Random Forest.
 
 <!-- toc -->
 

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/binaryclass/news20_scw.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/binaryclass/news20_scw.md 
b/docs/gitbook/binaryclass/news20_scw.md
index f364c12..340ca8f 100644
--- a/docs/gitbook/binaryclass/news20_scw.md
+++ b/docs/gitbook/binaryclass/news20_scw.md
@@ -17,16 +17,6 @@
   under the License.
 -->
 
-## UDF preparation
-```
-use news20;
-
-delete jar /home/myui/tmp/hivemall.jar;
-add jar /home/myui/tmp/hivemall.jar;
-source /home/myui/tmp/define-all.hive;
-```
-
----
 # Confidence Weighted (CW)
 
 ## training
@@ -64,32 +54,21 @@ group by
 
 ## evaluation
 ```sql
-create or replace view news20b_cw_submit1 
-as
+WITH submit as (
 select 
   t.rowid,
   t.label as actual, 
-  pd.label as predicted
+  p.label as predicted
 from 
-  news20b_test t JOIN news20b_cw_predict1 pd 
-    on (t.rowid = pd.rowid);
-```
-
-```sql
-select count(1)/4996 from news20b_cw_submit1 
-where actual = predicted;
+  news20b_test t 
+  JOIN news20b_cw_predict1 p
+    on (t.rowid = p.rowid)
+)
+select sum(if(actual = predicted, 1, 0)) / count(1) as accuracy
+from submit;
 ```
 > 0.9655724579663731
 
-## Cleaning
-
-```sql
-drop table news20b_cw_model1;
-drop view news20b_cw_predict1;
-drop view news20b_cw_submit1;
-```
-
----
 # Adaptive Regularization of Weight Vectors (AROW)
 
 ## training
@@ -127,31 +106,21 @@ group by
 
 ## evaluation
 ```sql
-create or replace view news20b_arow_submit1 as
-select 
+WITH submit as (
+select
   t.rowid, 
   t.label as actual, 
-  pd.label as predicted
+  p.label as predicted
 from 
-  news20b_test t JOIN news20b_arow_predict1 pd 
-    on (t.rowid = pd.rowid);
-```
-
-```sql
-select count(1)/4996 from news20b_arow_submit1 
-where actual = predicted;
+  news20b_test t
+  JOIN news20b_arow_predict1 p
+    on (t.rowid = p.rowid)
+)
+select sum(if(actual = predicted, 1, 0)) / count(1) as accuracy
+from submit;
 ```
 > 0.9659727782225781
 
-## Cleaning
-
-```sql
-drop table news20b_arow_model1;
-drop view news20b_arow_predict1;
-drop view news20b_arow_submit1;
-```
-
----
 # Soft Confidence-Weighted (SCW1)
 
 ## training
@@ -189,31 +158,20 @@ group by
 
 ## evaluation
 ```sql
-create or replace view news20b_scw_submit1 as
-select 
-  t.rowid, 
-  t.label as actual, 
-  pd.label as predicted
-from 
-  news20b_test t JOIN news20b_scw_predict1 pd 
-    on (t.rowid = pd.rowid);
-```
-
-```sql
-select count(1)/4996 from news20b_scw_submit1 
-where actual = predicted;
+WITH submit as (
+  select 
+    t.rowid, 
+    t.label as actual, 
+    p.label as predicted
+  from 
+    news20b_test t JOIN news20b_scw_predict1 p
+      on (t.rowid = p.rowid)
+)
+select sum(if(actual = predicted, 1, 0)) / count(1) as accuracy
+from submit;
 ```
 > 0.9661729383506805
 
-## Cleaning
-
-```sql
-drop table news20b_scw_model1;
-drop view news20b_scw_predict1;
-drop view news20b_scw_submit1;
-```
-
----
 # Soft Confidence-Weighted (SCW2)
 
 ## training
@@ -251,30 +209,21 @@ group by
 
 ## evaluation
 ```sql
-create or replace view news20b_scw2_submit1 as
+WITH submit as (
 select 
   t.rowid, 
   t.label as actual, 
   pd.label as predicted
 from 
-  news20b_test t JOIN news20b_scw2_predict1 pd 
-    on (t.rowid = pd.rowid);
-```
-
-```sql
-select count(1)/4996 from news20b_scw2_submit1 
-where actual = predicted;
+  news20b_test t
+  JOIN news20b_scw2_predict1 pd 
+    on (t.rowid = pd.rowid)
+)
+select sum(if(actual = predicted, 1, 0)) / count(1) as accuracy
+from submit;
 ```
 > 0.9579663730984788
 
-## Cleaning
-
-```sql
-drop table news20b_scw2_model1;
-drop view news20b_scw2_predict1;
-drop view news20b_scw2_submit1;
-```
-
 --
 
 | Algorithm | Accuracy |

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/regression/e2006_arow.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/regression/e2006_arow.md 
b/docs/gitbook/regression/e2006_arow.md
index 169a7dc..1342f8f 100644
--- a/docs/gitbook/regression/e2006_arow.md
+++ b/docs/gitbook/regression/e2006_arow.md
@@ -16,15 +16,15 @@
   specific language governing permissions and limitations
   under the License.
 -->
-        
-https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#E2006-tfidf
 
----
-#[PA1a]
+<!-- toc -->
 
-##Training
+# PA1a
+
+## Training
 ```sql
 set mapred.reduce.tasks=64;
+
 drop table e2006tfidf_pa1a_model ;
 create table e2006tfidf_pa1a_model as
 select 
@@ -37,9 +37,13 @@ from
      e2006tfidf_train_x3
  ) t 
 group by feature;
+
+-- reset to the default setting
 set mapred.reduce.tasks=-1;
 ```
-_Caution: Do not use voted_avg() for regression. voted_avg() is for 
classification._
+
+> #### Caution
+> Do not use `voted_avg()` for regression. `voted_avg()` is for classification.
 
 ## prediction
 ```sql
@@ -57,36 +61,31 @@ group by
 
 ## evaluation
 ```sql
-drop table e2006tfidf_pa1a_submit;
-create table e2006tfidf_pa1a_submit as
+WITH submit as (
+  select 
+    t.target as actual, 
+    p.predicted as predicted
+  from 
+    e2006tfidf_test t
+    JOIN e2006tfidf_pa1a_predict p 
+      on (t.rowid = p.rowid)
+)
 select 
-  t.target as actual, 
-  p.predicted as predicted
+   rmse(predicted, actual) as RMSE,
+   mse(predicted, actual) as MSE, 
+   mae(predicted, actual) as MAE,
+   r2(predicted, actual) as R2
 from 
-  e2006tfidf_test t JOIN e2006tfidf_pa1a_predict p 
-    on (t.rowid = p.rowid);
-
-select avg(actual), avg(predicted) from e2006tfidf_pa1a_submit;
+   submit;
 ```
-> -3.8200363760415414     -3.8869923258589476
-
-```sql
-set hivevar:mean_actual=-3.8200363760415414;
 
-select 
-   sqrt(sum(pow(predicted - actual,2.0))/count(1)) as RMSE, 
-   sum(pow(predicted - actual,2.0))/count(1) as MSE, 
-   sum(abs(predicted - actual))/count(1) as MAE,
-   1 - sum(pow(actual - predicted,2.0)) / sum(pow(actual - 
${mean_actual},2.0)) as R2
-from 
-   e2006tfidf_pa1a_submit;
-```
-> 0.3797959864675519      0.14424499133686086     0.23846059576113587     
0.5010367946980386
+| rmse | mse | mae | r2 |
+|:-:|:-:|:-:|:-:|
+| 0.3797959864675519 | 0.14424499133686086 | 0.23846059576113587 
|0.5010367946980386 |
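For reference, the `rmse()`, `mse()`, `mae()`, and `r2()` UDAFs used above compute the standard metric definitions. A small self-contained sketch (toy data, not the E2006 values):

```python
import math

def mse(predicted, actual):
    # Mean squared error
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    # Root mean squared error
    return math.sqrt(mse(predicted, actual))

def mae(predicted, actual):
    # Mean absolute error
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def r2(predicted, actual):
    # Coefficient of determination: 1 - SS_res / SS_tot
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

pred, act = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]
print(mse(pred, act), mae(pred, act), r2(pred, act))
```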
 
----
-#[PA2a]
+# PA2a
 
-##Training
+## Training
 ```sql
 set mapred.reduce.tasks=64;
 drop table e2006tfidf_pa2a_model;
@@ -120,36 +119,31 @@ group by
 
 ## evaluation
 ```sql
-drop table e2006tfidf_pa2a_submit;
-create table e2006tfidf_pa2a_submit as
+WITH submit as (
+  select 
+    t.target as actual, 
+    p.predicted as predicted
+  from 
+    e2006tfidf_test t
+    JOIN e2006tfidf_pa2a_predict p 
+      on (t.rowid = p.rowid)
+)
 select 
-  t.target as actual, 
-  pd.predicted as predicted
+   rmse(predicted, actual) as RMSE,
+   mse(predicted, actual) as MSE, 
+   mae(predicted, actual) as MAE,
+   r2(predicted, actual) as R2
 from 
-  e2006tfidf_test t JOIN e2006tfidf_pa2a_predict pd 
-    on (t.rowid = pd.rowid);
-
-select avg(actual), avg(predicted) from e2006tfidf_pa2a_submit;
+   submit;
 ```
-> -3.8200363760415414     -3.9124877451612488
-
-```sql
-set hivevar:mean_actual=-3.8200363760415414;
 
-select 
-   sqrt(sum(pow(predicted - actual,2.0))/count(1)) as RMSE, 
-   sum(pow(predicted - actual,2.0))/count(1) as MSE, 
-   sum(abs(predicted - actual))/count(1) as MAE,
-   1 - sum(pow(actual - predicted,2.0)) / sum(pow(actual - 
${mean_actual},2.0)) as R2
-from 
-   e2006tfidf_pa2a_submit;
-```
-> 0.38538660838804495     0.14852283792484033     0.2466732002711477      
0.48623913673053565
+| rmse | mse | mae | r2 |
+|:-:|:-:|:-:|:-:|
+| 0.38538660838804495 | 0.14852283792484033 | 0.2466732002711477 
|0.48623913673053565 |
 
----
-#[AROW]
+# AROW
 
-##Training
+## Training
 ```sql
 set mapred.reduce.tasks=64;
 drop table e2006tfidf_arow_model ;
@@ -185,37 +179,32 @@ group by
 
 ## evaluation
 ```sql
-drop table e2006tfidf_arow_submit;
-create table e2006tfidf_arow_submit as
+WITH submit as (
+  select 
+    t.target as actual, 
+    p.predicted as predicted
+  from 
+    e2006tfidf_test t
+    JOIN e2006tfidf_arow_predict p 
+      on (t.rowid = p.rowid)
+)
 select 
-  t.target as actual, 
-  p.predicted as predicted
+   rmse(predicted, actual) as RMSE,
+   mse(predicted, actual) as MSE, 
+   mae(predicted, actual) as MAE,
+   r2(predicted, actual) as R2
 from 
-  e2006tfidf_test t JOIN e2006tfidf_arow_predict p
-    on (t.rowid = p.rowid);
-
-select avg(actual), avg(predicted) from e2006tfidf_arow_submit;
+   submit;
 ```
-> -3.8200363760415414     -3.8692518911517433
 
-```sql
-set hivevar:mean_actual=-3.8200363760415414;
-
-select 
-   sqrt(sum(pow(predicted - actual,2.0))/count(1)) as RMSE, 
-   sum(pow(predicted - actual,2.0))/count(1) as MSE, 
-   sum(abs(predicted - actual))/count(1) as MAE,
-   1 - sum(pow(actual - predicted,2.0)) / sum(pow(actual - 
${mean_actual},2.0)) as R2
-from 
-   e2006tfidf_arow_submit;
-```
-> 0.37862513029019407     0.14335698928726642     0.2368787001269389      
0.5041085155590119
+| rmse | mse | mae | r2 |
+|:-:|:-:|:-:|:-:|
+| 0.37862513029019407 | 0.14335698928726642 | 0.2368787001269389 | 
0.5041085155590119 |
 
---- 
-#[AROWe]
+# AROWe
 AROWe is a modified version of AROW that uses Hinge loss (epsilon = 0.1).
 
-##Training
+## Training
 ```sql
 set mapred.reduce.tasks=64;
 drop table e2006tfidf_arowe_model ;
@@ -251,28 +240,24 @@ group by
 
 ## evaluation
 ```sql
-drop table e2006tfidf_arowe_submit;
-create table e2006tfidf_arowe_submit as
+WITH submit as (
+  select 
+    t.target as actual, 
+    p.predicted as predicted
+  from 
+    e2006tfidf_test t
+    JOIN e2006tfidf_arowe_predict p 
+      on (t.rowid = p.rowid)
+)
 select 
-  t.target as actual, 
-  p.predicted as predicted
+   rmse(predicted, actual) as RMSE,
+   mse(predicted, actual) as MSE, 
+   mae(predicted, actual) as MAE,
+   r2(predicted, actual) as R2
 from 
-  e2006tfidf_test t JOIN e2006tfidf_arowe_predict p
-    on (t.rowid = p.rowid);
-
-select avg(actual), avg(predicted) from e2006tfidf_arowe_submit;
+   submit;
 ```
-> -3.8200363760415414     -3.86494905688414
-
-```sql
-set hivevar:mean_actual=-3.8200363760415414;
 
-select 
-   sqrt(sum(pow(predicted - actual,2.0))/count(1)) as RMSE, 
-   sum(pow(predicted - actual,2.0))/count(1) as MSE, 
-   sum(abs(predicted - actual))/count(1) as MAE,
-   1 - sum(pow(actual - predicted,2.0)) / sum(pow(actual - 
${mean_actual},2.0)) as R2
-from 
-   e2006tfidf_arowe_submit;
-```
-> 0.37789148212861856     0.14280197226536404     0.2357339155291536      
0.5060283955470721
+| rmse | mse | mae | r2 |
+|:-:|:-:|:-:|:-:|
+| 0.37789148212861856 | 0.14280197226536404 | 0.2357339155291536 
|0.5060283955470721 |

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/regression/e2006_generic.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/regression/e2006_generic.md 
b/docs/gitbook/regression/e2006_generic.md
new file mode 100644
index 0000000..11e280c
--- /dev/null
+++ b/docs/gitbook/regression/e2006_generic.md
@@ -0,0 +1,90 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+This tutorial shows how to apply a generic regressor to a regression problem 
on the E2006-tfidf dataset.
+
+<!-- toc -->
+
+## Training
+```sql
+set mapred.reduce.tasks=32;
+
+drop table e2006tfidf_generic_model;
+create table e2006tfidf_generic_model as
+select 
+ feature,
+ avg(weight) as weight
+from 
+ (select 
+     train_regressor(
+       add_bias(features), target,
+       '-loss squaredloss -opt AdamHD -reg No -iters 20'
+     ) as (feature, weight)
+  from 
+     e2006tfidf_train_x3
+ ) t 
+group by feature;
+
+-- reset to the default setting
+set mapred.reduce.tasks=-1;
+```
+
+> #### Caution
+> Regularization may not work well for some regression problems; in that 
case, try the `-reg No` option as seen in the above query.
+> Also, do not use `voted_avg()` for regression. `voted_avg()` is for 
classification.
+
+## prediction
+```sql
+create or replace view e2006tfidf_generic_predict
+as
+select
+  t.rowid, 
+  sum(m.weight * t.value) as predicted
+from 
+  e2006tfidf_test_exploded t LEFT OUTER JOIN
+  e2006tfidf_generic_model m ON (t.feature = m.feature)
+group by
+  t.rowid;
+```
+
+## evaluation
+```sql
+WITH submit as (
+  select 
+    t.target as actual, 
+    p.predicted as predicted
+  from 
+    e2006tfidf_test t
+    JOIN e2006tfidf_generic_predict p 
+      on (t.rowid = p.rowid)
+)
+select 
+   rmse(predicted, actual) as RMSE,
+   mse(predicted, actual) as MSE, 
+   mae(predicted, actual) as MAE,
+   r2(predicted, actual) as R2
+from 
+   submit;
+```
+
+| rmse | mse | mae | r2 |
+|:-:|:-:|:-:|:-:|
+| 0.37125069279938866 | 0.13782707690402607 | 0.2270351090214029 | 
0.5232372408076887 |
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/supervised_learning/prediction.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/supervised_learning/prediction.md 
b/docs/gitbook/supervised_learning/prediction.md
index 65aad27..e3d26c8 100644
--- a/docs/gitbook/supervised_learning/prediction.md
+++ b/docs/gitbook/supervised_learning/prediction.md
@@ -121,7 +121,7 @@ Below we list possible options for `train_regressor` and 
`train_classifier`, and
                  - SquaredLoss (synonym: squared)
                  - QuantileLoss (synonym: quantile)
                  - EpsilonInsensitiveLoss (synonym: epsilon_insensitive)
-                 - SquaredEpsilonInsensitiveLoss (synonym: 
squared_epsilon_insensitive)
+                 - SquaredEpsilonInsensitiveLoss (synonym: 
squared\_epsilon_insensitive)
                  - HuberLoss (synonym: huber)
 
 - Regularization function: `-reg`, `-regularization`
@@ -134,9 +134,74 @@ Additionally, there are several variants of the SGD 
technique, and it is also co
 
 - Optimizer: `-opt`, `-optimizer`
        - SGD
-       - AdaGrad
+       - Momentum
+               - Hyperparameters
+                       - `-alpha 1.0` Learning rate.
+                       - `-momentum 0.9` Exponential decay rate of the first 
order moment.
+       - Nesterov
+               - See: 
[https://arxiv.org/abs/1212.0901](https://arxiv.org/abs/1212.0901)
+               - Hyperparameters
+                       - same as Momentum
+       - AdaGrad (default)
+               - See: 
[http://jmlr.org/papers/v12/duchi11a.html](http://jmlr.org/papers/v12/duchi11a.html)
+               - Hyperparameters
+                       - `-eps 1.0` Constant for the numerical stability.
+       - RMSprop
+               - Description: RMSprop optimizer introducing weight decay to 
AdaGrad.
+               - See: 
[http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf)
+               - Hyperparameters
+                       - `-decay 0.95` Weight decay rate
+                       - `-eps 1.0` Constant for numerical stability
+       - RMSpropGraves
+               - Description: Alex Graves's RMSprop introducing weight decay 
and momentum.
+               - See: 
[https://arxiv.org/abs/1308.0850](https://arxiv.org/abs/1308.0850)
+               - Hyperparameters
+                       - `-alpha 1.0` Learning rate.
+                       - `-decay 0.95` Weight decay rate
+                       - `-momentum 0.9` Exponential decay rate of the first 
order moment.
+                       - `-eps 1.0` Constant for numerical stability
        - AdaDelta
+               - See: 
[https://arxiv.org/abs/1212.5701](https://arxiv.org/abs/1212.5701)
+               - Hyperparameters
+                       - `-decay 0.95` Weight decay rate
+                       - `-eps 1e-6f` Constant for numerical stability
        - Adam
+               - See:
+                       - [Adam: A Method for Stochastic 
Optimization](https://arxiv.org/abs/1412.6980v8)
+                       - [Fixing Weight Decay Regularization in 
Adam](https://openreview.net/forum?id=rk6qdGgCZ)
+                       - [On the Convergence of Adam and 
Beyond](https://openreview.net/forum?id=ryQu7f-RZ)
+               - Hyperparameters
+                       - `-alpha 1.0` Learning rate.
+                       - `-beta1 0.9` Exponential decay rate of the first 
order moment.
+                       - `-beta2 0.999` Exponential decay rate of the second 
order moment.
+                       - `-eps 1e-8f` Constant for numerical stability
+                       - `-decay 0.0` Weight decay rate
+       - Nadam
+               - Description: Nadam is the Adam optimizer with Nesterov momentum.
+               - See:
+                       - [Incorporating Nesterov Momentum into 
Adam](https://openreview.net/pdf?id=OM0jvwB8jIp57ZJjtNEZ)
+                       - [Adam 
report](http://cs229.stanford.edu/proj2015/054_report.pdf)
+                       - [On the importance of initialization and momentum in 
deep learning](http://www.cs.toronto.edu/~fritz/absps/momentum.pdf)
+               - Hyperparameters
+                       - same as Adam except ...
+                       - `-scheduleDecay 0.004` Scheduled decay rate (every 
250 steps by default; 1/250 = 0.004)
+       - Eve
+               - See: 
[https://openreview.net/forum?id=r1WUqIceg](https://openreview.net/forum?id=r1WUqIceg)
+               - Hyperparameters
+                       - same as Adam except ...
+                       - `-beta3 0.999` Decay rate for Eve coefficient.
+                       - `-c 10` Constant used for gradient clipping 
`clip(val, 1/c, c)`
+       - AdamHD
+               - Description: Adam optimizer with Hypergradient Descent. 
Learning rate `-alpha` is automatically tuned.
+               - See:
+                       - [Online Learning Rate Adaptation with Hypergradient 
Descent](https://openreview.net/forum?id=BkrsAzWAb)
+                       - [Convergence Analysis of an Adaptive Method of 
Gradient 
Descent](https://damaru2.github.io/convergence_analysis_hypergradient_descent/dissertation_hypergradients.pdf)
+               - Hyperparameters
+                       - same as Adam except ...
+                       - `-alpha 0.02` Learning rate.
+                       - `-beta -1e-6` Constant used for tuning learning rate.
+
+In my experience, the default (AdaGrad+RDA), AdaDelta, Adam, and AdamHD are 
worth trying.
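As a rough illustration of the default optimizer, AdaGrad scales each update by the history of squared gradients, so frequently updated features take smaller steps over time. A minimal single-weight sketch (the `eps` argument loosely mirrors the `-eps` option; everything else here is a simplified assumption, not Hivemall's implementation):

```python
import math

def adagrad_step(weight, grad, accum, alpha=1.0, eps=1.0):
    # Accumulate squared gradients; a larger history shrinks the step size.
    accum += grad * grad
    weight -= alpha * grad / (math.sqrt(accum) + eps)
    return weight, accum

# Minimize f(w) = w^2 (gradient 2w) starting from w = 5.0
w, accum = 5.0, 0.0
for _ in range(200):
    w, accum = adagrad_step(w, 2.0 * w, accum)
print(round(w, 4))  # close to 0
```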
 
 > #### Note
 >
@@ -156,8 +221,13 @@ Furthermore, optimizer offers to set auxiliary options 
such as:
 For details of available options, following queries might be helpful to list 
all of them:
 
 ```sql
-select train_regressor(array(), 0, '-help');
-select train_classifier(array(), 0, '-help');
+select train_regressor('-help');
+-- v0.5.0 or before
+-- select train_regressor(array(), 0, '-help');
+
+select train_classifier('-help');
+-- v0.5.0 or before
+-- select train_classifier(array(), 0, '-help');
 ```
 
 In practice, you can try different combinations of the options in order to 
achieve higher prediction accuracy.

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/a823e2b1/docs/gitbook/supervised_learning/tutorial.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/supervised_learning/tutorial.md 
b/docs/gitbook/supervised_learning/tutorial.md
index 5f96a2a..11dc1c8 100644
--- a/docs/gitbook/supervised_learning/tutorial.md
+++ b/docs/gitbook/supervised_learning/tutorial.md
@@ -37,15 +37,6 @@ FROM
 ;
 ```
 
-
-Hivemall function [`hivemall_version()`](../misc/funcs.html#others) shows 
current Hivemall version, for example:
-
-```sql
-select hivemall_version();
-```
-
-> "0.5.1-incubating-SNAPSHOT"
-
 Below we list ML and relevant problems that Hivemall can solve:
 
 - [Binary and multi-class classification](../binaryclass/general.html)
@@ -199,7 +190,9 @@ Notice that weight is learned for each possible value in a 
categorical feature,
 Of course, you can optimize hyper-parameters to build more accurate prediction 
model. Check the output of the following query to see all available options, 
including learning rate, number of iterations and regularization parameters, 
and their default values:
 
 ```sql
-select train_classifier(array(), 0, '-help');
+select train_classifier('-help');
+-- Hivemall 0.5.2 and before
+-- select train_classifier(array(), 0, '-help');
 ```
 
 ### Step 3. Prediction
@@ -230,14 +223,17 @@ with features_exploded as (
     -- to join with a model table
     extract_feature(fv) as feature,
     extract_weight(fv) as value
-  from unforeseen_samples t1 LATERAL VIEW explode(features) t2 as fv
+  from
+    unforeseen_samples t1
+    LATERAL VIEW explode(features) t2 as fv
 )
 select
   t1.id,
   sigmoid( sum(p1.weight * t1.value) ) as probability
 from
   features_exploded t1
-  LEFT OUTER JOIN classifier p1 ON (t1.feature = p1.feature)
+  LEFT OUTER JOIN classifier p1 
+    ON (t1.feature = p1.feature)
 group by
   t1.id
 ;
@@ -265,7 +261,9 @@ with features_exploded as (
     id,
     extract_feature(fv) as feature,
     extract_weight(fv) as value
-  from training t1 LATERAL VIEW explode(features) t2 as fv
+  from
+    training t1 
+    LATERAL VIEW explode(features) t2 as fv
 ),
 predictions as (
   select
@@ -273,7 +271,8 @@ predictions as (
     sigmoid( sum(p1.weight * t1.value) ) as probability
   from
     features_exploded t1
-    LEFT OUTER JOIN classifier p1 ON (t1.feature = p1.feature)
+    LEFT OUTER JOIN classifier p1 
+      ON (t1.feature = p1.feature)
   group by
     t1.id
 )
@@ -281,10 +280,13 @@ select
   auc(probability, label) as auc,
   logloss(probability, label) as logloss
 from (
-  select t1.probability, t2.label
-  from predictions t1
-  join training t2 on (t1.id = t2.id)
-  ORDER BY probability DESC
+  select 
+    t1.probability, t2.label
+  from 
+    predictions t1
+    join training t2 on (t1.id = t2.id)
+  ORDER BY 
+    probability DESC
 ) t
 ;
 ```
@@ -371,7 +373,9 @@ from
 Run the function with `-help` option to list available options:
 
 ```sql
-select train_regressor(array(), 0, '-help');
+select train_regressor('-help');
+-- Hivemall 0.5.2 and before
+-- select train_regressor(array(), 0, '-help');
 ```
 
 ### Step 3. Prediction
@@ -411,7 +415,7 @@ group by
 
 Output is like:
 
-|id| predicted_num_purchases|
+|id| predicted\_num_purchases|
 |---:|---:|
 | 1| 3.645142912864685|
 
@@ -425,7 +429,9 @@ with features_exploded as (
     id,
     extract_feature(fv) as feature,
     extract_weight(fv) as value
-  from training t1 LATERAL VIEW explode(features) t2 as fv
+  from
+    training t1 
+    LATERAL VIEW explode(features) t2 as fv
 ),
 predictions as (
   select
