[1/2] incubator-hivemall git commit: Close #104: [HIVEMALL-101-2] Renamed train_regression to train_regressor

myui Thu, 20 Jul 2017 04:26:11 -0700

Repository: incubator-hivemall
Updated Branches:
  refs/heads/master 0737e23eb -> 7205de1e9



http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/misc/prediction.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/misc/prediction.md b/docs/gitbook/misc/prediction.md
index ee85e40..53d0cea 100644
--- a/docs/gitbook/misc/prediction.md
+++ b/docs/gitbook/misc/prediction.md
@@ -56,7 +56,7 @@ The goal of regression is to predict **real values** as shown below:
 
 In practice, target values could be any of small/large float/int negative/positive values. [Our CTR prediction tutorial](../regression/kddcup12tr2.md) solves regression problem with small floating point target values in a 0-1 range, for example.
 
-While there are several ways to realize regression by using Hivemall, `train_regression()` is one of the most flexible functions. This feature is explained in: [Regression](../regression/general.md).
+While there are several ways to realize regression by using Hivemall, `train_regressor()` is one of the most flexible functions. This feature is explained in [this page](../regression/general.md).
 
 # Classification
 
@@ -103,10 +103,10 @@ Eventually, minimizing the function $$E(\mathbf{w})$$ can be implemented by the
 
 Interestingly, depending on a choice of loss and regularization function, prediction model you obtained will behave differently; even if one combination could work as a classifier, another choice might be appropriate for regression.
 
-Below we list possible options for `train_regression` and `train_classifier`, and this is the reason why these two functions are the most flexible in Hivemall:
+Below we list possible options for `train_regressor` and `train_classifier`, and this is the reason why these two functions are the most flexible in Hivemall:
 
 - Loss function: `-loss`, `-loss_function`
-       - For `train_regression`
+       - For `train_regressor`
                - SquaredLoss (synonym: squared)
                - QuantileLoss (synonym: quantile)
                - EpsilonInsensitiveLoss (synonym: epsilon_insensitive)
@@ -156,8 +156,8 @@ Furthermore, optimizer offers to set auxiliary options such as:
 For details of available options, following queries might be helpful to list all of them:
 
 ```sql
-select train_regression(array(), 0, '-help');
+select train_regressor(array(), 0, '-help');
 select train_classifier(array(), 0, '-help');
 ```
 
-In practice, you can try different combinations of the options in order to achieve higher prediction accuracy.
\ No newline at end of file
+In practice, you can try different combinations of the options in order to achieve higher prediction accuracy.

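For readers comparing the `-loss` choices listed for `train_regressor`, the Python sketch below writes out the textbook forms of the three regression losses. It is only an illustration: the function names are hypothetical, and the 0.5 factor in the squared loss as well as the default `tau` and `epsilon` values are assumptions, not necessarily what Hivemall uses internally.

```python
def squared_loss(p: float, y: float) -> float:
    # 0.5 * (p - y)^2; the 0.5 factor (an assumption) makes the gradient (p - y)
    d = p - y
    return 0.5 * d * d


def quantile_loss(p: float, y: float, tau: float = 0.5) -> float:
    # Pinball loss: under-prediction is weighted by tau, over-prediction by (1 - tau)
    e = y - p
    return tau * e if e >= 0 else (tau - 1.0) * e


def epsilon_insensitive_loss(p: float, y: float, epsilon: float = 0.1) -> float:
    # Errors inside the +/- epsilon tube cost nothing; beyond it, cost grows linearly
    return max(0.0, abs(y - p) - epsilon)
```

With `tau = 0.9`, under-predicting by 2 costs 1.8 while over-predicting by 2 costs only about 0.2, which is why the quantile loss suits upper-quantile estimates.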
http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/misc/tokenizer.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/misc/tokenizer.md b/docs/gitbook/misc/tokenizer.md
index 07c8cd1..b056874 100644
--- a/docs/gitbook/misc/tokenizer.md
+++ b/docs/gitbook/misc/tokenizer.md
@@ -101,4 +101,4 @@ select 
tokenize_cn("Smartcnä¸ºApache2.0åè®®çå¼æºä¸æåè¯ç³»ç»ï¼Java
 ```
 > [smartcn, 为, apach, 2, 0, 协议, 的, 开源, 中文, 分词, 系统, 
 > java, 语言, 编写, 修改, 的, 中科院, 计算, 所, ictcla, 分词, 
 > 系统]
 
-For detailed APIs, please refer Javadoc of [SmartChineseAnalyzer](http://lucene.apache.org/core/5_3_1/analyzers-smartcn/org/apache/lucene/analysis/cn/smart/SmartChineseAnalyzer.html) as well.
\ No newline at end of file
+For detailed APIs, please refer Javadoc of [SmartChineseAnalyzer](http://lucene.apache.org/core/5_3_1/analyzers-smartcn/org/apache/lucene/analysis/cn/smart/SmartChineseAnalyzer.html) as well.

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/multiclass/iris_randomforest.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/multiclass/iris_randomforest.md b/docs/gitbook/multiclass/iris_randomforest.md
index 771c733..b421297 100644
--- a/docs/gitbook/multiclass/iris_randomforest.md
+++ b/docs/gitbook/multiclass/iris_randomforest.md
@@ -381,4 +381,4 @@ digraph Tree {
 
 <img src="../resources/images/iris.png" alt="Iris Graphvis output"/>
 
-You can draw a graph by `dot -Tpng iris.dot -o iris.png` or using [Viz.js](http://viz-js.com/).
\ No newline at end of file
+You can draw a graph by `dot -Tpng iris.dot -o iris.png` or using [Viz.js](http://viz-js.com/).

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/multiclass/news20_dataset.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/multiclass/news20_dataset.md b/docs/gitbook/multiclass/news20_dataset.md
index 96decec..4cc9b83 100644
--- a/docs/gitbook/multiclass/news20_dataset.md
+++ b/docs/gitbook/multiclass/news20_dataset.md
@@ -92,5 +92,5 @@ select
   -- cast(extract_feature(feature) as int) as feature,
   -- extract_weight(feature) as value
 from 
-  news20mc_test LATERAL VIEW explode(addBias(features)) t AS feature;
-```
\ No newline at end of file
+  news20mc_test LATERAL VIEW explode(add_bias(features)) t AS feature;
+```

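The `LATERAL VIEW explode(add_bias(features))` pattern above turns each test row into one row per feature, bias included, while `extract_feature`/`extract_weight` split each `name:value` entry. The Python sketch below models that row expansion in memory. It assumes the appended bias entry is `"0:1.0"` (feature `"0"` with value 1.0) and that an entry without `:` has weight 1.0; both are assumptions to check against the Hivemall function reference, and the helper names are hypothetical.

```python
from typing import Iterator, List, Tuple

BIAS_ENTRY = "0:1.0"  # assumption: bias is feature "0" with value 1.0


def add_bias(features: List[str]) -> List[str]:
    # Return a new list with the bias entry appended (the input is not mutated)
    return features + [BIAS_ENTRY]


def extract_feature(entry: str) -> str:
    # For "name:value" entries, same as split(feature, ":")[0] in HiveQL
    return entry.split(":", 1)[0]


def extract_weight(entry: str) -> float:
    # For "name:value" entries, same as cast(split(feature, ":")[1] as float);
    # falls back to 1.0 when no ":" is present (an assumption)
    parts = entry.split(":", 1)
    return float(parts[1]) if len(parts) == 2 else 1.0


def explode_with_bias(rowid: int, label: int,
                      features: List[str]) -> Iterator[Tuple[int, int, str, float]]:
    # Mimics LATERAL VIEW explode(add_bias(features)): one output row per feature
    for entry in add_bias(features):
        yield rowid, label, extract_feature(entry), extract_weight(entry)
```

For example, `list(explode_with_bias(1, 3, ["12:0.5", "7:2"]))` yields three rows, the last one being the bias row `(1, 3, "0", 1.0)`.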
http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/multiclass/news20_ensemble.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/multiclass/news20_ensemble.md b/docs/gitbook/multiclass/news20_ensemble.md
index 6bf1c93..7389a47 100644
--- a/docs/gitbook/multiclass/news20_ensemble.md
+++ b/docs/gitbook/multiclass/news20_ensemble.md
@@ -48,20 +48,20 @@ select
  voted_avg(weight) as weight
 from 
  (select 
-     -- train_multiclass_cw(addBias(features),label) as (label,feature,weight)      -- hivemall v0.1
-     train_multiclass_cw(addBias(features),label) as (label,feature,weight,covar)   -- hivemall v0.2 or later
+     -- train_multiclass_cw(add_bias(features),label) as (label,feature,weight)      -- hivemall v0.1
+     train_multiclass_cw(add_bias(features),label) as (label,feature,weight,covar)   -- hivemall v0.2 or later
   from 
      news20mc_train_x3
   union all
   select 
-     -- train_multiclass_arow(addBias(features),label) as (label,feature,weight)    -- hivemall v0.1
-     train_multiclass_arow(addBias(features),label) as (label,feature,weight,covar) -- hivemall v0.2 or later
+     -- train_multiclass_arow(add_bias(features),label) as (label,feature,weight)    -- hivemall v0.1
+     train_multiclass_arow(add_bias(features),label) as (label,feature,weight,covar) -- hivemall v0.2 or later
   from 
      news20mc_train_x3
   union all
   select 
-     -- train_multiclass_scw(addBias(features),label) as (label,feature,weight)     -- hivemall v0.1
-     train_multiclass_scw(addBias(features),label) as (label,feature,weight,covar)  -- hivemall v0.2 or later
+     -- train_multiclass_scw(add_bias(features),label) as (label,feature,weight)     -- hivemall v0.1
+     train_multiclass_scw(add_bias(features),label) as (label,feature,weight,covar)  -- hivemall v0.2 or later
   from 
      news20mc_train_x3
  ) t 
@@ -196,4 +196,4 @@ Unfortunately, too many cooks spoil the broth in this case too :-(
 | SCW2 |  0.8482344102178813 |
 | Ensemble(model) | 0.8494866015527173 |
 | Ensemble(prediction) | 0.8499874780866516 |
-| CW |  0.850488354620586 |
\ No newline at end of file
+| CW |  0.850488354620586 |

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/multiclass/news20_one-vs-the-rest_dataset.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/multiclass/news20_one-vs-the-rest_dataset.md b/docs/gitbook/multiclass/news20_one-vs-the-rest_dataset.md
index f437399..6f76d28 100644
--- a/docs/gitbook/multiclass/news20_one-vs-the-rest_dataset.md
+++ b/docs/gitbook/multiclass/news20_one-vs-the-rest_dataset.md
@@ -44,7 +44,7 @@ SET hivevar:possible_labels="1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,17,16,19,18,20"
 ```
 create or replace view news20_onevsrest_train
 as
-select transform(${possible_labels}, rowid, label, addBias(features))
+select transform(${possible_labels}, rowid, label, add_bias(features))
   ROW FORMAT DELIMITED
     FIELDS TERMINATED BY "\t"
     COLLECTION ITEMS TERMINATED BY ","

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/multiclass/news20_pa.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/multiclass/news20_pa.md b/docs/gitbook/multiclass/news20_pa.md
index 26083f9..c57d08d 100644
--- a/docs/gitbook/multiclass/news20_pa.md
+++ b/docs/gitbook/multiclass/news20_pa.md
@@ -44,7 +44,7 @@ select
  voted_avg(weight) as weight
 from 
  (select 
-     train_multiclass_pa2(addBias(features),label) as (label,feature,weight)
+     train_multiclass_pa2(add_bias(features),label) as (label,feature,weight)
   from 
      news20mc_train_x3
  ) t 
@@ -106,4 +106,4 @@ where actual == predicted;
 drop table news20mc_pa2_model1;
 drop table news20mc_pa2_predict1;
 drop view news20mc_pa2_submit1;
-```
\ No newline at end of file
+```

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/multiclass/news20_scw.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/multiclass/news20_scw.md b/docs/gitbook/multiclass/news20_scw.md
index 24e0fad..fbe5153 100644
--- a/docs/gitbook/multiclass/news20_scw.md
+++ b/docs/gitbook/multiclass/news20_scw.md
@@ -51,8 +51,8 @@ select
  argmin_kld(weight, covar) as weight -- [hivemall v0.2 or later]
 from 
  (select 
-     -- train_multiclass_cw(addBias(features),label) as (label,feature,weight) -- [hivemall v0.1]
-     train_multiclass_cw(addBias(features),label) as (label,feature,weight,covar)    -- [hivemall v0.2 or later]
+     -- train_multiclass_cw(add_bias(features),label) as (label,feature,weight) -- [hivemall v0.1]
+     train_multiclass_cw(add_bias(features),label) as (label,feature,weight,covar)    -- [hivemall v0.2 or later]
   from 
      news20mc_train_x3
  ) t 
@@ -126,8 +126,8 @@ select
  argmin_kld(weight, covar) as weight -- [hivemall v0.2 or later]
 from 
  (select 
-     -- train_multiclass_arow(addBias(features),label) as (label,feature,weight) -- [hivemall v0.1]
-     train_multiclass_arow(addBias(features),label) as (label,feature,weight,covar) -- [hivemall v0.2 or later]
+     -- train_multiclass_arow(add_bias(features),label) as (label,feature,weight) -- [hivemall v0.1]
+     train_multiclass_arow(add_bias(features),label) as (label,feature,weight,covar) -- [hivemall v0.2 or later]
   from 
      news20mc_train_x3
  ) t 
@@ -201,8 +201,8 @@ select
  argmin_kld(weight, covar) as weight -- [hivemall v0.2 or later]
 from 
  (select 
-     -- train_multiclass_scw(addBias(features),label) as (label,feature,weight) -- [hivemall v0.1]
-     train_multiclass_scw(addBias(features),label) as (label,feature,weight,covar) -- [hivemall v0.2 or later]
+     -- train_multiclass_scw(add_bias(features),label) as (label,feature,weight) -- [hivemall v0.1]
+     train_multiclass_scw(add_bias(features),label) as (label,feature,weight,covar) -- [hivemall v0.2 or later]
   from 
      news20mc_train_x3
  ) t 
@@ -276,8 +276,8 @@ select
  argmin_kld(weight, covar) as weight -- [hivemall v0.2 or later]
 from 
  (select 
-     -- train_multiclass_scw2(addBias(features),label) as (label,feature,weight) -- [hivemall v0.1]
-     train_multiclass_scw2(addBias(features),label) as (label,feature,weight,covar) -- [hivemall v0.2 or later]
+     -- train_multiclass_scw2(add_bias(features),label) as (label,feature,weight) -- [hivemall v0.1]
+     train_multiclass_scw2(add_bias(features),label) as (label,feature,weight,covar) -- [hivemall v0.2 or later]
   from 
      news20mc_train_x3
  ) t 

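Each query above merges the per-task `(weight, covar)` pairs with `argmin_kld`. As an illustration only, and assuming `argmin_kld` uses the closed form for Gaussian weight estimates (a precision-weighted mean, i.e. each weight divided by its variance, normalized by the summed inverse variances), the combination step can be sketched in Python as follows. The function here is a hypothetical stand-in, not Hivemall's UDAF.

```python
from typing import Sequence


def argmin_kld(weights: Sequence[float], covars: Sequence[float]) -> float:
    # Precision-weighted mean: sum(w_i / s_i) / sum(1 / s_i),
    # where s_i is the (positive) variance reported for w_i
    if len(weights) != len(covars) or len(weights) == 0:
        raise ValueError("weights and covars must be non-empty and equally sized")
    numerator = sum(w / c for w, c in zip(weights, covars))
    denominator = sum(1.0 / c for c in covars)
    return numerator / denominator
```

With equal variances this reduces to a plain average; a model that is more certain about a feature (smaller `covar`) pulls the merged weight toward its own value, e.g. `argmin_kld([1.0, 3.0], [1.0, 3.0])` is about 1.5 rather than 2.0.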
http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/recommend/item_based_cf.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/recommend/item_based_cf.md b/docs/gitbook/recommend/item_based_cf.md
index 9515184..9e4f7e4 100644
--- a/docs/gitbook/recommend/item_based_cf.md
+++ b/docs/gitbook/recommend/item_based_cf.md
@@ -714,4 +714,4 @@ similarity as ( -- copy (i1, i2)'s similarity as (i2, i1)'s one
 ),
 topk as (
   ...
-```
\ No newline at end of file
+```

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/recommend/movielens_cf.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/recommend/movielens_cf.md b/docs/gitbook/recommend/movielens_cf.md
index e0ed545..faa555c 100644
--- a/docs/gitbook/recommend/movielens_cf.md
+++ b/docs/gitbook/recommend/movielens_cf.md
@@ -253,4 +253,4 @@ where -- at least 10 recommended items are necessary to compute recall@10 and pr
 |**MRR**| 0.03507380742291146   | 
 |**NDCG**| 0.15787655209987522 |
 
-If you set larger value to the DIMSUM's `-threshold` option, similarity will be more aggressively approximated. Consequently, while efficiency is improved, the accuracy is likely to be decreased.
\ No newline at end of file
+If you set larger value to the DIMSUM's `-threshold` option, similarity will be more aggressively approximated. Consequently, while efficiency is improved, the accuracy is likely to be decreased.

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/recommend/movielens_cv.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/recommend/movielens_cv.md b/docs/gitbook/recommend/movielens_cv.md
index a1f7b2f..6ac54c7 100644
--- a/docs/gitbook/recommend/movielens_cv.md
+++ b/docs/gitbook/recommend/movielens_cv.md
@@ -79,4 +79,4 @@ Then, issue SQL queies in [generate_cv.sql](https://gist.github.com/myui/2e20182
 
 > 0.8502739040257945 (RMSE)
 
-_We recommend to use [Tez](http://tez.apache.org/) for running queries having many stages._
\ No newline at end of file
+_We recommend to use [Tez](http://tez.apache.org/) for running queries having many stages._

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/recommend/movielens_fm.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/recommend/movielens_fm.md b/docs/gitbook/recommend/movielens_fm.md
index ad59324..64039fe 100644
--- a/docs/gitbook/recommend/movielens_fm.md
+++ b/docs/gitbook/recommend/movielens_fm.md
@@ -264,4 +264,4 @@ select
 from
   testing_fm as t
   JOIN predicted as p on (t.rowid = p.rowid);
-```
\ No newline at end of file
+```

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/recommend/movielens_mf.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/recommend/movielens_mf.md b/docs/gitbook/recommend/movielens_mf.md
index ca38fec..003082a 100644
--- a/docs/gitbook/recommend/movielens_mf.md
+++ b/docs/gitbook/recommend/movielens_mf.md
@@ -157,4 +157,4 @@ limit ${topk};
 | 2503    | 4.788541  |
 | 53      | 4.7518783 |
 | 904     | 4.7463417 |
-| 953     | 4.732769  |
\ No newline at end of file
+| 953     | 4.732769  |

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/recommend/news20_bbit_minhash.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/recommend/news20_bbit_minhash.md b/docs/gitbook/recommend/news20_bbit_minhash.md
index 474a40d..93cb47b 100644
--- a/docs/gitbook/recommend/news20_bbit_minhash.md
+++ b/docs/gitbook/recommend/news20_bbit_minhash.md
@@ -66,4 +66,4 @@ limit ${topn};
 | 3839  | 0.328125   | 41 |
 | 12669 | 0.328125   | 37 |
 | 13604 | 0.3125     | 41 |
-| 6333  | 0.3125     | 39 |
\ No newline at end of file
+| 6333  | 0.3125     | 39 |

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/recommend/news20_jaccard.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/recommend/news20_jaccard.md b/docs/gitbook/recommend/news20_jaccard.md
index 6a30fb8..0166ed5 100644
--- a/docs/gitbook/recommend/news20_jaccard.md
+++ b/docs/gitbook/recommend/news20_jaccard.md
@@ -139,4 +139,4 @@ from
 where
   similarity >= 0.1
 ;
-```
\ No newline at end of file
+```

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/regression/e2006_arow.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/regression/e2006_arow.md b/docs/gitbook/regression/e2006_arow.md
index abdb725..ddf6398 100644
--- a/docs/gitbook/regression/e2006_arow.md
+++ b/docs/gitbook/regression/e2006_arow.md
@@ -32,7 +32,7 @@ select
  avg(weight) as weight
 from 
  (select 
-     train_pa1a_regr(addBias(features),target) as (feature,weight)
+     train_pa1a_regr(add_bias(features),target) as (feature,weight)
   from 
      e2006tfidf_train_x3
  ) t 
@@ -96,7 +96,7 @@ select
  avg(weight) as weight
 from 
  (select 
-     train_pa2a_regr(addBias(features),target) as (feature,weight)
+     train_pa2a_regr(add_bias(features),target) as (feature,weight)
   from 
      e2006tfidf_train_x3
  ) t 
@@ -160,8 +160,8 @@ select
  argmin_kld(weight, covar) as weight -- [hivemall v0.2 or later]
 from 
  (select 
-     -- train_arow_regr(addBias(features),target) as (feature,weight)    -- [hivemall v0.1]
-     train_arow_regr(addBias(features),target) as (feature,weight,covar) -- [hivemall v0.2 or later]
+     -- train_arow_regr(add_bias(features),target) as (feature,weight)    -- [hivemall v0.1]
+     train_arow_regr(add_bias(features),target) as (feature,weight,covar) -- [hivemall v0.2 or later]
   from 
      e2006tfidf_train_x3
  ) t 
@@ -226,8 +226,8 @@ select
  argmin_kld(weight, covar) as weight -- [hivemall v0.2 or later]
 from 
  (select 
-     -- train_arowe_regr(addBias(features),target) as (feature,weight)    -- [hivemall v0.1]
-     train_arowe_regr(addBias(features),target) as (feature,weight,covar) -- [hivemall v0.2 or later]
+     -- train_arowe_regr(add_bias(features),target) as (feature,weight)    -- [hivemall v0.1]
+     train_arowe_regr(add_bias(features),target) as (feature,weight,covar) -- [hivemall v0.2 or later]
   from 
      e2006tfidf_train_x3
  ) t 

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/regression/e2006_dataset.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/regression/e2006_dataset.md b/docs/gitbook/regression/e2006_dataset.md
index 001eda2..804fa40 100644
--- a/docs/gitbook/regression/e2006_dataset.md
+++ b/docs/gitbook/regression/e2006_dataset.md
@@ -17,13 +17,11 @@
   under the License.
 -->
         
-http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#E2006-tfidf
-
 Prerequisite
 ============
-* [hivemall.jar](https://github.com/myui/hivemall/tree/master/target/hivemall.jar)
-* [conv.awk](https://github.com/myui/hivemall/tree/master/scripts/misc/conv.awk)
-* [define-all.hive](https://github.com/myui/hivemall/tree/master/scripts/ddl/define-all.hive)
+
+* [E2006-tfidf Dataset](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#E2006-tfidf)
+* [conv.awk](https://github.com/apache/incubator-hivemall/blob/master/resources/misc/conv.awk)
 
 Data preparation
 ================
@@ -43,12 +41,7 @@ hadoop fs -put E2006.test.tsv /dataset/E2006-tfidf/test
 create database E2006;
 use E2006;
 
-delete jar /home/myui/tmp/hivemall.jar;
-add jar /home/myui/tmp/hivemall.jar;
-
-source /home/myui/tmp/define-all.hive;
-
-Create external table e2006tfidf_train (
+create external table e2006tfidf_train (
   rowid int,
   target float,
   features ARRAY<STRING>
@@ -56,7 +49,7 @@ Create external table e2006tfidf_train (
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY "," 
 STORED AS TEXTFILE LOCATION '/dataset/E2006-tfidf/train';
 
-Create external table e2006tfidf_test (
+create external table e2006tfidf_test (
   rowid int, 
   target float,
   features ARRAY<STRING>
@@ -68,24 +61,28 @@ create table e2006tfidf_test_exploded as
 select 
   rowid,
   target,
-  split(feature,":")[0] as feature,
-  cast(split(feature,":")[1] as float) as value
+  -- split(feature,":")[0] as feature,
+  -- cast(split(feature,":")[1] as float) as value
   -- hivemall v0.3.1 or later
-  -- extract_feature(feature) as feature,
-  -- extract_weight(feature) as value
+  extract_feature(feature) as feature,
+  extract_weight(feature) as value
 from 
-  e2006tfidf_test LATERAL VIEW explode(addBias(features)) t AS feature;
+  e2006tfidf_test LATERAL VIEW explode(add_bias(features)) t AS feature;
 ```
 
 ## Amplify training examples (global shuffle)
+
 ```sql
 -- set mapred.reduce.tasks=32;
 set hivevar:seed=31;
 set hivevar:xtimes=3;
+
 create or replace view e2006tfidf_train_x3 as 
 select * from (
-select amplify(${xtimes}, *) as (rowid, target, features) from e2006tfidf_train
+  select amplify(${xtimes}, *) as (rowid, target, features)
+  from e2006tfidf_train
 ) t
 CLUSTER BY rand(${seed});
+
 -- set mapred.reduce.tasks=-1;
-```
\ No newline at end of file
+```

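The amplification step replicates every training row `${xtimes}` times and then scatters the copies with `CLUSTER BY rand(${seed})`, so that the copies of one example do not arrive at a trainer back to back. The Python sketch below mimics that on an in-memory list of hypothetical `(rowid, target, features)` rows. The seeded `random.Random.shuffle` is only a stand-in for Hive's random clustering, which distributes rows across reducers rather than producing one global permutation.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[int, float, List[str]]  # (rowid, target, features)


def amplify(xtimes: int, rows: Sequence[Row]) -> List[Row]:
    # Emit each input row xtimes times, like amplify(${xtimes}, *)
    return [row for row in rows for _ in range(xtimes)]


def cluster_by_rand(rows: Sequence[Row], seed: int) -> List[Row]:
    # Stand-in for CLUSTER BY rand(${seed}): a seeded, reproducible shuffle
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

Using the same seed reproduces the same order, which plays the role of `set hivevar:seed=31` in the query.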
http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/regression/general.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/regression/general.md b/docs/gitbook/regression/general.md
index dee0719..4750ea4 100644
--- a/docs/gitbook/regression/general.md
+++ b/docs/gitbook/regression/general.md
@@ -24,7 +24,7 @@ In our regression tutorials, you can tackle realistic prediction problems by usi
 - [AROW](e2006_arow.html#arow)
 - [AROWe](e2006_arow.html#arowe)
 
-Our `train_regression` function enables you to solve the regression problems with flexible configureable options. Let us try the function below.
+Our `train_regressor` function enables you to solve the regression problems with flexible configurable options. Let us try the function below.
 
 It should be noted that the sample queries require you to prepare [E2006-tfidf data](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#E2006-tfidf).
 See [our E2006-tfidf tutorial page](../regression/e2006_dataset.md) for further instructions.
 
@@ -42,7 +42,7 @@ select
        avg(weight) as weight
 from (
        select 
-       train_regression(features,target,'-loss squaredloss -opt AdaGrad -reg no') as (feature,weight)
+       train_regressor(features,target,'-loss squaredloss -opt AdaGrad -reg no') as (feature,weight)
   from 
     e2006tfidf_train_x3
 ) t 

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/regression/kddcup12tr2_dataset.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/regression/kddcup12tr2_dataset.md b/docs/gitbook/regression/kddcup12tr2_dataset.md
index c32958f..e4a541b 100644
--- a/docs/gitbook/regression/kddcup12tr2_dataset.md
+++ b/docs/gitbook/regression/kddcup12tr2_dataset.md
@@ -243,4 +243,4 @@ from
   testing2 
   LATERAL VIEW explode(features) t AS feature;
 ```
-_Caution: We recommend you to set "mapred.reduce.tasks" in the above example to partition the training_orcfile table into pieces._
\ No newline at end of file
+_Caution: We recommend you to set "mapred.reduce.tasks" in the above example to partition the training_orcfile table into pieces._

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/regression/kddcup12tr2_lr.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/regression/kddcup12tr2_lr.md b/docs/gitbook/regression/kddcup12tr2_lr.md
index 6db07ab..b9f8bdf 100644
--- a/docs/gitbook/regression/kddcup12tr2_lr.md
+++ b/docs/gitbook/regression/kddcup12tr2_lr.md
@@ -157,4 +157,4 @@ pypy scoreKDD.py KDD_Track2_solution.csv  pa_predict.submit
 |:-----------|------------:|
 | AUC  | 0.739722 |
 | NWMAE | 0.049582 |
-| WRMSE | 0.143698 |
\ No newline at end of file
+| WRMSE | 0.143698 |

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/regression/kddcup12tr2_lr_amplify.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/regression/kddcup12tr2_lr_amplify.md b/docs/gitbook/regression/kddcup12tr2_lr_amplify.md
index 5ede953..b363051 100644
--- a/docs/gitbook/regression/kddcup12tr2_lr_amplify.md
+++ b/docs/gitbook/regression/kddcup12tr2_lr_amplify.md
@@ -119,4 +119,4 @@ We recommend users to use *amplify()* for small training inputs and to use *rand
 |:-----------|--------------------|----:|
 | Plain | 89.718 | 0.734805 |
 | amplifier+clustered by | 479.855  | 0.746214 |
-| rand_amplifier | 116.424 | 0.743392 |
\ No newline at end of file
+| rand_amplifier | 116.424 | 0.743392 |

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/tips/addbias.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/tips/addbias.md b/docs/gitbook/tips/addbias.md
index 021ca64..75ef451 100644
--- a/docs/gitbook/tips/addbias.md
+++ b/docs/gitbook/tips/addbias.md
@@ -26,8 +26,8 @@ With bias clause b, a trainer learns the following f(x).
 _f(x)=Wx+b_ 
 Then, the predicted model considers bias existing in the dataset and the predicted hyperplane does not always cross the origin.
 
-**addBias()** of Hivemall, adds a bias to a feature vector. 
-To enable a bias clause, use addBias() for **both**_(important!)_ training and test data as follows.
+**add_bias()** of Hivemall, adds a bias to a feature vector. 
+To enable a bias clause, use add_bias() for **both**_(important!)_ training and test data as follows.
 The bias _b_ is a feature of "0" ("-1" in before v0.3) by the default. See [AddBiasUDF](../tips/addbias.html) for the detail.
 
 Note that Bias is expressed as a feature that found in all training/testing examples.
@@ -43,7 +43,7 @@ select
   -- extract_feature(feature) as feature, -- hivemall v0.3.1 or later
   -- extract_weight(feature) as value     -- hivemall v0.3.1 or later
 from 
-  e2006tfidf_test LATERAL VIEW explode(addBias(features)) t AS feature;
+  e2006tfidf_test LATERAL VIEW explode(add_bias(features)) t AS feature;
 ```
 
 # Adding a bias clause to training data
@@ -54,9 +54,9 @@ select
  avg(weight) as weight
 from 
  (select 
-     pa1a_regress(addBias(features),target) as (feature,weight)
+     pa1a_regress(add_bias(features),target) as (feature,weight)
   from 
      e2006tfidf_train_x3
  ) t 
 group by feature;
-```
\ No newline at end of file
+```

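Why `add_bias()` has to be applied to **both** training and test data can be checked with a small sparse dot product. The Python sketch below follows the text above in treating the bias as feature `"0"`, and additionally assumes its value is 1.0; the weights and feature values are made-up numbers and `predict` is a hypothetical helper, not a Hivemall function.

```python
from typing import Dict


def predict(weights: Dict[str, float], features: Dict[str, float]) -> float:
    # Sparse dot product over the features present in the example
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Made-up model: W for features "1" and "2", plus bias b stored as feature "0"
w = {"1": 0.5, "2": -1.0}
b = 0.25
w_with_bias = {**w, "0": b}

x = {"1": 2.0, "2": 1.0}
x_with_bias = {**x, "0": 1.0}  # assumed effect of add_bias on a test row

# Wx + b equals a plain dot product once the constant feature is present
with_bias = predict(w_with_bias, x_with_bias)
# Forgetting add_bias on the test side silently drops b
without_bias = predict(w_with_bias, x)
print(with_bias, without_bias)  # 0.25 0.0
```

The learned bias only reaches the prediction through the constant feature, so a test row without it is scored as if _b_ were zero.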
http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/tips/emr.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/tips/emr.md b/docs/gitbook/tips/emr.md
index 049e6da..44e0855 100644
--- a/docs/gitbook/tips/emr.md
+++ b/docs/gitbook/tips/emr.md
@@ -107,7 +107,7 @@ select
   cast(split(feature,":")[0] as int) as feature,
   cast(split(feature,":")[1] as float) as value
 from 
-  news20b_test LATERAL VIEW explode(addBias(features)) t AS feature;
+  news20b_test LATERAL VIEW explode(add_bias(features)) t AS feature;
 ```
 
 ---
@@ -132,7 +132,7 @@ select
  cast(voted_avg(weight) as float) as weight
 from 
  (select 
-     train_arow(addBias(features),label) as (feature,weight)
+     train_arow(add_bias(features),label) as (feature,weight)
   from 
      news20b_train_x3
  ) t 
@@ -202,4 +202,4 @@ We recommended users to use m1.xlarge running Hivemall on EMR as follows.
    --bootstrap-name "install ganglia" \
  --availability-zone ap-northeast-1a
 ```
-Using spot instance for core/task instance groups is the best way to save your money.
\ No newline at end of file
+Using spot instance for core/task instance groups is the best way to save your money.

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/tips/ensemble_learning.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/tips/ensemble_learning.md b/docs/gitbook/tips/ensemble_learning.md
index 9288f84..2157a5b 100644
--- a/docs/gitbook/tips/ensemble_learning.md
+++ b/docs/gitbook/tips/ensemble_learning.md
@@ -49,20 +49,20 @@ select
  voted_avg(weight) as weight
 from 
  (select 
-     -- train_multiclass_cw(addBias(features),label) as (label,feature,weight)      -- hivemall v0.1
-     train_multiclass_cw(addBias(features),label) as (label,feature,weight,covar)   -- hivemall v0.2 or later
+     -- train_multiclass_cw(add_bias(features),label) as (label,feature,weight)      -- hivemall v0.1
+     train_multiclass_cw(add_bias(features),label) as (label,feature,weight,covar)   -- hivemall v0.2 or later
   from 
      news20mc_train_x3
   union all
   select 
-     -- train_multiclass_arow(addBias(features),label) as (label,feature,weight)    -- hivemall v0.1
-     train_multiclass_arow(addBias(features),label) as (label,feature,weight,covar) -- hivemall v0.2 or later
+     -- train_multiclass_arow(add_bias(features),label) as (label,feature,weight)    -- hivemall v0.1
+     train_multiclass_arow(add_bias(features),label) as (label,feature,weight,covar) -- hivemall v0.2 or later
   from 
      news20mc_train_x3
   union all
   select 
-     -- train_multiclass_scw(addBias(features),label) as (label,feature,weight)     -- hivemall v0.1
-     train_multiclass_scw(addBias(features),label) as (label,feature,weight,covar)  -- hivemall v0.2 or later
+     -- train_multiclass_scw(add_bias(features),label) as (label,feature,weight)     -- hivemall v0.1
+     train_multiclass_scw(add_bias(features),label) as (label,feature,weight,covar)  -- hivemall v0.2 or later
   from 
      news20mc_train_x3
  ) t 
@@ -196,4 +196,4 @@ Unfortunately, too many cooks spoil the broth in this case too :-(
 | SCW2 |  0.8482344102178813 |
 | Ensemble(model) | 0.8494866015527173 |
 | Ensemble(prediction) | 0.8499874780866516 |
-| CW |  0.850488354620586 |
\ No newline at end of file
+| CW |  0.850488354620586 |

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/tips/hadoop_tuning.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/tips/hadoop_tuning.md b/docs/gitbook/tips/hadoop_tuning.md
index 507e19d..c516820 100644
--- a/docs/gitbook/tips/hadoop_tuning.md
+++ b/docs/gitbook/tips/hadoop_tuning.md
@@ -97,4 +97,4 @@ You can use the plain old MapReduce by setting following setting:
 ```sql
 set mapreduce.framework.name=yarn;
 set hive.execution.engine=mr;
-```
\ No newline at end of file
+```

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/tips/mixserver.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/tips/mixserver.md b/docs/gitbook/tips/mixserver.md
index f9878e6..91aff87 100644
--- a/docs/gitbook/tips/mixserver.md
+++ b/docs/gitbook/tips/mixserver.md
@@ -69,7 +69,7 @@ select
  cast(voted_avg(weight) as float) as weight
 from 
  (select 
-     train_pa1(addBias(features),label,"-mix host01,host02,host03") as (feature,weight)
+     train_pa1(add_bias(features),label,"-mix host01,host02,host03") as (feature,weight)
   from 
      kdd10a_train_x3
  ) t 
@@ -83,4 +83,4 @@ The effect of model mixing
 
 In my experience, the MIX improved the prediction accuracy of the above KDD2010a PA1 training on a 32 nodes cluster from 0.844835019263103 (w/o mix) to 0.8678096499719774 (w/ mix).
 
-The overhead of using the MIX protocol is *almost negligible* because the MIX communication is efficiently handled using asynchronous non-blocking I/O. Furthermore, the training time could be improved on certain settings because of the faster convergence due to mixing. 
\ No newline at end of file
+The overhead of using the MIX protocol is *almost negligible* because the MIX communication is efficiently handled using asynchronous non-blocking I/O. Furthermore, the training time could be improved on certain settings because of the faster convergence due to mixing.

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/tips/rand_amplify.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/tips/rand_amplify.md b/docs/gitbook/tips/rand_amplify.md
index 6d68dea..73b1c3a 100644
--- a/docs/gitbook/tips/rand_amplify.md
+++ b/docs/gitbook/tips/rand_amplify.md
@@ -118,4 +118,4 @@ We recommend users to use *amplify()* for small training inputs and to use *rand
 |:-----------|--------------------|----:|
 | Plain | 89.718 | 0.734805 |
 | amplifier+clustered by | 479.855  | 0.746214 |
-| rand_amplifier | 116.424 | 0.743392 |
\ No newline at end of file
+| rand_amplifier | 116.424 | 0.743392 |

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/tips/rt_prediction.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/tips/rt_prediction.md b/docs/gitbook/tips/rt_prediction.md
index 96641a3..e1a1fff 100644
--- a/docs/gitbook/tips/rt_prediction.md
+++ b/docs/gitbook/tips/rt_prediction.md
@@ -135,7 +135,7 @@ select
   extract_feature(feature) as feature,
   extract_weight(feature) as value
 from
-  a9atest LATERAL VIEW explode(addBias(features)) t AS feature;
+  a9atest LATERAL VIEW explode(add_bias(features)) t AS feature;
 
 desc extended a9atest_exploded_tsv;
 > location:hdfs://dm01:9000/user/hive/warehouse/a9a.db/a9atest_exploded_tsv,
@@ -252,4 +252,4 @@ Alternatively, you can use SQL views for testing target 't' in the above query.
 | 0.05595205126313402 |       0.0 |
 +---------------------+-----------+
 1 row in set (0.00 sec)
-```
\ No newline at end of file
+```

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/troubleshooting/asterisk.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/troubleshooting/asterisk.md b/docs/gitbook/troubleshooting/asterisk.md
index 621ab3f..3c8c08b 100644
--- a/docs/gitbook/troubleshooting/asterisk.md
+++ b/docs/gitbook/troubleshooting/asterisk.md
@@ -19,4 +19,4 @@
         
 See [HIVE-4181](https://issues.apache.org/jira/browse/HIVE-4181) that asterisk argument without table alias for UDTF is not working. It has been fixed as part of Hive v0.12 release.
 
-A possible workaround is to use asterisk with a table alias, or to specify names of arguments explicitly.
\ No newline at end of file
+A possible workaround is to use asterisk with a table alias, or to specify names of arguments explicitly.

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/troubleshooting/mapjoin_classcastex.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/troubleshooting/mapjoin_classcastex.md b/docs/gitbook/troubleshooting/mapjoin_classcastex.md
index 28e7709..ade4f52 100644
--- a/docs/gitbook/troubleshooting/mapjoin_classcastex.md
+++ b/docs/gitbook/troubleshooting/mapjoin_classcastex.md
@@ -24,4 +24,4 @@ Map-side join on Tez causes [ClassCastException](http://markmail.org/message/7cw
 set hive.mapjoin.optimized.hashtable=false;
 ```
 
-Caution: Fixed in Hive 1.3.0. Refer [HIVE_11051](https://issues.apache.org/jira/browse/HIVE-11051) for the detail.
\ No newline at end of file
+Caution: Fixed in Hive 1.3.0. Refer [HIVE_11051](https://issues.apache.org/jira/browse/HIVE-11051) for the detail.

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/troubleshooting/mapjoin_task_error.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/troubleshooting/mapjoin_task_error.md b/docs/gitbook/troubleshooting/mapjoin_task_error.md
index 78b4e32..185378b 100644
--- a/docs/gitbook/troubleshooting/mapjoin_task_error.md
+++ b/docs/gitbook/troubleshooting/mapjoin_task_error.md
@@ -24,4 +24,4 @@ When using complex queries using views, the auto conversion sometimes throws Sem
 Workaround for the exception is to disable **hive.auto.convert.join** before the execution as follows.
 ```
 set hive.auto.convert.join=false;
-```
\ No newline at end of file
+```

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/troubleshooting/num_mappers.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/troubleshooting/num_mappers.md b/docs/gitbook/troubleshooting/num_mappers.md
index c1820db..67ce7b5 100644
--- a/docs/gitbook/troubleshooting/num_mappers.md
+++ b/docs/gitbook/troubleshooting/num_mappers.md
@@ -36,4 +36,4 @@ set hive.tez.input.format;
 You can then control the maximum number of mappers via setting:
 ```
 set mapreduce.job.maps=128;
-```
\ No newline at end of file
+```

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/docs/gitbook/troubleshooting/oom.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/troubleshooting/oom.md b/docs/gitbook/troubleshooting/oom.md
index 50bee25..dc375bf 100644
--- a/docs/gitbook/troubleshooting/oom.md
+++ b/docs/gitbook/troubleshooting/oom.md
@@ -36,4 +36,4 @@ If OOM caused during the merge step, try setting a larger **mapred.reduce.tasks*
 SET mapred.reduce.tasks=64;
 ```
 
-If your OOM happened by using amplify(), try using rand_amplify() instead.
\ No newline at end of file
+If your OOM happened by using amplify(), try using rand_amplify() instead.

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/resources/ddl/define-all-as-permanent.hive
----------------------------------------------------------------------
diff --git a/resources/ddl/define-all-as-permanent.hive b/resources/ddl/define-all-as-permanent.hive
index c59678a..feb1a08 100644
--- a/resources/ddl/define-all-as-permanent.hive
+++ b/resources/ddl/define-all-as-permanent.hive
@@ -337,8 +337,8 @@ CREATE FUNCTION tf as 'hivemall.ftvec.text.TermFrequencyUDAF' USING JAR '${hivem
 -- Regression functions --
 --------------------------
 
-DROP FUNCTION IF EXISTS train_regression;
-CREATE FUNCTION train_regression as 'hivemall.regression.GeneralRegressionUDTF' USING JAR '${hivemall_jar}';
+DROP FUNCTION IF EXISTS train_regressor;
+CREATE FUNCTION train_regressor as 'hivemall.regression.GeneralRegressorUDTF' USING JAR '${hivemall_jar}';
 
 DROP FUNCTION IF EXISTS train_logregr;
 CREATE FUNCTION train_logregr as 'hivemall.regression.LogressUDTF' USING JAR '${hivemall_jar}';

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/resources/ddl/define-all.hive
----------------------------------------------------------------------
diff --git a/resources/ddl/define-all.hive b/resources/ddl/define-all.hive
index 4514535..310f9f4 100644
--- a/resources/ddl/define-all.hive
+++ b/resources/ddl/define-all.hive
@@ -333,8 +333,8 @@ create temporary function tf as 'hivemall.ftvec.text.TermFrequencyUDAF';
 -- Regression functions --
 --------------------------
 
-drop temporary function if exists train_regression;
-create temporary function train_regression as 'hivemall.regression.GeneralRegressionUDTF';
+drop temporary function if exists train_regressor;
+create temporary function train_regressor as 'hivemall.regression.GeneralRegressorUDTF';
 
 drop temporary function if exists logress;
 create temporary function logress as 'hivemall.regression.LogressUDTF';

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/resources/ddl/define-all.spark
----------------------------------------------------------------------
diff --git a/resources/ddl/define-all.spark b/resources/ddl/define-all.spark
index 2cf4d60..42b235b 100644
--- a/resources/ddl/define-all.spark
+++ b/resources/ddl/define-all.spark
@@ -336,8 +336,8 @@ sqlContext.sql("CREATE TEMPORARY FUNCTION tf AS 'hivemall.ftvec.text.TermFrequen
  * Regression functions
  */
 
-sqlContext.sql("DROP TEMPORARY FUNCTION IF EXISTS train_regression")
-sqlContext.sql("CREATE TEMPORARY FUNCTION train_regression AS 'hivemall.regression.GeneralRegressionUDTF'")
+sqlContext.sql("DROP TEMPORARY FUNCTION IF EXISTS train_regressor")
+sqlContext.sql("CREATE TEMPORARY FUNCTION train_regressor AS 'hivemall.regression.GeneralRegressorUDTF'")
 
 sqlContext.sql("DROP TEMPORARY FUNCTION IF EXISTS logress")
 sqlContext.sql("CREATE TEMPORARY FUNCTION logress AS 'hivemall.regression.LogressUDTF'")

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/7205de1e/resources/ddl/define-udfs.td.hql
----------------------------------------------------------------------
diff --git a/resources/ddl/define-udfs.td.hql b/resources/ddl/define-udfs.td.hql
index d1bdfa4..dd694e3 100644
--- a/resources/ddl/define-udfs.td.hql
+++ b/resources/ddl/define-udfs.td.hql
@@ -172,7 +172,7 @@ create temporary function haversine_distance as 'hivemall.geospatial.HaversineDi
 create temporary function l2_norm as 'hivemall.tools.math.L2NormUDAF';
 create temporary function dimsum_mapper as 'hivemall.knn.similarity.DIMSUMMapperUDTF';
 create temporary function train_classifier as 'hivemall.classifier.GeneralClassifierUDTF';
-create temporary function train_regression as 'hivemall.regression.GeneralRegressionUDTF';
+create temporary function train_regressor as 'hivemall.regression.GeneralRegressorUDTF';
 create temporary function tree_export as 'hivemall.smile.tools.TreeExportUDF';
 
 -- NLP features
