Github user chezou commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/158#discussion_r214233539
--- Diff: docs/gitbook/supervised_learning/tutorial.md ---
@@ -0,0 +1,461 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# Step-by-Step Tutorial on Supervised Learning with Apache Hivemall
+
+<!-- toc -->
+
+## What is Hivemall?
+
+[Apache Hive](https://hive.apache.org/) is a data warehousing solution that lets us process large-scale data easily with SQL-like queries. Assume that you have a table named `purchase_history`, which can be created artificially as:
+
+```sql
+create table if not exists purchase_history as
+select 1 as id, "Saturday" as day_of_week, "male" as gender, 600 as price, "book" as category, 1 as label
+union all
+select 2 as id, "Friday" as day_of_week, "female" as gender, 4800 as price, "sports" as category, 0 as label
+union all
+select 3 as id, "Friday" as day_of_week, "other" as gender, 18000 as price, "entertainment" as category, 0 as label
+union all
+select 4 as id, "Thursday" as day_of_week, "male" as gender, 200 as price, "food" as category, 0 as label
+union all
+select 5 as id, "Wednesday" as day_of_week, "female" as gender, 1000 as price, "electronics" as category, 1 as label
+;
+```
+
+The syntax of Hive queries, namely **HiveQL**, is very similar to SQL:
+
+```sql
+select count(1) from purchase_history;
+```
+
+> 5
+
+[Apache Hivemall](https://github.com/apache/incubator-hivemall) is a collection of user-defined functions (UDFs) for HiveQL that is optimized for machine learning (ML) and data science. For example, you can efficiently build a logistic regression model with stochastic gradient descent (SGD) optimization by issuing the following query of about 10 lines:
+
+```sql
+SELECT
+ train_classifier(
+ features,
+ label,
+ '-loss_function logloss -optimizer SGD'
+ ) as (feature, weight)
+FROM
+ training
+;
+```
+
+
+The Hivemall function [`hivemall_version()`](../misc/funcs.html#others) shows the current Hivemall version, for example:
+
+```sql
+select hivemall_version();
+```
+
+> "0.5.1-incubating-SNAPSHOT"
+
+Below we list the ML and related problems that Hivemall can solve:
+
+- [Binary and multi-class classification](../binaryclass/general.html)
+- [Regression](../regression/general.html)
+- [Recommendation](../recommend/cf.html)
+- [Anomaly detection](../anomaly/lof.html)
+- [Natural language processing](../misc/tokenizer.html)
+- [Clustering](../misc/tokenizer.html) (e.g., topic modeling)
+- [Data sketching](../misc/funcs.html#sketching)
+- Evaluation
+
+Our [YouTube demo video](https://www.youtube.com/watch?v=cMUsuA9KZ_c) gives an overview of Hivemall.
+
+This tutorial explains the basic usage of Hivemall through examples of supervised learning: a simple regressor and a binary classifier.
+
+## Binary classification
+
+Imagine a scenario in which we would like to build a binary classifier from the mock `purchase_history` data and predict unseen purchases in order to run a new campaign effectively:
+
+| day\_of\_week | gender | price | category | label |
+|:---:|:---:|:---:|:---:|:---|
+|Saturday | male | 600 | book | 1 |
+|Friday | female | 4800 | sports | 0 |
+|Friday | other | 18000 | entertainment | 0 |
+|Thursday | male | 200 | food | 0 |
+|Wednesday | female | 1000 | electronics | 1 |
+
+Use the Hivemall [`train_classifier()`](../misc/funcs.html#binary-classification) UDF to tackle the problem as follows.
+
+### Step 1. Feature representation
+
+First of all, we have to convert each record into a pair of a feature vector and its corresponding target value. Hivemall requires you to represent input features in a specific format.
+
+To be more precise, Hivemall represents a single feature as a concatenation of an **index** (i.e., **name**) and its **value**:
+
+- Quantitative feature: `<index>:<value>`
+ - e.g., `price:600.0`
+- Categorical feature: `<index>#<value>`
+ - e.g., `gender#male`
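+
+As a minimal sketch of this format, the query below assembles feature strings from the `purchase_history` table using only Hive's built-in `concat()`, `cast()`, and `array()` functions. The index names (`price`, `day_of_week`, `gender`, `category`) simply reuse the column names and are an assumption made for illustration; this is not necessarily how the rest of the tutorial prepares its training data.
+
+```sql
+-- Sketch only: builds feature strings with built-in Hive functions.
+select
+  id,
+  array(
+    -- quantitative feature in <index>:<value> form, e.g., price:600.0
+    concat('price:', cast(cast(price as double) as string)),
+    -- categorical features in <index>#<value> form, e.g., gender#male
+    concat('day_of_week#', day_of_week),
+    concat('gender#', gender),
+    concat('category#', category)
+  ) as features,
+  label
+from
+  purchase_history
+;
+```
+
+Each row then carries an array of feature strings together with its label, which is the shape of the `features` and `label` arguments passed to `train_classifier()` in the earlier example.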
+
--- End diff --
Added 0f593c4
---