This is an automated email from the ASF dual-hosted git repository.

myui pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hivemall-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 26f41ed  Update entry about feature binning
26f41ed is described below

commit 26f41edc32f58b335f2798bbbca1237b41de893a
Author: Makoto Yui <[email protected]>
AuthorDate: Sat Jun 29 01:28:27 2019 +0900

    Update entry about feature binning
---
 userguide/ft_engineering/binning.html | 233 ++++++++++++++++++++++++----------
 userguide/misc/funcs.html             |  37 +++++-
 userguide/misc/generic_funcs.html     |   2 +-
 3 files changed, 204 insertions(+), 68 deletions(-)

diff --git a/userguide/ft_engineering/binning.html 
b/userguide/ft_engineering/binning.html
index 5d75620..1d4f235 100644
--- a/userguide/ft_engineering/binning.html
+++ b/userguide/ft_engineering/binning.html
@@ -2377,28 +2377,21 @@
   specific language governing permissions and limitations
   under the License.
 -->
-<p>Feature binning is a method of dividing quantitative variables into 
categorical values.
-It groups quantitative values into a pre-defined number of bins.</p>
-<p><em>Note: This feature is supported from Hivemall v0.5-rc.1 or 
later.</em></p>
+<p>Feature binning is a method of dividing quantitative variables into 
categorical values. It groups quantitative values into a pre-defined number of 
bins.</p>
+<p>If the number of bins is set to 3, the bin ranges become something like 
<code>[-Inf, 1], (1, 10], (10, Inf]</code>.</p>
 <!-- toc --><div id="toc" class="toc">
 
 <ul>
 <li><a href="#usage">Usage</a><ul>
-<li><a href="#a-feature-vector-trasformation-by-applying-feature-binning">A. 
Feature Vector trasformation by applying Feature Binning</a></li>
-<li><a href="#b-get-a-mapping-table-by-feature-binning">B. Get a mapping table 
by Feature Binning</a></li>
-</ul>
-</li>
-<li><a href="#function-signature">Function Signature</a><ul>
-<li><a href="#udaf-buildbinsweight-numofbins-autoshrink">[UDAF] 
<code>build_bins(weight, num_of_bins[, auto_shrink])</code></a><ul>
-<li><a href="#input">Input</a></li>
-<li><a href="#output">Output</a></li>
-</ul>
-</li>
-<li><a href="#udf-featurebinningfeatures-quantilesmapweight-quantiles">[UDF] 
<code>feature_binning(features, quantiles_map)/(weight, 
quantiles)</code></a><ul>
-<li><a href="#variation-a">Variation: A</a></li>
-<li><a href="#variation-b">Variation: B</a></li>
+<li><a 
href="#feature-vector-trasformation-by-applying-feature-binning">Feature Vector 
transformation by applying Feature Binning</a></li>
+<li><a href="#practical-example">Practical Example</a></li>
+<li><a href="#get-a-mapping-table-by-feature-binning">Get a mapping table by 
Feature Binning</a></li>
 </ul>
 </li>
+<li><a href="#function-signatures">Function Signatures</a><ul>
+<li><a href="#udaf-buildbinsweight-numofbins--autoshrinkfalse">UDAF 
<code>build_bins(weight, num_of_bins [, auto_shrink=false])</code></a></li>
+<li><a href="#udf-featurebinningfeatures-quantilesmap">UDF 
<code>feature_binning(features, quantiles_map)</code></a></li>
+<li><a href="#udf-featurebinningweight-quantiles">UDF 
<code>feature_binning(weight, quantiles)</code></a></li>
 </ul>
 </li>
 </ul>
@@ -2407,35 +2400,96 @@ It groups quantitative values into a pre-defined number 
of bins.</p>
 <h1 id="usage">Usage</h1>
 <p>Prepare sample data (<em>users</em> table) first as follows:</p>
 <pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span 
class="hljs-keyword">TABLE</span> <span class="hljs-keyword">users</span> (
-  <span class="hljs-keyword">name</span> <span 
class="hljs-keyword">string</span>, age <span class="hljs-built_in">int</span>, 
gender <span class="hljs-keyword">string</span>
+  <span class="hljs-keyword">rowid</span> <span 
class="hljs-built_in">int</span>, <span class="hljs-keyword">name</span> <span 
class="hljs-keyword">string</span>, age <span class="hljs-built_in">int</span>, 
gender <span class="hljs-keyword">string</span>
 );
-
 <span class="hljs-keyword">INSERT</span> <span 
class="hljs-keyword">INTO</span> <span class="hljs-keyword">users</span> <span 
class="hljs-keyword">VALUES</span>
-  (<span class="hljs-string">&apos;Jacob&apos;</span>, <span 
class="hljs-number">20</span>, <span 
class="hljs-string">&apos;Male&apos;</span>),
-  (<span class="hljs-string">&apos;Mason&apos;</span>, <span 
class="hljs-number">22</span>, <span 
class="hljs-string">&apos;Male&apos;</span>),
-  (<span class="hljs-string">&apos;Sophia&apos;</span>, <span 
class="hljs-number">35</span>, <span 
class="hljs-string">&apos;Female&apos;</span>),
-  (<span class="hljs-string">&apos;Ethan&apos;</span>, <span 
class="hljs-number">55</span>, <span 
class="hljs-string">&apos;Male&apos;</span>),
-  (<span class="hljs-string">&apos;Emma&apos;</span>, <span 
class="hljs-number">15</span>, <span 
class="hljs-string">&apos;Female&apos;</span>),
-  (<span class="hljs-string">&apos;Noah&apos;</span>, <span 
class="hljs-number">46</span>, <span 
class="hljs-string">&apos;Male&apos;</span>),
-  (<span class="hljs-string">&apos;Isabella&apos;</span>, <span 
class="hljs-number">20</span>, <span 
class="hljs-string">&apos;Female&apos;</span>);
+  (<span class="hljs-number">1</span>, <span 
class="hljs-string">&apos;Jacob&apos;</span>, <span 
class="hljs-number">20</span>, <span 
class="hljs-string">&apos;Male&apos;</span>),
+  (<span class="hljs-number">2</span>, <span 
class="hljs-string">&apos;Mason&apos;</span>, <span 
class="hljs-number">22</span>, <span 
class="hljs-string">&apos;Male&apos;</span>),
+  (<span class="hljs-number">3</span>, <span 
class="hljs-string">&apos;Sophia&apos;</span>, <span 
class="hljs-number">35</span>, <span 
class="hljs-string">&apos;Female&apos;</span>),
+  (<span class="hljs-number">4</span>, <span 
class="hljs-string">&apos;Ethan&apos;</span>, <span 
class="hljs-number">55</span>, <span 
class="hljs-string">&apos;Male&apos;</span>),
+  (<span class="hljs-number">5</span>, <span 
class="hljs-string">&apos;Emma&apos;</span>, <span 
class="hljs-number">15</span>, <span 
class="hljs-string">&apos;Female&apos;</span>),
+  (<span class="hljs-number">6</span>, <span 
class="hljs-string">&apos;Noah&apos;</span>, <span 
class="hljs-number">46</span>, <span 
class="hljs-string">&apos;Male&apos;</span>),
+  (<span class="hljs-number">7</span>, <span 
class="hljs-string">&apos;Isabella&apos;</span>, <span 
class="hljs-number">20</span>, <span 
class="hljs-string">&apos;Female&apos;</span>)
+;
+
+<span class="hljs-keyword">CREATE</span> <span 
class="hljs-keyword">TABLE</span> <span class="hljs-keyword">input</span> <span 
class="hljs-keyword">as</span>
+<span class="hljs-keyword">SELECT</span>
+  <span class="hljs-keyword">rowid</span>,
+  array_concat(
+    categorical_features(
+      <span class="hljs-built_in">array</span>(<span 
class="hljs-string">&apos;name&apos;</span>, <span 
class="hljs-string">&apos;gender&apos;</span>),
+      <span class="hljs-keyword">name</span>, gender
+    ),
+    quantitative_features(
+      <span class="hljs-built_in">array</span>(<span 
class="hljs-string">&apos;age&apos;</span>),
+      age
+    )
+  ) <span class="hljs-keyword">AS</span> features
+<span class="hljs-keyword">FROM</span>
+  <span class="hljs-keyword">users</span>;
+
+<span class="hljs-keyword">select</span> * <span 
class="hljs-keyword">from</span> <span class="hljs-keyword">input</span> <span 
class="hljs-keyword">limit</span> <span class="hljs-number">2</span>;
 </code></pre>
-<h2 id="a-feature-vector-trasformation-by-applying-feature-binning">A. Feature 
Vector trasformation by applying Feature Binning</h2>
-<pre><code class="lang-sql">WITH t AS (
+<table>
+<thead>
+<tr>
+<th style="text-align:left">input.rowid</th>
+<th style="text-align:left">input.features</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td style="text-align:left">1</td>
+<td 
style="text-align:left">[&quot;name#Jacob&quot;,&quot;gender#Male&quot;,&quot;age:20.0&quot;]</td>
+</tr>
+<tr>
+<td style="text-align:left">2</td>
+<td 
style="text-align:left">[&quot;name#Mason&quot;,&quot;gender#Male&quot;,&quot;age:22.0&quot;]</td>
+</tr>
+</tbody>
+</table>
+<h2 id="feature-vector-trasformation-by-applying-feature-binning">Feature 
Vector transformation by applying Feature Binning</h2>
+<p>Now, convert <code>age</code> values into 3 bins.</p>
+<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span>
+  <span class="hljs-keyword">map</span>(<span 
class="hljs-string">&apos;age&apos;</span>, build_bins(age, <span 
class="hljs-number">3</span>)) <span class="hljs-keyword">AS</span> 
quantiles_map
+<span class="hljs-keyword">FROM</span>
+  <span class="hljs-keyword">users</span>
+</code></pre>
+<blockquote>
+<p>{&quot;age&quot;:[-Infinity,18.333333333333332,30.666666666666657,Infinity]}</p>
+</blockquote>
+<p>The above query result contains 4 values for age in 
<code>quantiles_map</code>. They are the thresholds that delimit the 3 bins.</p>
+<pre><code class="lang-sql">WITH bins as (
   <span class="hljs-keyword">SELECT</span>
-    array_concat(
-      categorical_features(
-        <span class="hljs-built_in">array</span>(<span 
class="hljs-string">&apos;name&apos;</span>, <span 
class="hljs-string">&apos;gender&apos;</span>),
-    <span class="hljs-keyword">name</span>, gender
-      ),
-      quantitative_features(
-    <span class="hljs-built_in">array</span>(<span 
class="hljs-string">&apos;age&apos;</span>),
-    age
-      )
-    ) <span class="hljs-keyword">AS</span> features
+    <span class="hljs-keyword">map</span>(<span 
class="hljs-string">&apos;age&apos;</span>, build_bins(age, <span 
class="hljs-number">3</span>)) <span class="hljs-keyword">AS</span> 
quantiles_map
   <span class="hljs-keyword">FROM</span>
     <span class="hljs-keyword">users</span>
-),
-bins <span class="hljs-keyword">AS</span> (
+)
+<span class="hljs-keyword">select</span>
+  feature_binning(
+    <span class="hljs-built_in">array</span>(<span 
class="hljs-string">&apos;age:-Infinity&apos;</span>, <span 
class="hljs-string">&apos;age:-1&apos;</span>, <span 
class="hljs-string">&apos;age:0&apos;</span>, <span 
class="hljs-string">&apos;age:1&apos;</span>, <span 
class="hljs-string">&apos;age:18.333333333333331&apos;</span>, <span 
class="hljs-string">&apos;age:18.333333333333332&apos;</span>), quantiles_map
+  ),
+  feature_binning(
+    <span class="hljs-built_in">array</span>(<span 
class="hljs-string">&apos;age:18.3333333333333333&apos;</span>, <span 
class="hljs-string">&apos;age:18.33333333333334&apos;</span>, <span 
class="hljs-string">&apos;age:19&apos;</span>, <span 
class="hljs-string">&apos;age:30&apos;</span>, <span 
class="hljs-string">&apos;age:30.666666666666656&apos;</span>, <span 
class="hljs-string">&apos;age:30.666666666666657&apos;</span>), quantiles_map
+  ),
+  feature_binning(
+    <span class="hljs-built_in">array</span>(<span 
class="hljs-string">&apos;age:666666666666658&apos;</span>, <span 
class="hljs-string">&apos;age:30.66666666666666&apos;</span>, <span 
class="hljs-string">&apos;age:31&apos;</span>, <span 
class="hljs-string">&apos;age:99&apos;</span>, <span 
class="hljs-string">&apos;age:Infinity&apos;</span>), quantiles_map
+  ),
+  feature_binning(
+    <span class="hljs-built_in">array</span>(<span 
class="hljs-string">&apos;age:NaN&apos;</span>), quantiles_map
+  ),
+  feature_binning( <span class="hljs-comment">-- not in map</span>
+    <span class="hljs-built_in">array</span>(<span 
class="hljs-string">&apos;weight:60.3&apos;</span>), quantiles_map
+  )
+<span class="hljs-keyword">from</span>
+  bins
+</code></pre>
+<blockquote>
+<p>[&quot;age:0&quot;,&quot;age:0&quot;,&quot;age:0&quot;,&quot;age:0&quot;,&quot;age:0&quot;,&quot;age:0&quot;]
[&quot;age:0&quot;,&quot;age:1&quot;,&quot;age:1&quot;,&quot;age:1&quot;,&quot;age:1&quot;,&quot;age:1&quot;]
[&quot;age:2&quot;,&quot;age:2&quot;,&quot;age:2&quot;,&quot;age:2&quot;,&quot;age:2&quot;]
[&quot;age:3&quot;]
[&quot;weight:60.3&quot;]</p>
+</blockquote>
+<p>The following query shows more practical usage:</p>
+<pre><code class="lang-sql">WITH bins AS (
   <span class="hljs-keyword">SELECT</span>
     <span class="hljs-keyword">map</span>(<span 
class="hljs-string">&apos;age&apos;</span>, build_bins(age, <span 
class="hljs-number">3</span>)) <span class="hljs-keyword">AS</span> 
quantiles_map
   <span class="hljs-keyword">FROM</span>
@@ -2444,40 +2498,91 @@ bins <span class="hljs-keyword">AS</span> (
 <span class="hljs-keyword">SELECT</span>
   feature_binning(features, quantiles_map) <span 
class="hljs-keyword">AS</span> features
 <span class="hljs-keyword">FROM</span>
-  t <span class="hljs-keyword">CROSS</span> <span 
class="hljs-keyword">JOIN</span> bins;
+  <span class="hljs-keyword">input</span>
+  <span class="hljs-keyword">CROSS</span> <span 
class="hljs-keyword">JOIN</span> bins;
 </code></pre>
-<p><em>Result</em></p>
 <table>
 <thead>
 <tr>
-<th style="text-align:center">features: 
<code>array&lt;features::string&gt;</code></th>
+<th style="text-align:left">features: 
<code>array&lt;features::string&gt;</code></th>
 </tr>
 </thead>
 <tbody>
 <tr>
-<td 
style="text-align:center">[&quot;name#Jacob&quot;,&quot;gender#Male&quot;,&quot;age:1&quot;]</td>
+<td 
style="text-align:left">[&quot;name#Jacob&quot;,&quot;gender#Male&quot;,&quot;age:1&quot;]</td>
 </tr>
 <tr>
-<td 
style="text-align:center">[&quot;name#Mason&quot;,&quot;gender#Male&quot;,&quot;age:1&quot;]</td>
+<td 
style="text-align:left">[&quot;name#Mason&quot;,&quot;gender#Male&quot;,&quot;age:1&quot;]</td>
 </tr>
 <tr>
-<td 
style="text-align:center">[&quot;name#Sophia&quot;,&quot;gender#Female&quot;,&quot;age:2&quot;]</td>
+<td 
style="text-align:left">[&quot;name#Sophia&quot;,&quot;gender#Female&quot;,&quot;age:2&quot;]</td>
 </tr>
 <tr>
-<td 
style="text-align:center">[&quot;name#Ethan&quot;,&quot;gender#Male&quot;,&quot;age:2&quot;]</td>
+<td 
style="text-align:left">[&quot;name#Ethan&quot;,&quot;gender#Male&quot;,&quot;age:2&quot;]</td>
+</tr>
+<tr>
+<td style="text-align:left">...</td>
+</tr>
+</tbody>
+</table>
+<h2 id="practical-example">Practical Example</h2>
+<p>Here, we show a more practical usage of the <code>feature_binning</code> UDF, 
which applies feature binning to given feature vectors.</p>
+<pre><code class="lang-sql">WITH extracted as (
+  <span class="hljs-keyword">select</span> 
+    extract_feature(feature) <span class="hljs-keyword">as</span> <span 
class="hljs-keyword">index</span>,
+    extract_weight(feature) <span class="hljs-keyword">as</span> <span 
class="hljs-keyword">value</span>
+  <span class="hljs-keyword">from</span>
+    <span class="hljs-keyword">input</span> l
+    LATERAL <span class="hljs-keyword">VIEW</span> explode(features) r <span 
class="hljs-keyword">as</span> feature
+  <span class="hljs-keyword">where</span>
+    <span class="hljs-keyword">instr</span>(feature, <span 
class="hljs-string">&apos;:&apos;</span>) &gt; <span 
class="hljs-number">0</span> <span class="hljs-comment">-- filter out 
categorical features</span>
+),
+<span class="hljs-keyword">mapping</span> <span class="hljs-keyword">as</span> 
(
+  <span class="hljs-keyword">select</span>
+    <span class="hljs-keyword">index</span>, 
+    build_bins(<span class="hljs-keyword">value</span>, <span 
class="hljs-number">5</span>, <span class="hljs-literal">true</span>) <span 
class="hljs-keyword">as</span> quantiles <span class="hljs-comment">-- 5 bins 
with auto bin shrinking</span>
+  <span class="hljs-keyword">from</span>
+    extracted
+  <span class="hljs-keyword">group</span> <span class="hljs-keyword">by</span>
+    <span class="hljs-keyword">index</span>
+),
+bins <span class="hljs-keyword">as</span> (
+   <span class="hljs-keyword">select</span> 
+    to_map(<span class="hljs-keyword">index</span>, quantiles) <span 
class="hljs-keyword">as</span> quantiles 
+   <span class="hljs-keyword">from</span>
+    <span class="hljs-keyword">mapping</span>
+)
+<span class="hljs-keyword">select</span>
+  l.features <span class="hljs-keyword">as</span> original,
+  feature_binning(l.features, r.quantiles) <span 
class="hljs-keyword">as</span> features
+<span class="hljs-keyword">from</span>
+  <span class="hljs-keyword">input</span> l
+  <span class="hljs-keyword">cross</span> <span 
class="hljs-keyword">join</span> bins r
+<span class="hljs-comment">-- limit 10;</span>
+</code></pre>
+<table>
+<thead>
+<tr>
+<th style="text-align:left">original</th>
+<th style="text-align:left">features</th>
 </tr>
+</thead>
+<tbody>
 <tr>
-<td 
style="text-align:center">[&quot;name#Emma&quot;,&quot;gender#Female&quot;,&quot;age:0&quot;]</td>
+<td 
style="text-align:left">[&quot;name#Jacob&quot;,&quot;gender#Male&quot;,&quot;age:20.0&quot;]</td>
+<td 
style="text-align:left">[&quot;name#Jacob&quot;,&quot;gender#Male&quot;,&quot;age:2&quot;]</td>
 </tr>
 <tr>
-<td 
style="text-align:center">[&quot;name#Noah&quot;,&quot;gender#Male&quot;,&quot;age:2&quot;]</td>
+<td 
style="text-align:left">[&quot;name#Isabella&quot;,&quot;gender#Female&quot;,&quot;age:20.0&quot;]</td>
+<td 
style="text-align:left">[&quot;name#Isabella&quot;,&quot;gender#Female&quot;,&quot;age:2&quot;]</td>
 </tr>
 <tr>
-<td 
style="text-align:center">[&quot;name#Isabella&quot;,&quot;gender#Female&quot;,&quot;age:1&quot;]</td>
+<td style="text-align:left">...</td>
+<td style="text-align:left">...</td>
 </tr>
 </tbody>
 </table>
-<h2 id="b-get-a-mapping-table-by-feature-binning">B. Get a mapping table by 
Feature Binning</h2>
+<h2 id="get-a-mapping-table-by-feature-binning">Get a mapping table by Feature 
Binning</h2>
 <pre><code class="lang-sql">WITH bins AS (
   <span class="hljs-keyword">SELECT</span> build_bins(age, <span 
class="hljs-number">3</span>) <span class="hljs-keyword">AS</span> quantiles
   <span class="hljs-keyword">FROM</span> <span 
class="hljs-keyword">users</span>
@@ -2487,7 +2592,6 @@ bins <span class="hljs-keyword">AS</span> (
 <span class="hljs-keyword">FROM</span>
   <span class="hljs-keyword">users</span> <span 
class="hljs-keyword">CROSS</span> <span class="hljs-keyword">JOIN</span> bins;
 </code></pre>
-<p><em>Result</em></p>
 <table>
 <thead>
 <tr>
@@ -2526,9 +2630,9 @@ bins <span class="hljs-keyword">AS</span> (
 </tr>
 </tbody>
 </table>
-<h1 id="function-signature">Function Signature</h1>
-<h2 id="udaf-buildbinsweight-numofbins-autoshrink">[UDAF] 
<code>build_bins(weight, num_of_bins[, auto_shrink])</code></h2>
-<h3 id="input">Input</h3>
+<h1 id="function-signatures">Function Signatures</h1>
+<h3 id="udaf-buildbinsweight-numofbins--autoshrinkfalse">UDAF 
<code>build_bins(weight, num_of_bins [, auto_shrink=false])</code></h3>
+<h4 id="input">Input</h4>
 <table>
 <thead>
 <tr>
@@ -2540,12 +2644,12 @@ bins <span class="hljs-keyword">AS</span> (
 <tbody>
 <tr>
 <td style="text-align:center">weight</td>
-<td style="text-align:center">2 &lt;=</td>
+<td style="text-align:center">greater than or equal to 2</td>
 <td style="text-align:center">behavior when separations are repeated: 
T=&gt;skip, F=&gt;exception</td>
 </tr>
 </tbody>
 </table>
-<h3 id="output">Output</h3>
+<h4 id="output">Output</h4>
 <table>
 <thead>
 <tr>
@@ -2554,14 +2658,13 @@ bins <span class="hljs-keyword">AS</span> (
 </thead>
 <tbody>
 <tr>
-<td style="text-align:center">array of separation value</td>
+<td style="text-align:center">thresholds of bins based on quantiles</td>
 </tr>
 </tbody>
 </table>
 <div class="panel panel-primary"><div class="panel-heading"><h3 
class="panel-title" id="note"><i class="fa fa-edit"></i> Note</h3></div><div 
class="panel-body"><p>There is the possibility quantiles are repeated because 
of too many <code>num_of_bins</code> or too few data.
-If <code>auto_shrink</code> is true, skip duplicated quantiles. If not, throw 
an exception.</p></div></div>
-<h2 id="udf-featurebinningfeatures-quantilesmapweight-quantiles">[UDF] 
<code>feature_binning(features, quantiles_map)/(weight, quantiles)</code></h2>
-<h3 id="variation-a">Variation: A</h3>
+If <code>auto_shrink</code> is set to true, duplicated quantiles are skipped. 
Otherwise, an exception is thrown.</p></div></div>
+<h3 id="udf-featurebinningfeatures-quantilesmap">UDF 
<code>feature_binning(features, quantiles_map)</code></h3>
 <h4 id="input">Input</h4>
 <table>
 <thead>
@@ -2572,8 +2675,8 @@ If <code>auto_shrink</code> is true, skip duplicated 
quantiles. If not, throw an
 </thead>
 <tbody>
 <tr>
-<td style="text-align:center">serialized feature</td>
-<td style="text-align:center">entry:: key: col name, val: quantiles</td>
+<td style="text-align:center">feature vector</td>
+<td style="text-align:center">a map where key=column name and 
value=quantiles</td>
 </tr>
 </tbody>
 </table>
@@ -2586,11 +2689,11 @@ If <code>auto_shrink</code> is true, skip duplicated 
quantiles. If not, throw an
 </thead>
 <tbody>
 <tr>
-<td style="text-align:center">serialized and binned features</td>
+<td style="text-align:center">binned features</td>
 </tr>
 </tbody>
 </table>
-<h3 id="variation-b">Variation: B</h3>
+<h3 id="udf-featurebinningweight-quantiles">UDF <code>feature_binning(weight, 
quantiles)</code></h3>
 <h4 id="input">Input</h4>
 <table>
 <thead>
@@ -2674,7 +2777,7 @@ Apache Hivemall is an effort undergoing incubation at The 
Apache Software Founda
     <script>
         var gitbook = gitbook || [];
         gitbook.push(function() {
-            gitbook.page.hasChanged({"page":{"title":"Feature 
Binning","level":"3.4","depth":1,"next":{"title":"Feature 
Paring","level":"3.5","depth":1,"path":"ft_engineering/pairing.md","ref":"ft_engineering/pairing.md","articles":[{"title":"Polynomial
 
features","level":"3.5.1","depth":2,"path":"ft_engineering/polynomial.md","ref":"ft_engineering/polynomial.md","articles":[]}]},"previous":{"title":"Feature
 
Selection","level":"3.3","depth":1,"path":"ft_engineering/selection.md","ref":"ft
 [...]
+            gitbook.page.hasChanged({"page":{"title":"Feature 
Binning","level":"3.4","depth":1,"next":{"title":"Feature 
Paring","level":"3.5","depth":1,"path":"ft_engineering/pairing.md","ref":"ft_engineering/pairing.md","articles":[{"title":"Polynomial
 
features","level":"3.5.1","depth":2,"path":"ft_engineering/polynomial.md","ref":"ft_engineering/polynomial.md","articles":[]}]},"previous":{"title":"Feature
 
Selection","level":"3.3","depth":1,"path":"ft_engineering/selection.md","ref":"ft
 [...]
         });
     </script>
 </div>
diff --git a/userguide/misc/funcs.html b/userguide/misc/funcs.html
index a77222d..74adf17 100644
--- a/userguide/misc/funcs.html
+++ b/userguide/misc/funcs.html
@@ -2628,7 +2628,40 @@ Reference: <a 
href="https://papers.nips.cc/paper/3848-adaptive-regularization-of
 <ul>
 <li><p><code>build_bins(number weight, const int num_of_bins[, const boolean 
auto_shrink = false])</code> - Return quantiles representing bins: 
array&lt;double&gt;</p>
 </li>
-<li><p><code>feature_binning(array&lt;features::string&gt; features, const 
map&lt;string, array&lt;number&gt;&gt; quantiles_map)</code> / 
<em>FUNC</em>(number weight, const array&lt;number&gt; quantiles) - Returns 
binned features as an array&lt;features::string&gt; / bin ID as int</p>
+<li><p><code>feature_binning(array&lt;features::string&gt; features, 
map&lt;string, array&lt;number&gt;&gt; quantiles_map)</code> - returns a binned 
feature vector as an array&lt;features::string&gt; / <em>FUNC</em>(number weight, 
array&lt;number&gt; quantiles) - returns bin ID as int</p>
+<pre><code class="lang-sql">WITH extracted as (
+  <span class="hljs-keyword">select</span> 
+    extract_feature(feature) <span class="hljs-keyword">as</span> <span 
class="hljs-keyword">index</span>,
+    extract_weight(feature) <span class="hljs-keyword">as</span> <span 
class="hljs-keyword">value</span>
+  <span class="hljs-keyword">from</span>
+    <span class="hljs-keyword">input</span> l
+    LATERAL <span class="hljs-keyword">VIEW</span> explode(features) r <span 
class="hljs-keyword">as</span> feature
+),
+<span class="hljs-keyword">mapping</span> <span class="hljs-keyword">as</span> 
(
+  <span class="hljs-keyword">select</span>
+    <span class="hljs-keyword">index</span>, 
+    build_bins(<span class="hljs-keyword">value</span>, <span 
class="hljs-number">5</span>, <span class="hljs-literal">true</span>) <span 
class="hljs-keyword">as</span> quantiles <span class="hljs-comment">-- 5 bins 
with auto bin shrinking</span>
+  <span class="hljs-keyword">from</span>
+    extracted
+  <span class="hljs-keyword">group</span> <span class="hljs-keyword">by</span>
+    <span class="hljs-keyword">index</span>
+),
+bins <span class="hljs-keyword">as</span> (
+   <span class="hljs-keyword">select</span> 
+    to_map(<span class="hljs-keyword">index</span>, quantiles) <span 
class="hljs-keyword">as</span> quantiles 
+   <span class="hljs-keyword">from</span>
+    <span class="hljs-keyword">mapping</span>
+)
+<span class="hljs-keyword">select</span>
+  l.features <span class="hljs-keyword">as</span> original,
+  feature_binning(l.features, r.quantiles) <span 
class="hljs-keyword">as</span> features
+<span class="hljs-keyword">from</span>
+  <span class="hljs-keyword">input</span> l
+  <span class="hljs-keyword">cross</span> <span 
class="hljs-keyword">join</span> bins r
+
+&gt; [<span class="hljs-string">&quot;name#Jacob&quot;</span>,<span 
class="hljs-string">&quot;gender#Male&quot;</span>,<span 
class="hljs-string">&quot;age:20.0&quot;</span>] [<span 
class="hljs-string">&quot;name#Jacob&quot;</span>,<span 
class="hljs-string">&quot;gender#Male&quot;</span>,<span 
class="hljs-string">&quot;age:2&quot;</span>]
+&gt; [<span class="hljs-string">&quot;name#Isabella&quot;</span>,<span 
class="hljs-string">&quot;gender#Female&quot;</span>,<span 
class="hljs-string">&quot;age:20.0&quot;</span>]    [<span 
class="hljs-string">&quot;name#Isabella&quot;</span>,<span 
class="hljs-string">&quot;gender#Female&quot;</span>,<span 
class="hljs-string">&quot;age:2&quot;</span>]
+</code></pre>
 </li>
 </ul>
 <h2 id="feature-format-conversion">Feature format conversion</h2>
@@ -3024,7 +3057,7 @@ Apache Hivemall is an effort undergoing incubation at The 
Apache Software Founda
     <script>
         var gitbook = gitbook || [];
         gitbook.push(function() {
-            gitbook.page.hasChanged({"page":{"title":"List of 
Functions","level":"1.3","depth":1,"next":{"title":"Tips for Effective 
Hivemall","level":"1.4","depth":1,"path":"tips/README.md","ref":"tips/README.md","articles":[{"title":"Explicit
 add_bias() for better 
prediction","level":"1.4.1","depth":2,"path":"tips/addbias.md","ref":"tips/addbias.md","articles":[]},{"title":"Use
 rand_amplify() to better prediction 
results","level":"1.4.2","depth":2,"path":"tips/rand_amplify.md","ref":"t [...]
+            gitbook.page.hasChanged({"page":{"title":"List of 
Functions","level":"1.3","depth":1,"next":{"title":"Tips for Effective 
Hivemall","level":"1.4","depth":1,"path":"tips/README.md","ref":"tips/README.md","articles":[{"title":"Explicit
 add_bias() for better 
prediction","level":"1.4.1","depth":2,"path":"tips/addbias.md","ref":"tips/addbias.md","articles":[]},{"title":"Use
 rand_amplify() to better prediction 
results","level":"1.4.2","depth":2,"path":"tips/rand_amplify.md","ref":"t [...]
         });
     </script>
 </div>
diff --git a/userguide/misc/generic_funcs.html 
b/userguide/misc/generic_funcs.html
index a5fbe95..8246823 100644
--- a/userguide/misc/generic_funcs.html
+++ b/userguide/misc/generic_funcs.html
@@ -3183,7 +3183,7 @@ Apache Hivemall is an effort undergoing incubation at The 
Apache Software Founda
     <script>
         var gitbook = gitbook || [];
         gitbook.push(function() {
-            gitbook.page.hasChanged({"page":{"title":"List of Generic Hivemall 
Functions","level":"2.1","depth":1,"next":{"title":"Efficient Top-K Query 
Processing","level":"2.2","depth":1,"path":"misc/topk.md","ref":"misc/topk.md","articles":[]},"previous":{"title":"Map-side
 join causes ClassCastException on 
Tez","level":"1.6.5","depth":2,"path":"troubleshooting/mapjoin_classcastex.md","ref":"troubleshooting/mapjoin_classcastex.md","articles":[]},"dir":"ltr"},"config":{"plugins":["theme
 [...]
+            gitbook.page.hasChanged({"page":{"title":"List of Generic Hivemall 
Functions","level":"2.1","depth":1,"next":{"title":"Efficient Top-K Query 
Processing","level":"2.2","depth":1,"path":"misc/topk.md","ref":"misc/topk.md","articles":[]},"previous":{"title":"Map-side
 join causes ClassCastException on 
Tez","level":"1.6.5","depth":2,"path":"troubleshooting/mapjoin_classcastex.md","ref":"troubleshooting/mapjoin_classcastex.md","articles":[]},"dir":"ltr"},"config":{"plugins":["theme
 [...]
         });
     </script>
 </div>
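
For readers who want to check the bin assignments in the binning.html examples outside Hive, the following Python sketch mimics the documented behavior. It is not Hivemall's implementation. The function names `bin_id` and `shrink_thresholds` are hypothetical, and the right-closed interval convention is an assumption inferred from the `[-Inf, 1], (1, 10], (10, Inf]` ranges and the sample output in the diff. NaN is mapped to the ID just past the last bin only because the sample output shows `age:3` for `age:NaN`. The sketch does not reproduce how `build_bins` derives the thresholds; it reuses the values printed in the diff.

```python
import bisect
import math


def bin_id(value, thresholds):
    """Return the bin ID of value for thresholds like [-inf, t1, ..., inf].

    Assumption: bins are right-closed, i.e. [-inf, t1], (t1, t2], ..., (tn, inf].
    """
    if math.isnan(value):
        # Assumption taken from the sample output: NaN gets ID == number of bins.
        return len(thresholds) - 1
    inner = thresholds[1:-1]  # drop the -inf / inf sentinels
    # bisect_left counts the inner thresholds strictly below value,
    # so a value equal to a threshold stays in the lower bin.
    return bisect.bisect_left(inner, value)


def shrink_thresholds(thresholds, auto_shrink=False):
    """Drop repeated thresholds, or raise when auto_shrink is False."""
    unique = sorted(set(thresholds))
    if len(unique) != len(thresholds) and not auto_shrink:
        raise ValueError("duplicated quantiles; consider auto_shrink=True")
    return unique


# Thresholds copied from the quantiles_map printed in the diff.
AGE_THRESHOLDS = [-math.inf, 18.333333333333332, 30.666666666666657, math.inf]
ages = [20, 22, 35, 55, 15, 46, 20]  # ages from the users table
print([bin_id(a, AGE_THRESHOLDS) for a in ages])  # [1, 1, 2, 2, 0, 2, 1]
```

The printed IDs agree with the `age:N` values that the three-bin query in the diff reports for the same seven users.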
