[SYSTEMML-1927] New frame transformcolmap builtin function 

Our decision tree script requires dummy-coded inputs for all categorical
attributes as well as a mapping matrix of their column positions. This
patch introduces a new frame builtin function that leverages the
transformencode metadata to compute the mapping matrix automatically.
Since this is a metadata operation, it is only implemented in CP, which
requires the metadata frame to fit into memory. The patch also includes
a corresponding test case and the documentation update.

This change enables a future simplification of decision tree and random
forest, where we could do any pre-processing automatically inside the
script instead of requiring the user to do it manually.
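
To illustrate the kind of mapping matrix described above, here is a minimal
Python sketch, not the SystemML implementation. Per the documentation entry in
this patch, the DML call takes a `target` frame and a `spec`, e.g.
`transformcolmap(target=M, spec=jspec)`. Everything else here is an assumption
made for illustration: the helper name `column_mapping`, the `distinct_counts`
input, the 1-based inclusive positions, and the choice to emit rows only for
dummy-coded columns.

```python
def column_mapping(num_cols, distinct_counts):
    """Return [source_col, start, end] rows for each dummy-coded column.

    num_cols: number of columns in the input frame.
    distinct_counts: hypothetical dict {source_col: number of distinct values}
        for the dummy-coded columns; every other column is assumed to
        occupy exactly one column in the encoded output.
    Positions are assumed to be 1-based and inclusive.
    """
    rows = []
    next_pos = 1  # next free column position in the encoded output
    for col in range(1, num_cols + 1):
        # A dummy-coded column expands to one output column per distinct value.
        width = distinct_counts.get(col, 1)
        if col in distinct_counts:
            rows.append([col, next_pos, next_pos + width - 1])
        next_pos += width
    return rows


# Columns 2 and 4 are dummy coded with 3 and 2 distinct values, respectively.
mapping = column_mapping(4, {2: 3, 4: 2})
```

Under these assumptions, column 2 maps to output positions 2 through 4 and
column 4 to positions 6 through 7, which is the information a script would
need to locate each categorical attribute in the encoded matrix.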


Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/e3f0cf40
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/e3f0cf40
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/e3f0cf40

Branch: refs/heads/gh-pages
Commit: e3f0cf4041d6591632a0a7a7c5b6aa279203aeca
Parents: 27e06a5
Author: Matthias Boehm <mboe...@gmail.com>
Authored: Thu Sep 21 00:55:04 2017 -0700
Committer: Matthias Boehm <mboe...@gmail.com>
Committed: Thu Sep 21 11:54:11 2017 -0700

----------------------------------------------------------------------
 dml-language-reference.md | 1 +
 1 file changed, 1 insertion(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/e3f0cf40/dml-language-reference.md
----------------------------------------------------------------------
diff --git a/dml-language-reference.md b/dml-language-reference.md
index d8ca07f..c402acc 100644
--- a/dml-language-reference.md
+++ b/dml-language-reference.md
@@ -1642,6 +1642,7 @@ Function | Description | Parameters | Example
 transformencode() | Transforms a frame into a matrix using specification. <br/> Builds and applies frame metadata. | Input:<br/> target = &lt;frame&gt; <br/> spec = &lt;json specification&gt; <br/> Outputs: &lt;matrix&gt;, &lt;frame&gt;|[transformencode](dml-language-reference.html#transformencode)
 transformdecode() | Transforms a matrix into a frame using specification. <br/> Valid only for specific transformation types. | Input:<br/> target = &lt;matrix&gt; <br/> spec = &lt;json specification&gt; <br/> meta = &lt;frame&gt; <br/> Output: &lt;frame&gt; |[transformdecode](dml-language-reference.html#transformdecode)
 transformapply() | Transforms a frame into a matrix using specification. <br/> Applies existing frame metadata. |  Input:<br/> target = &lt;frame&gt; <br/> spec = &lt;json specification&gt; <br/> meta = &lt;frame&gt; <br/> Output: &lt;matrix&gt; | [transformapply](dml-language-reference.html#transformapply)
+transformcolmap() | Obtains the column mapping of a transformed frame using the given specification. The input frame is assumed to be the meta data frame returned from a transformencode call. <br/> The output has a row per encoded input attribute, indicating the source column position, as well as the start and end positions in the encode output. | Input:<br/> target = &lt;frame&gt; <br/> spec = &lt;json specification&gt; <br/> Output: &lt;matrix&gt; |[transformcolmap](dml-language-reference.html#transformdecode)
 
 The following table summarizes the supported transformations for <code>transformencode(), transformdecode(), transformapply()</code>.  Note only recoding, dummy coding and pass-through are reversible, i.e., subject to <code>transformdecode()</code>, whereas binning, missing value imputation, and omit are not.
 
