[SYSTEMML-1927] New frame transformcolmap builtin function

Our decision tree script requires dummy coded inputs of all categorical attributes as well as a mapping matrix of their column positions. This patch introduces a new frame builtin function that leverages the transformencode meta data in order to automatically compute the mapping matrix. Since this is a meta data operation, it is only implemented in CP (but it requires the meta data frame to fit into memory). Furthermore, this also includes a respective test case and the documentation update.
This change enables a future simplification of decision tree and random forest, where we could do any pre-processing automatically inside the script instead of requiring the user to do it manually.

Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/e3f0cf40
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/e3f0cf40
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/e3f0cf40

Branch: refs/heads/gh-pages
Commit: e3f0cf4041d6591632a0a7a7c5b6aa279203aeca
Parents: 27e06a5
Author: Matthias Boehm <mboe...@gmail.com>
Authored: Thu Sep 21 00:55:04 2017 -0700
Committer: Matthias Boehm <mboe...@gmail.com>
Committed: Thu Sep 21 11:54:11 2017 -0700

----------------------------------------------------------------------
 dml-language-reference.md | 1 +
 1 file changed, 1 insertion(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/systemml/blob/e3f0cf40/dml-language-reference.md
----------------------------------------------------------------------
diff --git a/dml-language-reference.md b/dml-language-reference.md
index d8ca07f..c402acc 100644
--- a/dml-language-reference.md
+++ b/dml-language-reference.md
@@ -1642,6 +1642,7 @@ Function | Description | Parameters | Example
 transformencode() | Transforms a frame into a matrix using specification. <br/> Builds and applies frame metadata. | Input:<br/> target = <frame> <br/> spec = <json specification> <br/> Outputs: <matrix>, <frame>|[transformencode](dml-language-reference.html#transformencode)
 transformdecode() | Transforms a matrix into a frame using specification. <br/> Valid only for specific transformation types. | Input:<br/> target = <matrix> <br/> spec = <json specification> <br/> meta = <frame> <br/> Output: <frame> |[transformdecode](dml-language-reference.html#transformdecode)
 transformapply() | Transforms a frame into a matrix using specification. <br/> Applies existing frame metadata. | Input:<br/> target = <frame> <br/> spec = <json specification> <br/> meta = <frame> <br/> Output: <matrix> | [transformapply](dml-language-reference.html#transformapply)
+transformcolmap() | Obtains the column mapping of a transformed frame using the given specification. The input frame is assumed to be the meta data frame returned from a transformencode call. <br/> The output has a row per encoded input attribute, indicating the source column position, as well as the start and end positions in the encode output. | Input:<br/> target = <frame> <br/> spec = <json specification> <br/> Output: <matrix> |[transformcolmap](dml-language-reference.html#transformdecode)
 
 The following table summarizes the supported transformations for <code>transformencode(), transformdecode(), transformapply()</code>. Note only recoding, dummy coding and pass-through are reversible, i.e., subject to <code>transformdecode()</code>, whereas binning, missing value imputation, and omit are not.
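
To illustrate the shape of the mapping matrix described in the new table row, here is a minimal Python sketch. It is not the SystemML implementation. The function name `column_map` and its inputs are hypothetical, and it assumes one output row per input column, with a dummy coded column expanding to as many output columns as it has distinct values and every other column occupying exactly one output column.

```python
def column_map(distinct_counts, dummycoded):
    """Return rows [source_col, start, end], all 1-based and inclusive.

    distinct_counts: number of distinct values per input column (a
        hypothetical stand-in for what the transformencode meta data
        frame records).
    dummycoded: set of 1-based column ids that are dummy coded.
    """
    rows = []
    next_out = 1  # next free column position in the encoded output
    for col, ndistinct in enumerate(distinct_counts, start=1):
        # Assumption: dummy coding expands a column to one output column
        # per distinct value; any other column keeps a single position.
        width = ndistinct if col in dummycoded else 1
        rows.append([col, next_out, next_out + width - 1])
        next_out += width
    return rows


# Three input columns; columns 1 and 3 are dummy coded.
example = column_map([3, 7, 2], dummycoded={1, 3})
for row in example:
    print(row)
```

Under these assumptions, column 1 maps to output positions 1 through 3, column 2 to position 4, and column 3 to positions 5 through 6, which is the kind of position information the decision tree script needs for its categorical attributes.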