Updated site contents

Project: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/commit/52bc44fa
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/tree/52bc44fa
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/diff/52bc44fa

Branch: refs/heads/master
Commit: 52bc44fa258b1595bf2e8e9e9e71eace783a6e5b
Parents: 32a657d
Author: myui <[email protected]>
Authored: Mon Nov 14 19:47:47 2016 +0900
Committer: myui <[email protected]>
Committed: Mon Nov 14 19:47:47 2016 +0900

----------------------------------------------------------------------
 bin/build_site.sh                               |  26 ++++++++++++++++---
 docs/gitbook/FOOTER.md                          |   3 +++
 docs/gitbook/README.md                          |  10 ++-----
 docs/gitbook/book.json                          |   3 ++-
 .../regression/kddcup12tr2_lr_amplify.md        |  12 ++++-----
 docs/gitbook/resources/images/amplify.png       | Bin 0 -> 94408 bytes
 .../resources/images/amplify_elapsed.png        | Bin 0 -> 19556 bytes
 docs/gitbook/resources/images/emr-bootstrap.png | Bin 0 -> 31830 bytes
 docs/gitbook/resources/images/emr-wizard.png    | Bin 0 -> 35576 bytes
 docs/gitbook/resources/images/randamplify.png   | Bin 0 -> 49567 bytes
 .../resources/images/randamplify_elapsed.png    | Bin 0 -> 19047 bytes
 docs/gitbook/tips/emr.md                        |   6 +++--
 docs/gitbook/tips/rand_amplify.md               |  11 ++++----
 13 files changed, 45 insertions(+), 26 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/52bc44fa/bin/build_site.sh
----------------------------------------------------------------------
diff --git a/bin/build_site.sh b/bin/build_site.sh
index 04496b4..de3aa08 100755
--- a/bin/build_site.sh
+++ b/bin/build_site.sh
@@ -30,16 +30,36 @@ if [ "$HIVEMALL_HOME" == "" ]; then
 fi
 
 cd $HIVEMALL_HOME
+HIVEMALL_HOME=`pwd`
+
+##
+# Run maven-site
+##
+
 mvn clean site
 
+##
+# building gitbook userguide
+##
+
+if ! [ -x "$(command -v gitbook)" ]; then
+  echo "gitbook is not installed .." >&2
+  echo "Run 'npm install gitbook-cli -g' to install gitbook" >&2
+  exit 1
+fi
+
+cd ${HIVEMALL_HOME}/docs/gitbook
+gitbook install && gitbook build
+cd $HIVEMALL_HOME
+
 cp -R docs/gitbook/_book target/site/userguide
 
-#
+##
 # Run HTTP server on localhost
-#
+##
 
 # ruby
-cd $HIVEMALL_HOME/target/site
+cd ${HIVEMALL_HOME}/target/site
 ruby -rwebrick -e 'WEBrick::HTTPServer.new(:DocumentRoot => "./", :Port => 8000).start'
 
 # python3

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/52bc44fa/docs/gitbook/FOOTER.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/FOOTER.md b/docs/gitbook/FOOTER.md
new file mode 100644
index 0000000..588afbb
--- /dev/null
+++ b/docs/gitbook/FOOTER.md
@@ -0,0 +1,3 @@
+<sub><font color="gray">
+Apache Hivemall is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator.
+</font></sub>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/52bc44fa/docs/gitbook/README.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/README.md b/docs/gitbook/README.md
index 7b61570..164ef74 100644
--- a/docs/gitbook/README.md
+++ b/docs/gitbook/README.md
@@ -23,7 +23,7 @@ Apache Hivemall is a collection of machine learning algorithms and versatile data
 analytics functions. It provides a number of ease of use machine learning functionalities through the <a href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF">Apache Hive UDF/UDAF/UDTF interface</a>.
 </div>
 
-<div style="text-align:center"><img src="resources/images/hivemall-logo-color-small.png"/></div>
+<div style="text-align:center"><img src="./resources/images/hivemall-logo-color-small.png"/></div>
 
 Apache Hivemall offers a variety of functionalities: <strong>regression, classification, recommendation, anomaly detection, k-nearest neighbor, and feature engineering</strong>. It also supports state-of-the-art machine learning algorithms such as Soft Confidence Weighted, Adaptive Regularization of Weight Vectors, Factorization Machines, and AdaDelta.
 
@@ -32,10 +32,4 @@ Apache Hivemall offers a variety of functionalities: <strong>regression, classif
 
 Apache Hivemall is mainly designed to run on [Apache Hive](https://hive.apache.org/) but it also supports [Apache Pig](https://pig.apache.org/) and [Apache Spark](http://spark.apache.org/) for the runtime. Thus, it can be considered as a cross platform library for machine learning; prediction models built by a batch query of Apache Hive can be used on Apache Spark/Pig, and conversely, prediction models build by Apache Spark can be used from Apache Hive/Pig.
 
-<div style="text-align:center"><img src="resources/images/techstack.png" width="80%" height="80%"/></div>
-
----
-
-<font color="gray">
-<sub>Apache Hivemall is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the <a href="http://incubator.apache.org/">Apache Incubator</a>.</sub>
-</font>
\ No newline at end of file
+<div style="text-align:center"><img src="./resources/images/techstack.png" width="80%" height="80%"/></div>

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/52bc44fa/docs/gitbook/book.json
----------------------------------------------------------------------
diff --git a/docs/gitbook/book.json b/docs/gitbook/book.json
index 2f70ed9..b622a7b 100644
--- a/docs/gitbook/book.json
+++ b/docs/gitbook/book.json
@@ -17,7 +17,8 @@
         "multipart",
         "codeblock-filename",
         "katex",
-        "emphasize"
+        "emphasize",
+        "localized-footer"
     ],
     "pluginsConfig": {
         "theme-default": {

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/52bc44fa/docs/gitbook/regression/kddcup12tr2_lr_amplify.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/regression/kddcup12tr2_lr_amplify.md b/docs/gitbook/regression/kddcup12tr2_lr_amplify.md
index 55b8caf..e402ce4 100644
--- a/docs/gitbook/regression/kddcup12tr2_lr_amplify.md
+++ b/docs/gitbook/regression/kddcup12tr2_lr_amplify.md
@@ -70,8 +70,8 @@ group by feature;
 ```
 
 The above query is executed by 2 MapReduce jobs as shown below:
-
-[Here](https://dl.dropboxusercontent.com/u/13123103/hivemall/amplify_plan.txt) is the actual plan generated by the Hive.
+
+<img src="../resources/images/amplify.png" alt="amplifier"/>
 
 Using *trainning_x3* instead of the plain training table results in higher and better AUC (0.746214) in [this](https://github.com/myui/hivemall/wiki/KDDCup-2012-track-2-CTR-prediction-(regression\)) example.
 
@@ -80,7 +80,7 @@ When the training table is so large that involves 100 Map tasks, the merge opera
 
 Note that the actual bottleneck is not M/R iterations but shuffling training instance. Iteration without shuffling (as in [the Spark example](http://spark.incubator.apache.org/examples.html)) causes very slow convergence and results in requiring more iterations. Shuffling cannot be avoided even in iterative MapReduce variants.
 
-
+<img src="../resources/images/amplify_elapsed.png" alt="amplify elapsed"/>
 
 ---
 # Amplify and shuffle training examples in each Map task
@@ -101,12 +101,12 @@ from
 ```
 
 The training query is executed as follows:
-
-[Here](https://dl.dropboxusercontent.com/u/13123103/hivemall/randamplify_plan.txt) is the actual query plan.
+
+<img src="../resources/images/randamplify.png" alt="Random amplify"/>
 
 The map-local multiplication and shuffling has no bottleneck in the merge phase and the query is efficiently executed within a single MapReduce job.
 
-
+<img src="../resources/images/randamplify_elapsed.png" alt="rand_amplify elapsed"/>
 
 Using *rand_amplify* results in a better AUC (0.743392) in [this](https://github.com/myui/hivemall/wiki/KDDCup-2012-track-2-CTR-prediction-(regression\)) example.
 
http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/52bc44fa/docs/gitbook/resources/images/amplify.png
----------------------------------------------------------------------
diff --git a/docs/gitbook/resources/images/amplify.png b/docs/gitbook/resources/images/amplify.png
new file mode 100644
index 0000000..f537e98
Binary files /dev/null and b/docs/gitbook/resources/images/amplify.png differ

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/52bc44fa/docs/gitbook/resources/images/amplify_elapsed.png
----------------------------------------------------------------------
diff --git a/docs/gitbook/resources/images/amplify_elapsed.png b/docs/gitbook/resources/images/amplify_elapsed.png
new file mode 100644
index 0000000..595dd60
Binary files /dev/null and b/docs/gitbook/resources/images/amplify_elapsed.png differ

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/52bc44fa/docs/gitbook/resources/images/emr-bootstrap.png
----------------------------------------------------------------------
diff --git a/docs/gitbook/resources/images/emr-bootstrap.png b/docs/gitbook/resources/images/emr-bootstrap.png
new file mode 100644
index 0000000..fea2ee2
Binary files /dev/null and b/docs/gitbook/resources/images/emr-bootstrap.png differ

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/52bc44fa/docs/gitbook/resources/images/emr-wizard.png
----------------------------------------------------------------------
diff --git a/docs/gitbook/resources/images/emr-wizard.png b/docs/gitbook/resources/images/emr-wizard.png
new file mode 100644
index 0000000..725cc9e
Binary files /dev/null and b/docs/gitbook/resources/images/emr-wizard.png differ

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/52bc44fa/docs/gitbook/resources/images/randamplify.png
----------------------------------------------------------------------
diff --git a/docs/gitbook/resources/images/randamplify.png b/docs/gitbook/resources/images/randamplify.png
new file mode 100644
index 0000000..432f775
Binary files /dev/null and b/docs/gitbook/resources/images/randamplify.png differ

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/52bc44fa/docs/gitbook/resources/images/randamplify_elapsed.png
----------------------------------------------------------------------
diff --git a/docs/gitbook/resources/images/randamplify_elapsed.png b/docs/gitbook/resources/images/randamplify_elapsed.png
new file mode 100644
index 0000000..7d5be32
Binary files /dev/null and b/docs/gitbook/resources/images/randamplify_elapsed.png differ

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/52bc44fa/docs/gitbook/tips/emr.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/tips/emr.md b/docs/gitbook/tips/emr.md
index 030a594..61cb25b 100644
--- a/docs/gitbook/tips/emr.md
+++ b/docs/gitbook/tips/emr.md
@@ -44,8 +44,10 @@ I'm usually lunching EMR instances with cheap Spot instances through [CLI client
 _To use YARN instead of old Hadoop, specify "[--ami-version 3.0.0](http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-ami.html#ami-versions-supported)". Hivemall works on both old Hadoop and YARN._
 
 Or, lunch an interactive EMR job using the EMR GUI wizard.
-
-
+
+<img src="../resources/images/emr-wizard.png" alt="emr-wizard"/>
+
+<img src="../resources/images/emr-bootstrap.png" alt="emr-bootstrap"/>
 
 ## Data preparation
 

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/52bc44fa/docs/gitbook/tips/rand_amplify.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/tips/rand_amplify.md b/docs/gitbook/tips/rand_amplify.md
index 55b8caf..cd546ec 100644
--- a/docs/gitbook/tips/rand_amplify.md
+++ b/docs/gitbook/tips/rand_amplify.md
@@ -70,8 +70,7 @@ group by feature;
 ```
 
 The above query is executed by 2 MapReduce jobs as shown below:
-
-[Here](https://dl.dropboxusercontent.com/u/13123103/hivemall/amplify_plan.txt) is the actual plan generated by the Hive.
+<img src="../resources/images/amplify.png" alt="amplifier"/>
 
 Using *trainning_x3* instead of the plain training table results in higher and better AUC (0.746214) in [this](https://github.com/myui/hivemall/wiki/KDDCup-2012-track-2-CTR-prediction-(regression\)) example.
 
@@ -80,7 +79,7 @@ When the training table is so large that involves 100 Map tasks, the merge opera
 
 Note that the actual bottleneck is not M/R iterations but shuffling training instance. Iteration without shuffling (as in [the Spark example](http://spark.incubator.apache.org/examples.html)) causes very slow convergence and results in requiring more iterations. Shuffling cannot be avoided even in iterative MapReduce variants.
 
-
+<img src="../resources/images/amplify_elapsed.png" alt="amplify_elapsed"/>
 
 ---
 # Amplify and shuffle training examples in each Map task
@@ -101,12 +100,12 @@ from
 ```
 
 The training query is executed as follows:
-
-[Here](https://dl.dropboxusercontent.com/u/13123103/hivemall/randamplify_plan.txt) is the actual query plan.
+
+<img src="../resources/images/randamplify.png" alt="randamplify"/>
 
 The map-local multiplication and shuffling has no bottleneck in the merge phase and the query is efficiently executed within a single MapReduce job.
 
-
+<img src="../resources/images/randamplify_elapsed.png" alt="randamplify_elapsed"/>
 
 Using *rand_amplify* results in a better AUC (0.743392) in [this](https://github.com/myui/hivemall/wiki/KDDCup-2012-track-2-CTR-prediction-(regression\)) example.
 
