[2/2] incubator-hivemall git commit: Updated gitbook for Spark top-k join

myui Wed, 01 Feb 2017 18:48:41 -0800

Updated gitbook for Spark top-k join


Project: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/commit/4909deda
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/tree/4909deda
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hivemall/diff/4909deda

Branch: refs/heads/master
Commit: 4909deda546946a66de31195b9a3eaa120382c50
Parents: b2032af
Author: myui <[email protected]>
Authored: Thu Feb 2 11:48:00 2017 +0900
Committer: myui <[email protected]>
Committed: Thu Feb 2 11:48:00 2017 +0900

----------------------------------------------------------------------
 docs/gitbook/SUMMARY.md              |  5 +++++
 docs/gitbook/spark/misc/misc.md      |  0
 docs/gitbook/spark/misc/topk_join.md | 15 ++++++---------
 3 files changed, 11 insertions(+), 9 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/4909deda/docs/gitbook/SUMMARY.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/SUMMARY.md b/docs/gitbook/SUMMARY.md
index 33bb46c..76f7924 100644
--- a/docs/gitbook/SUMMARY.md
+++ b/docs/gitbook/SUMMARY.md
@@ -145,6 +145,11 @@
 
 * [Outlier Detection using Local Outlier Factor (LOF)](anomaly/lof.md)
 
+## Part X - Hivemall on Spark
+
+* [Generic features](spark/misc/misc.md)
+    * [Top-k Join processing](spark/misc/topk_join.md)
+
 ## Part X - External References
 
 * [Hivemall on Apache Spark](https://github.com/maropu/hivemall-spark)

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/4909deda/docs/gitbook/spark/misc/misc.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/spark/misc/misc.md b/docs/gitbook/spark/misc/misc.md
new file mode 100644
index 0000000..e69de29

http://git-wip-us.apache.org/repos/asf/incubator-hivemall/blob/4909deda/docs/gitbook/spark/misc/topk_join.md
----------------------------------------------------------------------
diff --git a/docs/gitbook/spark/misc/topk_join.md b/docs/gitbook/spark/misc/topk_join.md
index 03e0a23..af3351d 100644
--- a/docs/gitbook/spark/misc/topk_join.md
+++ b/docs/gitbook/spark/misc/topk_join.md
@@ -21,13 +21,10 @@
 
 `top_k_join` is much IO-efficient as compared to regular joining + ranking operations because `top_k_join` drops unsatisfied records and writes only top-k records to disks during joins.
 
-<!-- toc -->
-
-# Notice
-
-* `top_k_join` is supported in the DataFrame of Spark v2.1.0 or later.
-* A type of `score` must be ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, or DecimalType.
-* If `k` is less than 0, the order is reverse and `top_k_join` joins the tail-K records of `rightDf`.
+> #### Caution
+> * `top_k_join` is supported in the DataFrame of Spark v2.1.0 or later.
+> * A type of `score` must be ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType, or DecimalType.
+> * If `k` is less than 0, the order is reverse and `top_k_join` joins the tail-K records of `rightDf`.
 
 # Usage
 
@@ -61,7 +58,7 @@ For example, we have two tables below;
 In the two tables, the example computes the nearest `position` for `userId` in each `group`.
 The standard way using DataFrame window functions would be as follows:
 
-```
+```scala
 val computeDistanceFunc =
   sqrt(pow(inputDf("x") - masterDf("x"), lit(2.0)) + pow(inputDf("y") - masterDf("y"), lit(2.0)))
 
@@ -76,7 +73,7 @@ leftDf.join(
 
 You can use `top_k_join` as follows:
 
-```
+```scala
 leftDf.top_k_join(
     k = lit(-1),
     right = rightDf,
