Repository: madlib
Updated Branches:
  refs/heads/master c2a874db7 -> daf67f81b

Docs: Update docs for HITS and linear regr

Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/daf67f81
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/daf67f81
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/daf67f81

Branch: refs/heads/master
Commit: daf67f81b608396d8e3c04a9bf9890449a0a5b3c
Parents: c2a874d
Author: Frank McQuillan <[email protected]>
Authored: Wed Nov 22 16:00:37 2017 -0800
Committer: Rahul Iyer <[email protected]>
Committed: Wed Nov 22 16:00:37 2017 -0800

----------------------------------------------------------------------
 src/ports/postgres/modules/graph/hits.sql_in | 29 ++++++++++----------
 .../postgres/modules/regress/linear.sql_in | 11 ++++----
 2 files changed, 20 insertions(+), 20 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/madlib/blob/daf67f81/src/ports/postgres/modules/graph/hits.sql_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/graph/hits.sql_in b/src/ports/postgres/modules/graph/hits.sql_in
index d06bbb8..bf4b414 100644
--- a/src/ports/postgres/modules/graph/hits.sql_in
+++ b/src/ports/postgres/modules/graph/hits.sql_in
@@ -46,7 +46,7 @@ graph.
 Given a graph, the HITS (Hyperlink-Induced Topic Search) algorithm outputs
 the authority score and hub score of every vertex, where authority estimates
 the value of the content of the page and hub estimates the value of its links to
-other pages. This algorithm was developed by Jon Kleinberg to rate web pages.
+other pages. This algorithm was originally developed to rate web pages [1].

 @anchor hits
 @par HITS
@@ -91,10 +91,9 @@ this string argument:
  a row for every vertex from 'vertex_table' with the following columns:
  - vertex_id : The id of a vertex. Will use the input parameter 'vertex_id'
  for column naming.
- - authority : The vertex's authority score.
- - hub : The vertex's hub score.
- - grouping_cols : Grouping column (if any) values associated with the vertex_id.
-
+ - authority : The vertex authority score.
+ - hub : The vertex hub score.
+ - grouping_cols : Grouping column values (if any) associated with the vertex_id.
 </dd>

 A summary table is also created that contains information
@@ -103,18 +102,19 @@ It is named by adding the suffix '_summary' to the 'out_table'
 parameter.

 <dt>max_iter (optional) </dt>
-<dd>INTEGER, default: 100. The maximum number of iterations allowed. An
+<dd>INTEGER, default: 100. The maximum number of iterations allowed. Each
 iteration consists of both authority and hub phases.</dd>

 <dt>threshold (optional) </dt>
 <dd>FLOAT8, default: (1/number of vertices * 1000).
+ Threshold must be set to a value between 0 and 1, inclusive
+ of end points.
  If the difference between two consecutive iterations of authority AND
  two consecutive iterations of hub is smaller than 'threshold', then the
- computation stops. If you set the threshold to zero, then you will force the
+ computation stops. That is, both authority and hub value differences
+ must be below the specified threshold for the algorithm to stop.
+ If you set the threshold to 0, then you will force the
  algorithm to run for the full number of iterations specified in 'max_iter'.
- Threshold needs to be set to a value between 0 and 1. Note that both
- authority and hub value difference must be below threshold for the
- algorithm to stop.
 </dd>

 <dt>grouping_cols (optional)</dt>
@@ -130,9 +130,8 @@ parameter.

 @anchor notes
 @par Notes
-1. The HITS algorithm is based on Kleinberg's paper [1].
-2. This algorithm supports multigraph and each duplicated edge is considered
- for counting when calculating authority and hub scores.
+This algorithm supports multigraph and each duplicated edge is considered
+for counting when calculating authority and hub scores.

 @anchor examples
 @examp
@@ -370,7 +369,9 @@ SELECT * FROM hits_out_summary order by user_id;

 @anchor literature
 @par Literature
-[1] HITS algorithm https://www.cs.cornell.edu/home/kleinber/auth.pdf
+[1] Kleinerg, Jon M., "Authoritative Sources in a Hyperlinked
+Environment", Journal of the ACM, Sept. 1999.
+https://www.cs.cornell.edu/home/kleinber/auth.pdf
 */

 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.hits(

http://git-wip-us.apache.org/repos/asf/madlib/blob/daf67f81/src/ports/postgres/modules/regress/linear.sql_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/regress/linear.sql_in b/src/ports/postgres/modules/regress/linear.sql_in
index 6572652..e1484db 100644
--- a/src/ports/postgres/modules/regress/linear.sql_in
+++ b/src/ports/postgres/modules/regress/linear.sql_in
@@ -189,7 +189,6 @@ linregr_predict(coef, col_ind)
 <dl class="arglist">
 <dt>coef</dt>
 <dd>FLOAT8[]. Vector of the coefficients of regression from training.</dd>
-
 <dt>col_ind</dt>
 <dd>FLOAT8[]. An array containing the independent variable column names,
 as was used for the training. </dd>
@@ -326,14 +325,14 @@ variance_covariance | {{1226330302.62852,-300921.595596804,551696673.397849
 <pre class="example">
 \\x OFF
 SELECT houses.*,
- madlib.linregr_predict( ARRAY[1,tax,bath,size],
- m.coef
+ madlib.linregr_predict( m.coef,
+ ARRAY[1,tax,bath,size]
  ) as predict,
  price -
- madlib.linregr_predict( ARRAY[1,tax,bath,size],
- m.coef
+ madlib.linregr_predict( m.coef,
+ ARRAY[1,tax,bath,size]
  ) as residual
-FROM houses, houses_linregr m;
+FROM houses, houses_linregr m ORDER BY id;
 </pre>
 Result:
 <pre class="result">
