[ 
https://issues.apache.org/jira/browse/MADLIB-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147605#comment-16147605
 ] 

Jingyi Mei edited comment on MADLIB-1124 at 8/30/17 4:59 PM:
-------------------------------------------------------------

To answer questions from [~fmcquillan]:
1. Threshold. Yes, it should be optional. Because we normalize both the 
authority and hub scores, the valid threshold range is \[0, 1\]. Currently, there 
is no strong evidence that we should use different thresholds for the authority 
and hub scores, so we decided to pick 1/(number of vertices * 1000) as the 
default threshold. Here is the new description for threshold:
{code}
Threshold (optional): FLOAT8, default: 1/(number of vertices * 1000). If the 
difference between the values of both scores (Authority and Hub) for every 
vertex in two consecutive iterations is smaller than 'threshold', or the 
iteration number exceeds 'max_iter', the computation stops. If you set the 
threshold to zero, you force the algorithm to run for the full number of 
iterations specified in 'max_iter'. The threshold needs to be set to a value 
less than or equal to 1, since both values (Authority and Hub) of every node 
are initialized to 1. Note that both the Authority and the Hub value 
differences must be below the threshold for the algorithm to stop. 
{code}
2. HITS doesn’t assign a different 'weight' or 'importance' to different nodes, 
so it shouldn’t rely on eigenvector centrality.
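
To make the stopping rule in the threshold description concrete, here is a minimal pure-Python sketch. It is not the MADlib implementation: the function names ({{hits}}, {{_normalize}}), the edge-list input, the use of the L2 norm for normalization, and the default {{max_iter}} of 100 are all assumptions made for illustration. Only the default threshold, the initialization to 1, and the "both differences must be below threshold" rule come from the description above.

```python
import math


def _normalize(scores):
    """Scale scores to unit L2 norm (the choice of norm is an assumption)."""
    norm = math.sqrt(sum(s * s for s in scores))
    return [s / norm for s in scores] if norm > 0 else list(scores)


def hits(edges, n_vertices, threshold=None, max_iter=100):
    """Hypothetical HITS sketch; vertices are numbered 0..n_vertices-1.

    edges is a list of (src, dst) pairs. Returns (authority, hub, iterations).
    """
    if threshold is None:
        # Default from the description: 1/(number of vertices * 1000).
        threshold = 1.0 / (n_vertices * 1000)
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")

    # Both scores start at 1 for every vertex.
    auth = [1.0] * n_vertices
    hub = [1.0] * n_vertices
    iteration = 0

    for iteration in range(1, max_iter + 1):
        # Authority of a vertex: sum of hub scores of vertices pointing to it.
        new_auth = [0.0] * n_vertices
        for src, dst in edges:
            new_auth[dst] += hub[src]
        new_auth = _normalize(new_auth)

        # Hub of a vertex: sum of authority scores of vertices it points to.
        new_hub = [0.0] * n_vertices
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        new_hub = _normalize(new_hub)

        # Largest per-vertex change between consecutive iterations.
        auth_diff = max(abs(a - b) for a, b in zip(new_auth, auth))
        hub_diff = max(abs(a - b) for a, b in zip(new_hub, hub))
        auth, hub = new_auth, new_hub

        # Stop only when BOTH differences are strictly below the threshold.
        # With threshold == 0 this is never true, so all max_iter rounds run.
        if auth_diff < threshold and hub_diff < threshold:
            break

    return auth, hub, iteration
```

For example, {{hits([(0, 2), (1, 2)], 3)}} uses the default threshold of 1/3000, while passing {{threshold=0.0}} forces the loop to run for the full {{max_iter}} iterations.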



> Graph - HITS algorithm
> ----------------------
>
>                 Key: MADLIB-1124
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1124
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Graph
>            Reporter: Frank McQuillan
>            Assignee: Jingyi Mei
>             Fix For: v2.0
>
>



