[
https://issues.apache.org/jira/browse/FLINK-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346899#comment-15346899
]
Greg Hogan commented on FLINK-3879:
-----------------------------------
[~vkalavri], here are the timings I get on an AWS c4.2xlarge using FLINK-3879
rebased to master and merged with pr1517 (hash-based combine). Each execution
performs 10 iterations.
FLINK-3879 (HITS):
$ for i in `seq 10 2 20` ; do echo ; echo $i ; ./bin/flink run -q -class \
    org.apache.flink.graph.examples.HITS \
    ~/flink-gelly-examples_2.10-1.1-SNAPSHOT.jar --input rmat --scale $i --output \
    hash --algorithm HITS ; done
Scale 10:
ChecksumHashCode 0x0000018d7c49ca4a, count 902
Execution runtime: 1,034 ms
Scale 12:
ChecksumHashCode 0x0000058a6efc82d1, count 3349
Execution runtime: 1,115 ms
Scale 14:
ChecksumHashCode 0x000015030ecf3188, count 12472
Execution runtime: 1,974 ms
Scale 16:
ChecksumHashCode 0x00004f23492849eb, count 46826
Execution runtime: 5,843 ms
Scale 18:
ChecksumHashCode 0x0001267c806b8338, count 174010
Execution runtime: 21,927 ms
Scale 20:
ChecksumHashCode 0x000449bd0da45343, count 646203
Execution runtime: 93,488 ms
FLINK-2044 (HITSAlgorithm):
$ for i in `seq 10 2 20` ; do echo ; echo $i ; ./bin/flink run -q -class \
    org.apache.flink.graph.examples.HITS \
    ~/flink-gelly-examples_2.10-1.1-SNAPSHOT.jar --input rmat --scale $i --output \
    hash --algorithm HITSAlgorithm ; done
Scale 10:
ChecksumHashCode 0x000001c88b0818f0, count 902
Execution runtime: 761 ms
Scale 12:
ChecksumHashCode 0x0000069bcad3b322, count 3349
Execution runtime: 1,094 ms
Scale 14:
ChecksumHashCode 0x0000186c50950fea, count 12472
Execution runtime: 2,290 ms
Scale 16:
ChecksumHashCode 0x00005b741faf30eb, count 46826
Execution runtime: 6,898 ms
Scale 18:
ChecksumHashCode 0x000153e520a8306c, count 174010
Execution runtime: 28,015 ms
Scale 20:
ChecksumHashCode 0x0004ed44e75c493a, count 646203
Execution runtime: 120,736 ms
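To make the two listings easier to compare, the runtimes can be reduced to a per-scale ratio. The sketch below copies the millisecond values from the output above; the dictionary names and the `runtime_ratios` helper are illustrative only and are not part of Flink or Gelly. A ratio above 1 means the FLINK-3879 HITS run was faster at that scale.

```python
# Runtimes in milliseconds, copied from the two listings above (keyed by RMAT scale).
hits_ms = {10: 1034, 12: 1115, 14: 1974, 16: 5843, 18: 21927, 20: 93488}
hits_algorithm_ms = {10: 761, 12: 1094, 14: 2290, 16: 6898, 18: 28015, 20: 120736}


def runtime_ratios(baseline, candidate):
    """Return baseline/candidate runtime per scale (hypothetical helper)."""
    return {scale: baseline[scale] / candidate[scale] for scale in sorted(candidate)}


ratios = runtime_ratios(hits_algorithm_ms, hits_ms)
for scale, ratio in ratios.items():
    print(f"scale {scale}: HITSAlgorithm / HITS = {ratio:.2f}")
```

On these numbers HITSAlgorithm is faster at scale 10, the two are roughly even at scale 12, and HITS is faster from scale 14 upward, reaching about 1.29x at scale 20. These are single runs, so small differences should be read with caution.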
> Native implementation of HITS algorithm
> ---------------------------------------
>
> Key: FLINK-3879
> URL: https://issues.apache.org/jira/browse/FLINK-3879
> Project: Flink
> Issue Type: New Feature
> Components: Gelly
> Affects Versions: 1.1.0
> Reporter: Greg Hogan
> Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Hyperlink-Induced Topic Search (HITS, also "hubs and authorities") is
> presented in [0] and described in [1].
> "[HITS] is a very popular and effective algorithm to rank documents based on
> the link information among a set of documents. The algorithm presumes that a
> good hub is a document that points to many others, and a good authority is a
> document that many documents point to."
> [https://pdfs.semanticscholar.org/a8d7/c7a4c53a9102c4239356f9072ec62ca5e62f.pdf]
> This implementation differs from FLINK-2044 by providing for convergence,
> outputting both hub and authority scores, and completing in half the number
> of iterations.
> [0] http://www.cs.cornell.edu/home/kleinber/auth.pdf
> [1] https://en.wikipedia.org/wiki/HITS_algorithm
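For readers unfamiliar with the algorithm, here is a minimal plain-Python sketch of the hub and authority updates described above. It is not the Gelly implementation: the function names `hits` and `normalize`, the parameters `max_iterations` and `tolerance`, the L2 normalization, and the summed-absolute-change convergence test are all assumptions chosen for illustration. Each pass updates both score vectors, stops early once the scores stop changing, and returns both hub and authority scores.

```python
import math


def normalize(scores):
    """Scale scores to unit L2 norm; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(value * value for value in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {vertex: value / norm for vertex, value in scores.items()}


def hits(edges, max_iterations=10, tolerance=1e-9):
    """Illustrative HITS sketch; `edges` is an iterable of (source, target) pairs."""
    edges = list(edges)
    vertices = sorted({vertex for edge in edges for vertex in edge})
    hubs = {vertex: 1.0 for vertex in vertices}
    auths = {vertex: 1.0 for vertex in vertices}

    for _ in range(max_iterations):
        # Authority score: sum of the hub scores of vertices pointing to it.
        new_auths = {vertex: 0.0 for vertex in vertices}
        for source, target in edges:
            new_auths[target] += hubs[source]

        # Hub score: sum of the freshly computed authority scores it points to.
        new_hubs = {vertex: 0.0 for vertex in vertices}
        for source, target in edges:
            new_hubs[source] += new_auths[target]

        new_auths = normalize(new_auths)
        new_hubs = normalize(new_hubs)

        # Assumed convergence test: total absolute change across both vectors.
        change = sum(
            abs(new_hubs[v] - hubs[v]) + abs(new_auths[v] - auths[v])
            for v in vertices
        )
        hubs, auths = new_hubs, new_auths
        if change < tolerance:
            break

    return hubs, auths


# Two pages ("a" and "b") both link to "c".
hubs, auths = hits([("a", "c"), ("b", "c")])
print(hubs)
print(auths)
```

In this small example "c" ends up as the only authority, while "a" and "b" share the hub score equally.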