[
https://issues.apache.org/jira/browse/HAMA-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292377#comment-15292377
]
Behroz Sikander commented on HAMA-990:
--------------------------------------
>> I personally recommend you don't spend much time for other trivial bug fixes.
Okay. I am very close to understanding it completely, but I will shift my focus
to the main goal, as you suggested.
Regarding the main goal, I think we should benchmark Hama on the following
types of algorithms:
1- Batch
2- Iterative
3- Graph
4- Query Processing
The proposed algorithms are:
1- Batch - Word Count
2- Iterative/ML - K-Means
3- Graph - Page Rank
4- Query Processing - We can use MRQL for this and run a scan/join on a
dataset [2].
According to [1] and [3], Apache Flink is faster than Spark on K-Means, Page
Rank and Query Processing, whereas Spark is faster on Word Count. We can
reproduce these results on our cluster and then run the same workloads on
Hama. Once we have all the results, we can compare the three systems.
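As a rough illustration of the comparison step, here is a minimal Python sketch
that times each system's job a few times and reports the median runtime and the
slowdown relative to the fastest system. It is not part of Hama, Spark or Flink;
the function names and the job commands are placeholders assumed for this
example and would have to be replaced with the real job submissions.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(timings):
    """Map each system to (median runtime, slowdown relative to the fastest)."""
    medians = {name: statistics.median(d) for name, d in timings.items()}
    best = min(medians.values())
    return {name: (m, m / best) for name, m in medians.items()}


if __name__ == "__main__":
    # Placeholder command (assumption): substitute the actual job submission
    # for each system, e.g. the Word Count job on the same input dataset.
    placeholder = [sys.executable, "-c", "pass"]
    jobs = {"hama": placeholder, "spark": placeholder, "flink": placeholder}
    timings = {name: time_command(cmd) for name, cmd in jobs.items()}
    for name, (median, slowdown) in sorted(summarize(timings).items()):
        print(f"{name}: median {median:.3f}s, {slowdown:.2f}x the fastest")
```

Using the median over several runs keeps a single slow run (for example, a cold
cache) from dominating the comparison.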
Further,
1- For monitoring memory, CPU, hard drive and network usage we can use [4].
What do you think about this?
2- Karamel can be used for easy installation of Spark and Flink [5]. I am also
okay with manual installation. Any suggestions?
3- Spark and Flink also have a TeraSort benchmark in which Flink is apparently
faster [6]. Should we also run a TeraSort benchmark?
4- Should we run all the systems (Flink/Spark/Hama) on their default
configurations, or should we tune each one for best performance per algorithm?
[1] - http://www.slideshare.net/sbaltagi/overview-of-apacheflinkbyslimbaltagi (see slide 63)
[2] - http://database.cs.brown.edu/sigmod09/benchmarks-sigmod09.pdf
[3] - http://link.springer.com/chapter/10.1007/978-3-319-19027-3_3
[4] - https://github.com/shelan/collectl-monitoring
[5] - http://karamel.readthedocs.io/en/latest/text/overview.html
[6] - http://shelan.org/blog/2016/01/31/reproducible-experiment-to-compare-apache-spark-and-apache-flink-batch-processing/
> GSoC'16: Apache Hama benchmark against Spark and Flink
> ------------------------------------------------------
>
> Key: HAMA-990
> URL: https://issues.apache.org/jira/browse/HAMA-990
> Project: Hama
> Issue Type: Documentation
> Reporter: Behroz Sikander
> Priority: Minor
>