[
https://issues.apache.org/jira/browse/GIRAPH-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682420#comment-13682420
]
Eli Reisman commented on GIRAPH-683:
------------------------------------
This is fantastic work, excellent!
> Jython for Computation
> ----------------------
>
> Key: GIRAPH-683
> URL: https://issues.apache.org/jira/browse/GIRAPH-683
> Project: Giraph
> Issue Type: Bug
> Reporter: Nitay Joffe
> Assignee: Nitay Joffe
>
> Support for writing Computation code in Python. We add Jython bindings so
> that the Python computation code can communicate back with the Java Giraph
> classes.
> To make this work I had to change a few parts of Giraph:
> 1) The Jython computation is not known until we read the script and create a
> Computation object for it at runtime. This has to be done on each worker
> separately after the job has launched. Because of this, there is no
> Computation class set at the beginning. I suspect other scripting languages
> will have a similar issue. To fix this I created a ComputationFactory interface
> which is responsible for creating the Computation, with a default that just
> grabs the class from the Configuration and creates it.
> 2) I created a GiraphTypes class to hold the I,V,E,M1,M2 classes. There was a
> lot of repetitive code around these things so centralizing it all in one
> place made things a lot cleaner.
> 3) I added some more helpers like isDefaultValue() to our conf options.
> To use Jython all the user has to do is call Jython#init(...) somewhere in
> their initialization code.
> This patch contains our PageRank benchmark implementation in Jython. I added
> an option (--jython) which chooses whether to run the default or the Jython
> version.
> Here is the initial PageRankBenchmark comparison (4 workers, 10M vertices, 25
> edges per vertex):
> Java:
> Setup (milliseconds)               2,895  0      2,895
> Input superstep (milliseconds)     8,700  0      8,700
> Superstep 0 (milliseconds)        15,838  0     15,838
> Superstep 1 (milliseconds)        19,608  0     19,608
> Superstep 2 (milliseconds)        17,905  0     17,905
> Superstep 3 (milliseconds)        16,750  0     16,750
> Superstep 4 (milliseconds)        19,088  0     19,088
> Superstep 5 (milliseconds)         3,550  0      3,550
> Shutdown (milliseconds)               50  0         50
> Total (milliseconds)             104,388  0    104,388
> Jython:
> Setup (milliseconds)               3,735  0      3,735
> Input superstep (milliseconds)     8,551  0      8,551
> Superstep 0 (milliseconds)        36,962  0     36,962
> Superstep 1 (milliseconds)        41,737  0     41,737
> Superstep 2 (milliseconds)        42,329  0     42,329
> Superstep 3 (milliseconds)        43,405  0     43,405
> Superstep 4 (milliseconds)        46,088  0     46,088
> Superstep 5 (milliseconds)        22,040  0     22,040
> Shutdown (milliseconds)              117  0        117
> Total (milliseconds)             244,965  0    244,965
> By total time, Jython takes about 2.35x as long as Java here.
> However, at scale things get better (200 workers, 1B vertices, 200 edges per
> vertex):
> Java:
> Setup (milliseconds)                13,226  0       13,226
> Input superstep (milliseconds)     114,673  0      114,673
> Superstep 0 (milliseconds)         300,950  0      300,950
> Superstep 1 (milliseconds)         317,942  0      317,942
> Superstep 2 (milliseconds)         312,152  0      312,152
> Superstep 3 (milliseconds)         316,844  0      316,844
> Superstep 4 (milliseconds)         318,627  0      318,627
> Superstep 5 (milliseconds)           7,898  0        7,898
> Shutdown (milliseconds)                113  0          113
> Total (milliseconds)             1,702,429  0    1,702,429
> Jython:
> Setup (milliseconds)                 7,159  0        7,159
> Input superstep (milliseconds)     112,645  0      112,645
> Superstep 0 (milliseconds)         347,732  0      347,732
> Superstep 1 (milliseconds)         386,404  0      386,404
> Superstep 2 (milliseconds)         410,349  0      410,349
> Superstep 3 (milliseconds)         406,422  0      406,422
> Superstep 4 (milliseconds)         405,696  0      405,696
> Superstep 5 (milliseconds)          46,687  0       46,687
> Shutdown (milliseconds)                131  0          131
> Total (milliseconds)             2,123,228  0    2,123,228
> That's a mere 25% overhead.
> Take a look at the reviewboard for the latest patch:
> https://reviews.apache.org/r/11709/
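
As a sanity check on the overhead figures in the quoted description, the ratios can be recomputed from the two pairs of Total rows. This is only a minimal Python sketch: the function name overhead and the TOTALS_MS layout are made up for illustration, and the only values taken from the issue are the millisecond totals.

```python
# Totals (milliseconds) copied from the benchmark output quoted in the issue.
# The dictionary layout and labels are illustrative, not part of Giraph.
TOTALS_MS = {
    "4 workers, 10M vertices": {"java": 104_388, "jython": 244_965},
    "200 workers, 1B vertices": {"java": 1_702_429, "jython": 2_123_228},
}


def overhead(java_ms, jython_ms):
    """Return (slowdown ratio, extra time as a percentage of the Java run)."""
    ratio = jython_ms / java_ms
    return ratio, (ratio - 1.0) * 100.0


for name, totals in TOTALS_MS.items():
    ratio, pct = overhead(totals["java"], totals["jython"])
    print(f"{name}: {ratio:.2f}x ({pct:.0f}% overhead)")
```

Run as-is, this prints 2.35x (135% overhead) for the small run and 1.25x (25% overhead) for the large one, so the gap narrows considerably at scale.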