[ 
https://issues.apache.org/jira/browse/GIRAPH-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682420#comment-13682420
 ] 

Eli Reisman commented on GIRAPH-683:
------------------------------------

This is fantastic work, excellent!
                
> Jython for Computation
> ----------------------
>
>                 Key: GIRAPH-683
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-683
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Nitay Joffe
>            Assignee: Nitay Joffe
>
> Support for writing Computation code in Python. We add Jython bindings so 
> that the Python computation code can communicate back with the Java Giraph 
> classes.
> To make this work I had to change a few parts of Giraph:
> 1) The Jython computation is not known until we read the script and create a 
> Computation object for it at runtime. This has to be done on each worker 
> separately after the job has launched. Because of this, there is no 
> Computation class set at the beginning. I suspect other scripting languages 
> will have a similar issue. To fix this I created a ComputationFactory interface 
> which is responsible for creating the Computation, with a default that just 
> grabs the class from the Configuration and creates it.
> 2) I created a GiraphTypes class to hold the I,V,E,M1,M2 classes. There was a 
> lot of repetitive code around these things so centralizing it all in one 
> place made things a lot cleaner.
> 3) I added some more helpers like isDefaultValue() to our conf options.
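
The factory idea in point 1 can be sketched as follows. This is a minimal illustration, not the code from the patch: the Computation interface, the configuration key, and the NoOpComputation class are hypothetical stand-ins, and a plain Map takes the place of Giraph's Configuration. It only shows a default factory that reads a class name from the configuration and instantiates it, which leaves room for other factories (for example one that builds the computation from a script at runtime).

```java
import java.util.HashMap;
import java.util.Map;

// Entry point comes first so the file can also be run as a single source file.
class FactorySketch {
  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(DefaultComputationFactory.COMPUTATION_CLASS_KEY,
        NoOpComputation.class.getName());

    ComputationFactory factory = new DefaultComputationFactory();
    Computation computation = factory.createComputation(conf);
    System.out.println(computation.getClass().getSimpleName()); // prints "NoOpComputation"
  }
}

/** Hypothetical stand-in for a Giraph computation; not the real interface. */
interface Computation {
  void compute(long vertexId, Iterable<Double> messages);
}

/** Creates the Computation on each worker once the job is running. */
interface ComputationFactory {
  Computation createComputation(Map<String, String> conf);
}

/** Default behaviour: look up a class name in the configuration and instantiate it. */
class DefaultComputationFactory implements ComputationFactory {
  // Hypothetical key name, chosen for this sketch only.
  static final String COMPUTATION_CLASS_KEY = "sketch.computation.class";

  @Override
  public Computation createComputation(Map<String, String> conf) {
    String className = conf.get(COMPUTATION_CLASS_KEY);
    if (className == null) {
      throw new IllegalStateException("No computation class configured");
    }
    try {
      return Class.forName(className)
          .asSubclass(Computation.class)
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot create computation " + className, e);
    }
  }
}

/** Trivial computation used only to exercise the factory. */
class NoOpComputation implements Computation {
  @Override
  public void compute(long vertexId, Iterable<Double> messages) {
    // Intentionally empty.
  }
}
```

A script-backed factory would implement the same interface but build its Computation from the script contents instead of a configured class name.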
> To use Jython all the user has to do is call Jython#init(...) somewhere in 
> their initialization code.
> This patch contains our page rank benchmark implementation in Jython. I added 
> an option (--jython) which chooses whether to run the default or the jython 
> version.
> Here is the initial PageRankBenchmark comparison (4 workers, 10M vertices, 25 
> edges per vertex):
> Java:
> Setup (milliseconds)              2,895  0    2,895
> Input superstep (milliseconds)    8,700  0    8,700
> Superstep 0 (milliseconds)       15,838  0   15,838
> Superstep 1 (milliseconds)       19,608  0   19,608
> Superstep 2 (milliseconds)       17,905  0   17,905
> Superstep 3 (milliseconds)       16,750  0   16,750
> Superstep 4 (milliseconds)       19,088  0   19,088
> Superstep 5 (milliseconds)        3,550  0    3,550
> Shutdown (milliseconds)              50  0       50
> Total (milliseconds)            104,388  0  104,388
> Jython:
> Setup (milliseconds)              3,735  0    3,735
> Input superstep (milliseconds)    8,551  0    8,551
> Superstep 0 (milliseconds)       36,962  0   36,962
> Superstep 1 (milliseconds)       41,737  0   41,737
> Superstep 2 (milliseconds)       42,329  0   42,329
> Superstep 3 (milliseconds)       43,405  0   43,405
> Superstep 4 (milliseconds)       46,088  0   46,088
> Superstep 5 (milliseconds)       22,040  0   22,040
> Shutdown (milliseconds)             117  0      117
> Total (milliseconds)            244,965  0  244,965
> Jython vs Java slowdown = about 2.35x (244,965 ms vs 104,388 ms total).
> However at scale things get better (200 workers, 1B vertices, 200 edges per 
> vertex):
> Java:
> Setup (milliseconds)               13,226  0     13,226
> Input superstep (milliseconds)    114,673  0    114,673
> Superstep 0 (milliseconds)        300,950  0    300,950
> Superstep 1 (milliseconds)        317,942  0    317,942
> Superstep 2 (milliseconds)        312,152  0    312,152
> Superstep 3 (milliseconds)        316,844  0    316,844
> Superstep 4 (milliseconds)        318,627  0    318,627
> Superstep 5 (milliseconds)          7,898  0      7,898
> Shutdown (milliseconds)               113  0        113
> Total (milliseconds)            1,702,429  0  1,702,429
> Jython:
> Setup (milliseconds)                7,159  0      7,159
> Input superstep (milliseconds)    112,645  0    112,645
> Superstep 0 (milliseconds)        347,732  0    347,732
> Superstep 1 (milliseconds)        386,404  0    386,404
> Superstep 2 (milliseconds)        410,349  0    410,349
> Superstep 3 (milliseconds)        406,422  0    406,422
> Superstep 4 (milliseconds)        405,696  0    405,696
> Superstep 5 (milliseconds)         46,687  0     46,687
> Shutdown (milliseconds)               131  0        131
> Total (milliseconds)            2,123,228  0  2,123,228
> That's a mere 25% overhead.
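
The two slowdown figures can be checked directly from the Total rows of the four runs. The snippet below is only a small arithmetic sketch; the class and method names are made up for illustration, and the inputs are the totals quoted in the benchmark output.

```java
import java.util.Locale;

class OverheadSketch {
  /** Ratio of Jython wall-clock time to Java wall-clock time. */
  static double ratio(long jythonMillis, long javaMillis) {
    return (double) jythonMillis / javaMillis;
  }

  public static void main(String[] args) {
    // Totals copied from the benchmark runs quoted in the issue description.
    double small = ratio(244_965L, 104_388L);     // 4 workers, 10M vertices
    double large = ratio(2_123_228L, 1_702_429L); // 200 workers, 1B vertices

    System.out.printf(Locale.ROOT, "4 workers:   %.2fx%n", small); // prints "4 workers:   2.35x"
    System.out.printf(Locale.ROOT, "200 workers: %.2fx%n", large); // prints "200 workers: 1.25x"
  }
}
```

A ratio of 1.25x on the large run corresponds to the 25% overhead stated in the description.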
> Take a look at the reviewboard for latest patch: 
> https://reviews.apache.org/r/11709/
