[ https://issues.apache.org/jira/browse/GIRAPH-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe updated GIRAPH-683:
-------------------------------

    Description: 
Support for writing Computation code in Python. We add Jython bindings so that 
the Python computation code can communicate back with the Java Giraph classes.

To make this work I had to change a few parts of Giraph:
1) The Jython computation is not known until we read the script and create a 
Computation object for it at runtime. This has to be done on each worker 
separately after the job has launched, so there is no Computation class set 
when the job starts. I suspect other scripting languages will have a similar 
issue. To fix this I created a ComputationFactory interface which is 
responsible for creating the Computation, with a default implementation that 
just reads the class from the Configuration and instantiates it.
2) I created a GiraphTypes class to hold the I,V,E,M1,M2 classes. There was a 
lot of repetitive code around these things so centralizing it all in one place 
made things a lot cleaner.
3) I added some more helpers like isDefaultValue() to our conf options. Also 
added EnumConfOption.
4) The ReflectionUtils type inference was broken for interfaces. I fixed it by 
putting in TypeTools, a library that does it better.
5) I added a TypesHolder interface (with the help of item 4) that people can 
extend to describe the types used. Computation implements this. I use this with 
Jython so that the user can provide something that describes the types without 
having to implement any methods.
6) Fixed GraphConfigurationValidator to work with interfaces and cleaned it up.
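
To illustrate the factory idea from item 1, here is a minimal Python sketch. It is not the Giraph API: the method name create_computation, the configuration keys, and the class ScriptComputationFactory are all hypothetical. It only shows why a factory helps: the default path instantiates a class named in the configuration, while the script path can only produce a class after the script has been evaluated on the worker.

```python
import importlib


class ComputationFactory:
    """Hypothetical interface: builds the computation object on a worker."""

    def create_computation(self, conf):
        raise NotImplementedError


class DefaultComputationFactory(ComputationFactory):
    """Default path: read a class name from the configuration and instantiate it."""

    def create_computation(self, conf):
        # "computation.class" is an assumed key holding "module.ClassName".
        module_name, _, class_name = conf["computation.class"].rpartition(".")
        cls = getattr(importlib.import_module(module_name), class_name)
        return cls()


class ScriptComputationFactory(ComputationFactory):
    """Script path: the class does not exist until the script is evaluated."""

    def create_computation(self, conf):
        namespace = {}
        # Evaluate the user's script, then look up the class it defined.
        exec(conf["script.source"], namespace)
        return namespace[conf["script.class"]]()


def build_computation(conf):
    # Fall back to the default factory when none is configured.
    factory_cls = conf.get("computation.factory", DefaultComputationFactory)
    return factory_cls().create_computation(conf)
```

With this shape, the rest of the framework only calls build_computation(conf) and does not need to know whether the computation came from a compiled class or from a script.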

To use Jython, all the user has to do is call JythonUtils#init(...) somewhere in 
their initialization.
I also added it to GiraphRunner. To use it that way, give an HDFS path to the 
Python file as the Computation. It takes a little more work because you also 
need to supply the new options --typesHolder and --jythonClass.
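
As a rough picture of a types holder that "describes types without requiring any methods", here is a small Python sketch. Everything in it is an assumption made for illustration: the attribute names, the type-name strings, and the read_types helper are hypothetical and are not taken from the patch. The five slots are meant to correspond to the I, V, E, M1, M2 types mentioned above.

```python
# Hypothetical slot names standing in for I, V, E, M1, M2.
TYPE_SLOTS = (
    "vertex_id",
    "vertex_value",
    "edge_value",
    "incoming_message",
    "outgoing_message",
)


class ExampleTypes:
    """Attribute-only holder: it declares types and implements no methods."""

    vertex_id = "LongWritable"
    vertex_value = "DoubleWritable"
    edge_value = "NullWritable"
    incoming_message = "DoubleWritable"
    outgoing_message = "DoubleWritable"


def read_types(holder):
    """Collect the declared types, failing early if a slot is missing."""
    missing = [slot for slot in TYPE_SLOTS if not hasattr(holder, slot)]
    if missing:
        raise ValueError("types holder is missing: " + ", ".join(missing))
    return {slot: getattr(holder, slot) for slot in TYPE_SLOTS}
```

The point of the design is that the framework can read the types by inspection alone, so a script author declares them once and writes no boilerplate.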

This patch contains our PageRank benchmark implementation in Jython. I added 
an option (--jython) which chooses whether to run the default or the Jython 
version.

Here is the initial PageRankBenchmark comparison (200 workers, 1B vertices, 200 
edges per vertex):

Java:
Setup (milliseconds)               13,226   0      13,226
Input superstep (milliseconds)    114,673   0     114,673
Superstep 0 (milliseconds)        300,950   0     300,950
Superstep 1 (milliseconds)        317,942   0     317,942
Superstep 2 (milliseconds)        312,152   0     312,152
Superstep 3 (milliseconds)        316,844   0     316,844
Superstep 4 (milliseconds)        318,627   0     318,627
Superstep 5 (milliseconds)          7,898   0       7,898
Shutdown (milliseconds)               113   0         113
Total (milliseconds)            1,702,429   0   1,702,429

Jython:
Setup (milliseconds)                7,159   0       7,159
Input superstep (milliseconds)    112,645   0     112,645
Superstep 0 (milliseconds)        347,732   0     347,732
Superstep 1 (milliseconds)        386,404   0     386,404
Superstep 2 (milliseconds)        410,349   0     410,349
Superstep 3 (milliseconds)        406,422   0     406,422
Superstep 4 (milliseconds)        405,696   0     405,696
Superstep 5 (milliseconds)         46,687   0      46,687
Shutdown (milliseconds)               131   0         131
Total (milliseconds)            2,123,228   0   2,123,228

That's a mere 25% overhead in total runtime (2,123,228 ms vs. 1,702,429 ms).
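
For reference, the overhead figure can be recomputed from the numbers above. The short Python sketch below uses only the timings reported in this description; the helper name overhead_percent is mine. The total comes out at about 24.7%, which rounds to the 25% quoted, and the same helper shows how the overhead varies by superstep.

```python
# Timings in milliseconds, copied from the benchmark output above.
java_ms = {
    "Superstep 0": 300950,
    "Superstep 1": 317942,
    "Superstep 2": 312152,
    "Superstep 3": 316844,
    "Superstep 4": 318627,
    "Superstep 5": 7898,
    "Total": 1702429,
}
jython_ms = {
    "Superstep 0": 347732,
    "Superstep 1": 386404,
    "Superstep 2": 410349,
    "Superstep 3": 406422,
    "Superstep 4": 405696,
    "Superstep 5": 46687,
    "Total": 2123228,
}


def overhead_percent(java, jython):
    """Extra time taken by Jython, as a percentage of the Java time."""
    return (jython - java) / java * 100.0


for name in java_ms:
    pct = overhead_percent(java_ms[name], jython_ms[name])
    print(f"{name}: {pct:.1f}% overhead")
```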

Take a look at the reviewboard for latest patch: 
https://reviews.apache.org/r/11709/

  was:
Support for writing Computation code in Python. We add Jython bindings so that 
the Python computation code can communicate back with the Java Giraph classes.

To make this work I had to change a few parts of Giraph:
1) The Jython computation is not known until we read the script and create a 
Computation object for it at runtime. This has to be done on each worker 
separately after the job has launched. Because of this, there is no Computation 
class set at the beginning. I suspect other scripting languages will have 
similar issue. To fix this I created a ComputationFactory interface which is 
responsible for creating the Computation, with a default that just grabs the 
class from the Configuration and creates it.
2) I created a GiraphTypes class to hold the I,V,E,M1,M2 classes. There was a 
lot of repetitive code around these things so centralizing it all in one place 
made things a lot cleaner.
3) I added some more helpers like isDefaultValue() to our conf options. Also 
added EnumConfOption.
4) The ReflectionUtils type inference was broken for interfaces. I fixed it by 
putting in TypeTools, a library that does it better.
5) I added a TypesHolder interface (with help of [4]) that people can extend to 
describe types used. Computation implements this. I use this with Jython so 
that user can provide something that describes types but without requiring any 
methods.
6) Fixed GraphConfigurationValidator with interfaces and cleaned it up.

To use Jython all the user has to do is call Jython#init(...) somewhere in his 
initialization.

This patch contains our page rank benchmark implementation in Jython. I added 
an option (--jython) which chooses whether to run the default or the jython 
version.

Here is the initial PageRankBenchmark comparison (200 workers, 1B vertices, 200 
edges per vertex):

Java:
Total (milliseconds)    1,702,429       0       1,702,429
Superstep 3 (milliseconds)      316,844 0       316,844
Setup (milliseconds)    13,226  0       13,226
Shutdown (milliseconds) 113     0       113
Superstep 0 (milliseconds)      300,950 0       300,950
Superstep 4 (milliseconds)      318,627 0       318,627
Input superstep (milliseconds)  114,673 0       114,673
Superstep 5 (milliseconds)      7,898   0       7,898
Superstep 2 (milliseconds)      312,152 0       312,152
Superstep 1 (milliseconds)      317,942 0       317,942

Jython:
Total (milliseconds)    2,123,228       0       2,123,228
Superstep 3 (milliseconds)      406,422 0       406,422
Setup (milliseconds)    7,159   0       7,159
Shutdown (milliseconds) 131     0       131
Superstep 0 (milliseconds)      347,732 0       347,732
Superstep 4 (milliseconds)      405,696 0       405,696
Input superstep (milliseconds)  112,645 0       112,645
Superstep 5 (milliseconds)      46,687  0       46,687
Superstep 2 (milliseconds)      410,349 0       410,349
Superstep 1 (milliseconds)      386,404 0       386,404

That's a mere 25% overhead.

Take a look at the reviewboard for latest patch: 
https://reviews.apache.org/r/11709/

    
> Jython for Computation
> ----------------------
>
>                 Key: GIRAPH-683
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-683
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Nitay Joffe
>            Assignee: Nitay Joffe

