-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9278/
-----------------------------------------------------------
(Updated Feb. 3, 2013, 2:54 p.m.)


Review request for giraph.


Description
-------

Currently, out-of-core partitions are assigned to memory or to disk statically. Using an LRU cache should help keep in memory only the partitions that are actively accessed, provided that the job does not access the whole graph at each superstep (e.g. traversals) and that the data partitioning is good (non-random).


Diffs (updated)
-----

  giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 30d4462
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java e2866fd
  giraph-core/src/main/java/org/apache/giraph/graph/ComputeCallable.java 042fd47
  giraph-core/src/main/java/org/apache/giraph/partition/DiskBackedPartitionStore.java 09e5d75
  giraph-core/src/main/java/org/apache/giraph/partition/PartitionStore.java 3e8dda9
  giraph-core/src/main/java/org/apache/giraph/partition/SimplePartitionStore.java 7bd0bb1
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java f542344
  giraph-core/src/test/java/org/apache/giraph/comm/RequestTest.java 7187928
  giraph-core/src/test/java/org/apache/giraph/partition/TestPartitionStores.java b02ed3a

Diff: https://reviews.apache.org/r/9278/diff/


Testing
-------

passes mvn verify.

hadoop jar giraph-0.2-SNAPSHOT-for-hadoop-1.0.2-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -w 60 -c 2 -e 100 -V 10000000 -v -s 10

trunk:

13/01/29 20:40:53 INFO mapred.JobClient: Giraph Timers
13/01/29 20:40:53 INFO mapred.JobClient: Total (milliseconds)=492403
13/01/29 20:40:53 INFO mapred.JobClient: Superstep 3 (milliseconds)=40243
13/01/29 20:40:53 INFO mapred.JobClient: Superstep 4 (milliseconds)=45430
13/01/29 20:40:53 INFO mapred.JobClient: Superstep 10 (milliseconds)=713
13/01/29 20:40:53 INFO mapred.JobClient: Setup (milliseconds)=20832
13/01/29 20:40:53 INFO mapred.JobClient: Shutdown (milliseconds)=56
13/01/29 20:40:53 INFO mapred.JobClient: Superstep 7 (milliseconds)=36753
13/01/29 20:40:53 INFO mapred.JobClient: Superstep 9 (milliseconds)=36363
13/01/29 20:40:53 INFO mapred.JobClient: Superstep 0 (milliseconds)=39558
13/01/29 20:40:53 INFO mapred.JobClient: Superstep 8 (milliseconds)=44548
13/01/29 20:40:53 INFO mapred.JobClient: Input superstep (milliseconds)=59184
13/01/29 20:40:53 INFO mapred.JobClient: Superstep 6 (milliseconds)=40777
13/01/29 20:40:53 INFO mapred.JobClient: Superstep 5 (milliseconds)=43962
13/01/29 20:40:53 INFO mapred.JobClient: Superstep 2 (milliseconds)=37325
13/01/29 20:40:53 INFO mapred.JobClient: Superstep 1 (milliseconds)=46655
13/01/29 20:40:53 INFO mapred.JobClient: Giraph Stats
13/01/29 20:40:53 INFO mapred.JobClient: Aggregate edges=1000000000
13/01/29 20:40:53 INFO mapred.JobClient: Superstep=11
13/01/29 20:40:53 INFO mapred.JobClient: Last checkpointed superstep=0
13/01/29 20:40:53 INFO mapred.JobClient: Current workers=60
13/01/29 20:40:53 INFO mapred.JobClient: Current master task partition=0
13/01/29 20:40:53 INFO mapred.JobClient: Sent messages=0
13/01/29 20:40:53 INFO mapred.JobClient: Aggregate finished vertices=10000000
13/01/29 20:40:53 INFO mapred.JobClient: Aggregate vertices=10000000
13/01/29 20:40:53 INFO mapred.JobClient: File Output Format Counters
13/01/29 20:40:53 INFO mapred.JobClient: Bytes Written=0
13/01/29 20:40:53 INFO mapred.JobClient: FileSystemCounters
13/01/29 20:40:53 INFO mapred.JobClient: HDFS_BYTES_READ=2684
13/01/29 20:40:53 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1388228
13/01/29 20:40:53 INFO mapred.JobClient: File Input Format Counters
13/01/29 20:40:53 INFO mapred.JobClient: Bytes Read=0
13/01/29 20:40:53 INFO mapred.JobClient: Map-Reduce Framework
13/01/29 20:40:53 INFO mapred.JobClient: Map input records=61
13/01/29 20:40:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=71703965696
13/01/29 20:40:53 INFO mapred.JobClient: Spilled Records=0
13/01/29 20:40:53 INFO mapred.JobClient: CPU time spent (ms)=15141630
13/01/29 20:40:53 INFO mapred.JobClient: Total committed heap usage (bytes)=58151337984
13/01/29 20:40:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=371313995776
13/01/29 20:40:53 INFO mapred.JobClient: Map output records=0
13/01/29 20:40:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=2684

GIRAPH-439: in memory:

13/01/29 19:35:53 INFO mapred.JobClient: Giraph Timers
13/01/29 19:35:53 INFO mapred.JobClient: Total (milliseconds)=427511
13/01/29 19:35:53 INFO mapred.JobClient: Superstep 3 (milliseconds)=37341
13/01/29 19:35:53 INFO mapred.JobClient: Superstep 4 (milliseconds)=35458
13/01/29 19:35:53 INFO mapred.JobClient: Superstep 10 (milliseconds)=852
13/01/29 19:35:53 INFO mapred.JobClient: Setup (milliseconds)=24825
13/01/29 19:35:53 INFO mapred.JobClient: Shutdown (milliseconds)=50
13/01/29 19:35:53 INFO mapred.JobClient: Superstep 7 (milliseconds)=37557
13/01/29 19:35:53 INFO mapred.JobClient: Superstep 9 (milliseconds)=33961
13/01/29 19:35:53 INFO mapred.JobClient: Superstep 0 (milliseconds)=33048
13/01/29 19:35:53 INFO mapred.JobClient: Superstep 8 (milliseconds)=36345
13/01/29 19:35:53 INFO mapred.JobClient: Input superstep (milliseconds)=44420
13/01/29 19:35:53 INFO mapred.JobClient: Superstep 6 (milliseconds)=33635
13/01/29 19:35:53 INFO mapred.JobClient: Superstep 5 (milliseconds)=41885
13/01/29 19:35:53 INFO mapred.JobClient: Superstep 2 (milliseconds)=35046
13/01/29 19:35:53 INFO mapred.JobClient: Superstep 1 (milliseconds)=33083
13/01/29 19:35:53 INFO mapred.JobClient: Giraph Stats
13/01/29 19:35:53 INFO mapred.JobClient: Aggregate edges=1000000000
13/01/29 19:35:53 INFO mapred.JobClient: Superstep=11
13/01/29 19:35:53 INFO mapred.JobClient: Last checkpointed superstep=0
13/01/29 19:35:53 INFO mapred.JobClient: Current workers=60
13/01/29 19:35:53 INFO mapred.JobClient: Current master task partition=0
13/01/29 19:35:53 INFO mapred.JobClient: Sent messages=0
13/01/29 19:35:53 INFO mapred.JobClient: Aggregate finished vertices=10000000
13/01/29 19:35:53 INFO mapred.JobClient: Aggregate vertices=10000000
13/01/29 19:35:53 INFO mapred.JobClient: File Output Format Counters
13/01/29 19:35:53 INFO mapred.JobClient: Bytes Written=0
13/01/29 19:35:53 INFO mapred.JobClient: FileSystemCounters
13/01/29 19:35:53 INFO mapred.JobClient: HDFS_BYTES_READ=2684
13/01/29 19:35:53 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1388228
13/01/29 19:35:53 INFO mapred.JobClient: File Input Format Counters
13/01/29 19:35:53 INFO mapred.JobClient: Bytes Read=0
13/01/29 19:35:53 INFO mapred.JobClient: Map-Reduce Framework
13/01/29 19:35:53 INFO mapred.JobClient: Map input records=61
13/01/29 19:35:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=71627419648
13/01/29 19:35:53 INFO mapred.JobClient: Spilled Records=0
13/01/29 19:35:53 INFO mapred.JobClient: CPU time spent (ms)=15020990
13/01/29 19:35:53 INFO mapred.JobClient: Total committed heap usage (bytes)=57611911168
13/01/29 19:35:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=371123154944
13/01/29 19:35:53 INFO mapred.JobClient: Map output records=0
13/01/29 19:35:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=2684

out-of-core graph (2 partitions in memory out of 49):

13/01/29 19:54:57 INFO mapred.JobClient: Giraph Timers
13/01/29 19:54:57 INFO mapred.JobClient: Total (milliseconds)=508004
13/01/29 19:54:57 INFO mapred.JobClient: Superstep 3 (milliseconds)=38085
13/01/29 19:54:57 INFO mapred.JobClient: Superstep 4 (milliseconds)=40789
13/01/29 19:54:57 INFO mapred.JobClient: Superstep 10 (milliseconds)=811
13/01/29 19:54:57 INFO mapred.JobClient: Setup (milliseconds)=25612
13/01/29 19:54:57 INFO mapred.JobClient: Shutdown (milliseconds)=699
13/01/29 19:54:57 INFO mapred.JobClient: Superstep 7 (milliseconds)=44806
13/01/29 19:54:57 INFO mapred.JobClient: Superstep 9 (milliseconds)=41873
13/01/29 19:54:57 INFO mapred.JobClient: Superstep 0 (milliseconds)=46329
13/01/29 19:54:57 INFO mapred.JobClient: Superstep 8 (milliseconds)=46272
13/01/29 19:54:57 INFO mapred.JobClient: Input superstep (milliseconds)=52395
13/01/29 19:54:57 INFO mapred.JobClient: Superstep 6 (milliseconds)=44337
13/01/29 19:54:57 INFO mapred.JobClient: Superstep 5 (milliseconds)=39379
13/01/29 19:54:57 INFO mapred.JobClient: Superstep 2 (milliseconds)=40452
13/01/29 19:54:57 INFO mapred.JobClient: Superstep 1 (milliseconds)=46155
13/01/29 19:54:57 INFO mapred.JobClient: Giraph Stats
13/01/29 19:54:57 INFO mapred.JobClient: Aggregate edges=1000000000
13/01/29 19:54:57 INFO mapred.JobClient: Superstep=11
13/01/29 19:54:57 INFO mapred.JobClient: Last checkpointed superstep=0
13/01/29 19:54:57 INFO mapred.JobClient: Current workers=60
13/01/29 19:54:57 INFO mapred.JobClient: Current master task partition=0
13/01/29 19:54:57 INFO mapred.JobClient: Sent messages=0
13/01/29 19:54:57 INFO mapred.JobClient: Aggregate finished vertices=10000000
13/01/29 19:54:57 INFO mapred.JobClient: Aggregate vertices=10000000
13/01/29 19:54:57 INFO mapred.JobClient: File Output Format Counters
13/01/29 19:54:57 INFO mapred.JobClient: Bytes Written=0
13/01/29 19:54:57 INFO mapred.JobClient: FileSystemCounters
13/01/29 19:54:57 INFO mapred.JobClient: HDFS_BYTES_READ=2684
13/01/29 19:54:57 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1388228
13/01/29 19:54:57 INFO mapred.JobClient: File Input Format Counters
13/01/29 19:54:57 INFO mapred.JobClient: Bytes Read=0
13/01/29 19:54:57 INFO mapred.JobClient: Map-Reduce Framework
13/01/29 19:54:57 INFO mapred.JobClient: Map input records=61
13/01/29 19:54:57 INFO mapred.JobClient: Physical memory (bytes) snapshot=71368736768
13/01/29 19:54:57 INFO mapred.JobClient: Spilled Records=0
13/01/29 19:54:57 INFO mapred.JobClient: CPU time spent (ms)=15289390
13/01/29 19:54:57 INFO mapred.JobClient: Total committed heap usage (bytes)=57278595072
13/01/29 19:54:57 INFO mapred.JobClient: Virtual memory (bytes) snapshot=370911342592
13/01/29 19:54:57 INFO mapred.JobClient: Map output records=0
13/01/29 19:54:57 INFO mapred.JobClient: SPLIT_RAW_BYTES=2684

in memory (2 compute threads per worker):

13/01/29 20:30:49 INFO mapred.JobClient: Giraph Timers
13/01/29 20:30:49 INFO mapred.JobClient: Total (milliseconds)=487379
13/01/29 20:30:49 INFO mapred.JobClient: Superstep 3 (milliseconds)=46092
13/01/29 20:30:49 INFO mapred.JobClient: Superstep 4 (milliseconds)=44840
13/01/29 20:30:49 INFO mapred.JobClient: Superstep 10 (milliseconds)=745
13/01/29 20:30:49 INFO mapred.JobClient: Setup (milliseconds)=23013
13/01/29 20:30:49 INFO mapred.JobClient: Shutdown (milliseconds)=126
13/01/29 20:30:49 INFO mapred.JobClient: Superstep 7 (milliseconds)=40620
13/01/29 20:30:49 INFO mapred.JobClient: Superstep 9 (milliseconds)=39630
13/01/29 20:30:49 INFO mapred.JobClient: Superstep 0 (milliseconds)=38221
13/01/29 20:30:49 INFO mapred.JobClient: Superstep 8 (milliseconds)=40406
13/01/29 20:30:49 INFO mapred.JobClient: Input superstep (milliseconds)=49762
13/01/29 20:30:49 INFO mapred.JobClient: Superstep 6 (milliseconds)=45054
13/01/29 20:30:49 INFO mapred.JobClient: Superstep 5 (milliseconds)=40220
13/01/29 20:30:49 INFO mapred.JobClient: Superstep 2 (milliseconds)=40817
13/01/29 20:30:49 INFO mapred.JobClient: Superstep 1 (milliseconds)=37830
13/01/29 20:30:49 INFO mapred.JobClient: Giraph Stats
13/01/29 20:30:49 INFO mapred.JobClient: Aggregate edges=1000000000
13/01/29 20:30:49 INFO mapred.JobClient: Superstep=11
13/01/29 20:30:49 INFO mapred.JobClient: Last checkpointed superstep=0
13/01/29 20:30:49 INFO mapred.JobClient: Current workers=60
13/01/29 20:30:49 INFO mapred.JobClient: Current master task partition=0
13/01/29 20:30:49 INFO mapred.JobClient: Sent messages=0
13/01/29 20:30:49 INFO mapred.JobClient: Aggregate finished vertices=10000000
13/01/29 20:30:49 INFO mapred.JobClient: Aggregate vertices=10000000
13/01/29 20:30:49 INFO mapred.JobClient: File Output Format Counters
13/01/29 20:30:49 INFO mapred.JobClient: Bytes Written=0
13/01/29 20:30:49 INFO mapred.JobClient: FileSystemCounters
13/01/29 20:30:49 INFO mapred.JobClient: HDFS_BYTES_READ=2684
13/01/29 20:30:49 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1388228
13/01/29 20:30:49 INFO mapred.JobClient: File Input Format Counters
13/01/29 20:30:49 INFO mapred.JobClient: Bytes Read=0
13/01/29 20:30:49 INFO mapred.JobClient: Map-Reduce Framework
13/01/29 20:30:49 INFO mapred.JobClient: Map input records=61
13/01/29 20:30:49 INFO mapred.JobClient: Physical memory (bytes) snapshot=71895678976
13/01/29 20:30:49 INFO mapred.JobClient: Spilled Records=0
13/01/29 20:30:49 INFO mapred.JobClient: CPU time spent (ms)=15134650
13/01/29 20:30:49 INFO mapred.JobClient: Total committed heap usage (bytes)=57982255104
13/01/29 20:30:49 INFO mapred.JobClient: Virtual memory (bytes) snapshot=371448213504
13/01/29 20:30:49 INFO mapred.JobClient: Map output records=0
13/01/29 20:30:49 INFO mapred.JobClient: SPLIT_RAW_BYTES=2684

out-of-core graph (2 partitions in memory out of 49, 2 compute threads per worker):

13/01/29 20:11:28 INFO mapred.JobClient: Giraph Timers
13/01/29 20:11:28 INFO mapred.JobClient: Total (milliseconds)=506380
13/01/29 20:11:28 INFO mapred.JobClient: Superstep 3 (milliseconds)=41677
13/01/29 20:11:28 INFO mapred.JobClient: Superstep 4 (milliseconds)=41285
13/01/29 20:11:28 INFO mapred.JobClient: Superstep 10 (milliseconds)=764
13/01/29 20:11:28 INFO mapred.JobClient: Setup (milliseconds)=24574
13/01/29 20:11:28 INFO mapred.JobClient: Shutdown (milliseconds)=82
13/01/29 20:11:28 INFO mapred.JobClient: Superstep 7 (milliseconds)=43183
13/01/29 20:11:28 INFO mapred.JobClient: Superstep 9 (milliseconds)=46654
13/01/29 20:11:28 INFO mapred.JobClient: Superstep 0 (milliseconds)=50955
13/01/29 20:11:28 INFO mapred.JobClient: Superstep 8 (milliseconds)=40413
13/01/29 20:11:28 INFO mapred.JobClient: Input superstep (milliseconds)=43584
13/01/29 20:11:28 INFO mapred.JobClient: Superstep 6 (milliseconds)=46638
13/01/29 20:11:28 INFO mapred.JobClient: Superstep 5 (milliseconds)=46107
13/01/29 20:11:28 INFO mapred.JobClient: Superstep 2 (milliseconds)=39321
13/01/29 20:11:28 INFO mapred.JobClient: Superstep 1 (milliseconds)=41139
13/01/29 20:11:28 INFO mapred.JobClient: Giraph Stats
13/01/29 20:11:28 INFO mapred.JobClient: Aggregate edges=1000000000
13/01/29 20:11:28 INFO mapred.JobClient: Superstep=11
13/01/29 20:11:28 INFO mapred.JobClient: Last checkpointed superstep=0
13/01/29 20:11:28 INFO mapred.JobClient: Current workers=60
13/01/29 20:11:28 INFO mapred.JobClient: Current master task partition=0
13/01/29 20:11:28 INFO mapred.JobClient: Sent messages=0
13/01/29 20:11:28 INFO mapred.JobClient: Aggregate finished vertices=10000000
13/01/29 20:11:28 INFO mapred.JobClient: Aggregate vertices=10000000
13/01/29 20:11:28 INFO mapred.JobClient: File Output Format Counters
13/01/29 20:11:28 INFO mapred.JobClient: Bytes Written=0
13/01/29 20:11:28 INFO mapred.JobClient: FileSystemCounters
13/01/29 20:11:28 INFO mapred.JobClient: HDFS_BYTES_READ=2684
13/01/29 20:11:28 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1388228
13/01/29 20:11:28 INFO mapred.JobClient: File Input Format Counters
13/01/29 20:11:28 INFO mapred.JobClient: Bytes Read=0
13/01/29 20:11:28 INFO mapred.JobClient: Map-Reduce Framework
13/01/29 20:11:28 INFO mapred.JobClient: Map input records=61
13/01/29 20:11:28 INFO mapred.JobClient: Physical memory (bytes) snapshot=71620620288
13/01/29 20:11:28 INFO mapred.JobClient: Spilled Records=0
13/01/29 20:11:28 INFO mapred.JobClient: CPU time spent (ms)=15279810
13/01/29 20:11:28 INFO mapred.JobClient: Total committed heap usage (bytes)=57294782464
13/01/29 20:11:28 INFO mapred.JobClient: Virtual memory (bytes) snapshot=370988941312
13/01/29 20:11:28 INFO mapred.JobClient: Map output records=0
13/01/29 20:11:28 INFO mapred.JobClient: SPLIT_RAW_BYTES=2684


Thanks,

Claudio Martella
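
For readers who have not opened the diff: the LRU policy mentioned in the description can be sketched roughly as below. This is only an illustration of the general technique, not the code in DiskBackedPartitionStore. The class name LruPartitionCache, its methods, and the idea of merely recording evicted partition ids (instead of writing partitions to disk) are all assumptions made for the example. It relies on java.util.LinkedHashMap in access order with removeEldestEntry overridden.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of an LRU policy for in-memory partitions.
 * Not the patch's implementation: a real store would write the
 * evicted partition to disk; here we only record its id.
 */
final class LruPartitionCache<P> {
  /** Ids of partitions that were evicted, in eviction order. */
  private final List<Integer> spilled = new ArrayList<>();
  /** Partitions kept in memory, ordered from least to most recently used. */
  private final LinkedHashMap<Integer, P> inMemory;

  LruPartitionCache(final int capacity) {
    // accessOrder = true makes get() and put() move an entry to the end,
    // so the first entry is always the least recently used one.
    this.inMemory = new LinkedHashMap<Integer, P>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Integer, P> eldest) {
        if (size() > capacity) {
          spilled.add(eldest.getKey()); // stand-in for "write to disk"
          return true;
        }
        return false;
      }
    };
  }

  /** Adds or replaces a partition; may evict the least recently used one. */
  void put(int partitionId, P partition) {
    inMemory.put(partitionId, partition);
  }

  /** Returns the partition if it is in memory (and marks it as used), else null. */
  P get(int partitionId) {
    return inMemory.get(partitionId);
  }

  /** Ids currently in memory, least recently used first. */
  List<Integer> inMemoryIds() {
    return new ArrayList<>(inMemory.keySet());
  }

  /** Ids evicted so far, oldest eviction first. */
  List<Integer> spilledIds() {
    return new ArrayList<>(spilled);
  }

  public static void main(String[] args) {
    LruPartitionCache<String> cache = new LruPartitionCache<>(2);
    cache.put(0, "partition-0");
    cache.put(1, "partition-1");
    cache.get(0);                 // partition 0 becomes most recently used
    cache.put(2, "partition-2");  // exceeds capacity, evicts the eldest entry
    System.out.println("in memory: " + cache.inMemoryIds());
    System.out.println("spilled:   " + cache.spilledIds());
  }
}
```

With a traversal-style job and a non-random partitioning, the same few partition ids would be touched repeatedly and so would stay at the recently-used end of the map. A job such as PageRank that touches every partition in every superstep would cycle through the whole map and gain little from the policy.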
