[
https://issues.apache.org/jira/browse/GIRAPH-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161379#comment-13161379
]
[email protected] commented on GIRAPH-100:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2959/
-----------------------------------------------------------
(Updated 2011-12-02 02:55:14.025295)
Review request for giraph.
Changes
-------
Moved examples/SuperstepHashPartitionerFactory.java to
integration/SuperstepHashPartitionerFactory.java.
Added a few context.progress() calls to the communication cycle to avoid task
timeouts.
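A minimal sketch of the progress-reporting idea, not the code in the patch:
ProgressReporter is a hypothetical stand-in for the Hadoop task context, and
runCycle and its steps are invented names. It only shows progress() being
called after each unit of work in a long-running loop.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class ProgressSketch {
    /** Hypothetical stand-in for the Hadoop task context; only progress() is modeled. */
    interface ProgressReporter {
        void progress();
    }

    /**
     * Runs each pending communication step and reports progress after every
     * one, so a long cycle never stays silent for a whole timeout window.
     * Returns the number of steps that were run.
     */
    static int runCycle(Iterator<Runnable> steps, ProgressReporter context) {
        int done = 0;
        while (steps.hasNext()) {
            steps.next().run();
            context.progress();
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        final int[] reports = {0};
        Runnable noop = () -> { };
        List<Runnable> steps = Arrays.asList(noop, noop, noop);
        int done = runCycle(steps.iterator(), () -> reports[0]++);
        System.out.println(done + " steps, " + reports[0] + " progress reports");
    }
}
```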
Summary
-------
Got rid of the ZooKeeper message for the node created on input split
reservation.
Adding some features for debugging:
- Taking only a % of the input splits
- Taking a maximum number of vertices in an input split
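As a rough illustration of the two debugging limits above (a sketch under
assumptions, not the patch itself): splitsToUse and readVertices are
hypothetical helpers, and the rounding rule (round down, but keep at least one
split) is an assumption, since the summary does not say how the percentage is
applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class InputSamplingSketch {
    /**
     * Number of input splits to process for a given percentage.
     * Assumed rule: round down, but use at least one split when any are requested.
     */
    static int splitsToUse(int totalSplits, double percent) {
        if (totalSplits < 0 || percent < 0.0 || percent > 100.0) {
            throw new IllegalArgumentException("bad totalSplits or percent");
        }
        if (totalSplits == 0 || percent == 0.0) {
            return 0;
        }
        int count = (int) Math.floor(totalSplits * percent / 100.0);
        return Math.max(1, count);
    }

    /**
     * Reads vertices from one split, stopping once maxVertices have been read.
     * A negative maxVertices means "no limit" (an assumed convention).
     */
    static <V> List<V> readVertices(Iterator<V> reader, long maxVertices) {
        List<V> out = new ArrayList<>();
        while (reader.hasNext() && (maxVertices < 0 || out.size() < maxVertices)) {
            out.add(reader.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("splits used: " + splitsToUse(10, 50.0));
        List<Integer> split = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println("vertices read: " + readVertices(split.iterator(), 3));
    }
}
```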
Added a master status update for the number of workers that have responded.
Workers will output some information about the % of input splits that have
been completed.
Fixed a bug where a forced flush of cached vertices in the input split was
happening per input split rather than at the end of processing all input
splits. This requires an additional barrier after processing all the input
splits to allow for the final flush of the cached vertices.
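To make the difference concrete, here is a small sketch with hypothetical
names (VertexFlushSketch, loadAll, maxCached); it is not the BspServiceWorker
code and does not model the cross-worker barrier. It only counts how many
flushes happen when the forced flush runs per split versus once at the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class VertexFlushSketch {
    private final int maxCached;
    private final List<String> cache = new ArrayList<>();
    private int flushCount = 0;

    VertexFlushSketch(int maxCached) {
        this.maxCached = maxCached;
    }

    /** Caches a vertex and flushes only when the cache is full. */
    void add(String vertex) {
        cache.add(vertex);
        if (cache.size() >= maxCached) {
            flush();
        }
    }

    /** Forced flush: "sends" whatever is cached; does nothing on an empty cache. */
    void flush() {
        if (cache.isEmpty()) {
            return;
        }
        cache.clear();
        flushCount++;
    }

    /** Loads every split and returns the total number of flushes performed. */
    static int loadAll(List<List<String>> splits, int maxCached, boolean flushPerSplit) {
        VertexFlushSketch worker = new VertexFlushSketch(maxCached);
        for (List<String> split : splits) {
            for (String vertex : split) {
                worker.add(vertex);
            }
            if (flushPerSplit) {
                worker.flush(); // old behavior: forced flush after every split
            }
        }
        worker.flush(); // new behavior: one final flush after all splits
        return worker.flushCount;
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
            Arrays.asList("a", "b"), Arrays.asList("c", "d"), Arrays.asList("e", "f"));
        System.out.println("per-split flushes: " + loadAll(splits, 10, true));
        System.out.println("end-only flushes: " + loadAll(splits, 10, false));
    }
}
```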
Factored out barrierOnWorkerList to reuse the master's barrier coordination
code.
Factored out markInputSplitPathFinished to make the code a bit cleaner.
Clearing out the transientInMessages and inMessages maps to reduce processing
time.
Changed the default partition count multiplier to produce n^2 partitions rather
than 0.5xn^2 for better balancing when the maximum limit is not exceeded.
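The arithmetic can be sketched as follows, taking n as the number of workers
(an assumption; the summary does not define it). The method name, parameters,
and the example cap of 1000 are hypothetical and are not taken from
HashMasterPartitioner.

```java
final class PartitionCountSketch {
    /**
     * Partition count = multiplier * workers^2, capped at maxPartitions
     * and never below 1.
     */
    static int partitionCount(int workers, double multiplier, int maxPartitions) {
        if (workers < 1 || maxPartitions < 1 || multiplier <= 0.0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        long raw = (long) (multiplier * (double) workers * workers);
        return (int) Math.max(1L, Math.min(raw, (long) maxPartitions));
    }

    public static void main(String[] args) {
        int cap = 1000; // hypothetical maximum, for illustration only
        System.out.println("old default, 4 workers: " + partitionCount(4, 0.5, cap));
        System.out.println("new default, 4 workers: " + partitionCount(4, 1.0, cap));
        System.out.println("new default, 100 workers: " + partitionCount(100, 1.0, cap));
    }
}
```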
Changed SimpleCheckpointVertex to throw an Exception instead of System.exit(-1)
for a faster failure (seconds instead of minutes).
Moved SuperstepHashPartitionerFactory to the examples directory. If it is not
there, the test against a real Hadoop instance will fail from
ClassNotFoundException.
This addresses bug GIRAPH-100.
https://issues.apache.org/jira/browse/GIRAPH-100
Diffs (updated)
-----
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java
1209336
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
1209336
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
1209336
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
1209336
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
1209336
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
1209336
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashMasterPartitioner.java
1209336
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/integration/SuperstepHashPartitionerFactory.java
PRE-CREATION
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/lib/IdWithValueTextOutputFormat.java
1209336
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/lib/TextVertexInputFormat.java
1209336
http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestGraphPartitioner.java
1209336
Diff: https://reviews.apache.org/r/2959/diff
Testing
-------
Passed local and Hadoop instance unit tests. Ran PageRankBenchmark on a real
Hadoop cluster.
Thanks,
Avery
> Data input sampling and testing improvements
> --------------------------------------------
>
> Key: GIRAPH-100
> URL: https://issues.apache.org/jira/browse/GIRAPH-100
> Project: Giraph
> Issue Type: New Feature
> Components: graph
> Reporter: Avery Ching
> Assignee: Avery Ching
> Attachments: GIRAPH-100.2.patch, GIRAPH-100.patch
>
>
> It would be really nice to help debug an application by limiting the input
> data (% of input splits, max vertices per input split). Also, it would be
> nice for the workers to provide a little more debugging info on how far along
> they are with processing the input data.