-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15142/
-----------------------------------------------------------
Review request for giraph.
Bugs: GIRAPH-789
https://issues.apache.org/jira/browse/GIRAPH-789
Repository: giraph-git
Description
-------
Currently each worker sends multiple requests to the metastore to get info
about the io formats. This is unnecessary and can cause issues when the
metastore is having problems.
hive-io is changed so that it doesn't access the metastore when schema/table
info is already present in the Configuration, and HiveGiraphRunner now
initializes all the formats up front to fill in the Configuration. If
HiveGiraphRunner is not used, everything will still work, but the workers will
still access the metastore.
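A rough sketch of the idea, not the actual patch: look in the configuration
first and only fall back to the metastore when the entry is missing. All names
here (SchemaCacheSketch, SCHEMA_KEY_PREFIX, fetchSchemaFromMetastore) are
hypothetical, a plain Map stands in for the Hadoop Configuration, and the
metastore lookup is simulated with a counter.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these names come from giraph-hive. */
class SchemaCacheSketch {
  /** Hypothetical key prefix under which a table's schema is cached. */
  static final String SCHEMA_KEY_PREFIX = "sketch.hive.schema.";

  /** Counts simulated metastore round trips. */
  static int metastoreCalls = 0;

  /** Stand-in for a real metastore lookup (assumption: one call per table). */
  static String fetchSchemaFromMetastore(String table) {
    metastoreCalls++;
    return "schema-of-" + table;
  }

  /** Returns the schema, touching the metastore only if conf lacks it. */
  static String getSchema(Map<String, String> conf, String table) {
    String key = SCHEMA_KEY_PREFIX + table;
    String cached = conf.get(key);
    if (cached != null) {
      return cached; // already present, no metastore access
    }
    String schema = fetchSchemaFromMetastore(table);
    conf.put(key, schema); // remember it for whoever receives this conf
    return schema;
  }

  public static void main(String[] args) {
    // Runner side: initialize every format once, before the job is submitted.
    Map<String, String> conf = new HashMap<>();
    getSchema(conf, "vertices");
    getSchema(conf, "edges");
    int callsFromRunner = metastoreCalls;

    // Worker side: gets a copy of the conf and finds everything in it.
    Map<String, String> workerConf = new HashMap<>(conf);
    getSchema(workerConf, "vertices");
    getSchema(workerConf, "edges");
    int callsFromWorker = metastoreCalls - callsFromRunner;

    System.out.println("runner calls: " + callsFromRunner
        + ", worker calls: " + callsFromWorker);
  }
}
```

If the runner step is skipped, the same getSchema call on a worker simply falls
through to the metastore, which corresponds to the fallback behaviour described
above.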
Diffs
-----
giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
6b8a8e9
giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java
b809413
giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java
534a773
giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java
d5c1279
giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java
c4813fb
pom.xml
f2981ff
Diff: https://reviews.apache.org/r/15142/diff/
Testing
-------
mvn clean verify
Ran jobs with single and multiple input formats, with added logging for each
metastore call in hive-io. For example, with a single vertex input, a single
edge input and a single output, each worker now makes no metastore calls
instead of 8. The number of calls from the master is also reduced: we only get
the input partition descriptions at the beginning of the job and make no calls
at the end (for output). The only call left at the end is from the cleanup
task, to register the new partition. The cleanup task used to make two
additional calls, which are also removed.
Thanks,
Maja Kabiljo