-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15142/
-----------------------------------------------------------

Review request for giraph.


Bugs: GIRAPH-789
    https://issues.apache.org/jira/browse/GIRAPH-789


Repository: giraph-git


Description
-------

Currently each worker sends multiple requests to the metastore to get info
about IO formats, which is unnecessary and can cause issues when the metastore
is having problems.

Hive-io is changed so that it doesn't access the metastore when schema/table
info is already present in the Configuration, and HiveGiraphRunner now
initializes all the formats to fill in the Configuration. If HiveGiraphRunner
is not used, everything will still work, but workers will still access the
metastore.
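
As a rough illustration of the idea (not the code in the diff), the lookup can
be sketched as "check the configuration first, fall back to the metastore".
Everything here is an assumption made for the sketch: a plain Map stands in
for the Hadoop Configuration, and SCHEMA_KEY_PREFIX, fetchSchemaFromMetastore,
getSchema and initializeFormats are hypothetical names, not hive-io or
HiveGiraphRunner APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only: look in the configuration first, fall back to the metastore. */
class SchemaLookupSketch {
  /** Hypothetical key prefix; not the key hive-io actually uses. */
  static final String SCHEMA_KEY_PREFIX = "sketch.hive.schema.";

  /** Counts simulated metastore round trips. */
  static final AtomicInteger METASTORE_CALLS = new AtomicInteger();

  /** Stand-in for a real metastore request (assumption, no network involved). */
  static String fetchSchemaFromMetastore(String table) {
    METASTORE_CALLS.incrementAndGet();
    return "schema-of-" + table;
  }

  /** Returns the schema stored in conf if present; otherwise fetches and stores it. */
  static String getSchema(Map<String, String> conf, String table) {
    String key = SCHEMA_KEY_PREFIX + table;
    String cached = conf.get(key);
    if (cached != null) {
      return cached; // no metastore access needed
    }
    String schema = fetchSchemaFromMetastore(table);
    conf.put(key, schema);
    return schema;
  }

  /** What a runner could do once, before the configuration is shipped to workers. */
  static void initializeFormats(Map<String, String> conf, String... tables) {
    for (String table : tables) {
      getSchema(conf, table);
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    initializeFormats(conf, "vertices", "edges", "output");
    int runnerCalls = METASTORE_CALLS.get();

    // A "worker" receives a copy of the already filled configuration.
    Map<String, String> workerConf = new HashMap<>(conf);
    getSchema(workerConf, "vertices");
    getSchema(workerConf, "edges");
    getSchema(workerConf, "output");

    System.out.println("runner calls: " + runnerCalls);
    System.out.println("worker calls: " + (METASTORE_CALLS.get() - runnerCalls));
  }
}
```

If the runner step is skipped, getSchema still works on an empty map, but each
lookup then goes through fetchSchemaFromMetastore, which mirrors the fallback
behaviour described above.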


Diffs
-----

  giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java 6b8a8e9 
  giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java b809413 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java 534a773 
  giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java d5c1279 
  giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java c4813fb 
  pom.xml f2981ff 

Diff: https://reviews.apache.org/r/15142/diff/


Testing
-------

mvn clean verify

Ran jobs with single and multiple input formats, with logging added for each
metastore call in hive-io. For example, with one vertex input, one edge input
and one output, each worker now makes no metastore calls instead of 8. The
number of calls from the master is also reduced: we only get input partition
descriptions at the beginning of the job and make no calls at the end (for
output). The only call left at the end is from the cleanup task, to register
the new partition. The cleanup task used to make two additional calls, which
are also removed.


Thanks,

Maja Kabiljo
