-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15142/#review27948
-----------------------------------------------------------

Ship it!


+1, this is awesome work, Maja. Jobs will now fail faster when there are 
metastore issues, and this also cuts back on metastore accesses.  Yay!


giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
<https://reviews.apache.org/r/15142/#comment54396>

    Maybe worth adding a top-level comment for this method that says something
    like:
    For all Hive vertex inputs, add the user settings to the configuration.
    Additionally, this checks the input specs for every input, which caches
    the metadata in the configuration so that workers do not need to access
    the metastore and the job fails earlier if the metadata doesn't exist.
    In the case of multiple vertex input descriptions, metadata is cached in
    each vertex input format description and then saved into a single
    Configuration via JSON.



giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
<https://reviews.apache.org/r/15142/#comment54399>

    Maybe worth adding a top-level comment for this method that says something
    like:
    For all Hive edge inputs, add the user settings to the configuration.
    Additionally, this checks the input specs for every input, which caches
    the metadata in the configuration so that workers do not need to access
    the metastore and the job fails earlier if the metadata doesn't exist.
    In the case of multiple edge input descriptions, metadata is cached in
    each edge input format description and then saved into a single
    Configuration via JSON.
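
    To make the two suggested comments concrete, here is a rough sketch of the
    "check the configuration first, fall back to the metastore" pattern they
    describe. It is not taken from the patch: a plain java.util.Map stands in
    for the Hadoop Configuration, and the class name, the key prefix, and
    fetchSchemaFromMetastore are hypothetical names used only for
    illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: cache metadata in the configuration, fall back to the metastore. */
class CachedMetadataSketch {
  /** Counts simulated metastore round trips so the effect of caching is visible. */
  static int metastoreCalls = 0;

  /** Hypothetical stand-in for a metastore lookup; returns a fake schema string. */
  static String fetchSchemaFromMetastore(String table) {
    metastoreCalls++;
    return "{\"table\":\"" + table + "\"}";
  }

  /** Returns the schema for a table, reading the configuration before the metastore. */
  static String getSchema(Map<String, String> conf, String table) {
    String key = "sketch.hive.schema." + table; // hypothetical key name
    String cached = conf.get(key);
    if (cached != null) {
      return cached; // already filled in, no metastore access needed
    }
    String schema = fetchSchemaFromMetastore(table);
    conf.put(key, schema); // cache for later readers of the same configuration
    return schema;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();

    // Runner side: touch every input once so the configuration is filled in
    // (and a missing table would surface here, before any worker starts).
    getSchema(conf, "vertices");
    getSchema(conf, "edges");
    int runnerCalls = metastoreCalls;

    // Worker side: the same configuration is reused, so no lookups happen.
    getSchema(conf, "vertices");
    getSchema(conf, "edges");
    int workerCalls = metastoreCalls - runnerCalls;

    System.out.println("runner calls: " + runnerCalls + ", worker calls: " + workerCalls);
  }
}
```

    If the runner step is skipped, the first getSchema call on the worker side
    falls through to the lookup, so everything still works but the worker
    makes the metastore access itself.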


- Avery Ching


On Oct. 31, 2013, 6:43 p.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15142/
> -----------------------------------------------------------
> 
> (Updated Oct. 31, 2013, 6:43 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Bugs: GIRAPH-789
>     https://issues.apache.org/jira/browse/GIRAPH-789
> 
> 
> Repository: giraph-git
> 
> 
> Description
> -------
> 
> Currently each worker sends multiple requests to the metastore to get info 
> about io formats, which is unnecessary and can cause issues when the 
> metastore is having problems.
> 
> Hive-io is changed so that it doesn't access the metastore when schema/table 
> info is already present in the Configuration, and HiveGiraphRunner now 
> initializes all the formats to fill up the Configuration. If HiveGiraphRunner 
> is not used, everything will still work, but workers will access the 
> metastore.
> 
> 
> Diffs
> -----
> 
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java 6b8a8e9 
>   giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java b809413 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java 534a773 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java d5c1279 
>   giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java c4813fb 
>   pom.xml f2981ff 
> 
> Diff: https://reviews.apache.org/r/15142/diff/
> 
> 
> Testing
> -------
> 
> mvn clean verify
> 
> Ran jobs with single and multiple input formats, with added logging for each 
> metastore call in hive-io. For example, with a single vertex input, edge 
> input, and output, each worker now makes no metastore calls instead of 8. 
> The number of calls from the master is also reduced: we only get the input 
> partition descriptions at the beginning of the job and make no calls at the 
> end (for output). The only call left at the end is from the cleanup task, to 
> register the new partition. The cleanup task used to make two additional 
> calls, which are also removed.
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>
