> On Oct. 31, 2013, 8:07 p.m., Avery Ching wrote:
> > +1, this is awesome work Maja and will fail faster due to metastore issues
> > and also cut back on metastore accesses. Yay!

Thanks for a quick review, added comments and committing!


- Maja


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15142/#review27948
-----------------------------------------------------------


On Oct. 31, 2013, 6:43 p.m., Maja Kabiljo wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15142/
> -----------------------------------------------------------
>
> (Updated Oct. 31, 2013, 6:43 p.m.)
>
>
> Review request for giraph.
>
>
> Bugs: GIRAPH-789
>     https://issues.apache.org/jira/browse/GIRAPH-789
>
>
> Repository: giraph-git
>
>
> Description
> -------
>
> Currently each worker is sending multiple requests to metastore to get info
> about io formats, which is unnecessary and can cause issues when metastore is
> having problems.
>
> Hive-io changed so it doesn't access metastore when schema/table info is
> already present in Configuration, and HiveGiraphRunner is now initializing
> all the formats to fill up the Configuration. If HiveGiraphRunner is not used
> everything will still work, but we'll have accesses to metastore from workers.
>
>
> Diffs
> -----
>
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java 6b8a8e9
>   giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java b809413
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java 534a773
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java d5c1279
>   giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java c4813fb
>   pom.xml f2981ff
>
> Diff: https://reviews.apache.org/r/15142/diff/
>
>
> Testing
> -------
>
> mvn clean verify
>
> Run jobs with single and multiple input formats, with added logging for each
> metastore call in hive-io. For example in case when we have single vertex and
> edge input and output, we'll have none instead of 8 metastore calls from each
> worker. The number of calls from master is also reduced - we are only getting
> input partition descriptions in the beginning of the job and have no calls in
> the end (for output). The only call left in the end is from cleanup task to
> register new partition. Clean up task used to have two additional calls which
> are also removed.
>
>
> Thanks,
>
> Maja Kabiljo
>
>
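
The idea in the description (read schema/table info from the Configuration when it is already there, and fall back to the metastore only when it is missing) can be sketched roughly as follows. This is an illustration, not the code in the diff: a plain `Map` stands in for the Hadoop `Configuration`, and the class name, key prefix, and `fetchFromMetastore` helper are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; a plain Map stands in for the job Configuration. */
class SchemaLookupSketch {
  /** Counts simulated metastore round trips, similar to the logging used in testing. */
  static int metastoreCalls = 0;

  /** Hypothetical key prefix; the real keys used by hive-io are not shown in this thread. */
  static final String KEY_PREFIX = "sketch.hive.schema.";

  /** Stand-in for a remote metastore request (hypothetical helper). */
  static String fetchFromMetastore(String table) {
    metastoreCalls++;
    return "schema-of-" + table;
  }

  /** Returns the schema from conf when present; otherwise asks the metastore and stores the result. */
  static String getSchema(Map<String, String> conf, String table) {
    String key = KEY_PREFIX + table;
    String cached = conf.get(key);
    if (cached != null) {
      return cached;
    }
    String schema = fetchFromMetastore(table);
    conf.put(key, schema);
    return schema;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();

    // Runner side: initialize every format once, before the job is submitted.
    getSchema(conf, "vertices");
    getSchema(conf, "edges");
    int runnerCalls = metastoreCalls;

    // Worker side: the same conf is shipped with the job, so lookups hit the cache.
    getSchema(conf, "vertices");
    getSchema(conf, "edges");
    int workerCalls = metastoreCalls - runnerCalls;

    System.out.println("runner calls: " + runnerCalls + ", worker calls: " + workerCalls);
  }
}
```

If the runner-side step is skipped, `getSchema` still works on a worker; it just pays for the metastore call there, which matches the fallback behaviour described above.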
