----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/8611/#review14735 -----------------------------------------------------------
Two macro comments: 1) Why not keep this new input format in giraph-formats-contrib? 2) Some classes are completely undocumented (I only marked a few of those). giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java <https://reviews.apache.org/r/8611/#comment31241> Nice, one less hack (GiraphHCatInputFormat, I'm looking at you). giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java <https://reviews.apache.org/r/8611/#comment31242> This sentence looks interrupted... giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java <https://reviews.apache.org/r/8611/#comment31243> Is this a leftover? giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java <https://reviews.apache.org/r/8611/#comment31250> Similarly here, shouldn't this be HiveVertexInputFormat? giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java <https://reviews.apache.org/r/8611/#comment31249> Shouldn't this be the Hive format? I must be missing something... giraph-hive/src/main/java/org/apache/giraph/hive/HiveVertexReader.java <https://reviews.apache.org/r/8611/#comment31252> Should this be "setVertexCreator..."? giraph-hive/src/main/java/org/apache/giraph/hive/HiveVertexReader.java <https://reviews.apache.org/r/8611/#comment31254> I would call this "createVertexCreator" or "instantiateVertexCreator" if you don't like the repetition :P This, to differentiate it from the above method which sets the class, whereas this one instantiate it assuming it's been set. giraph-hive/src/main/java/org/apache/giraph/hive/HiveVertexReader.java <https://reviews.apache.org/r/8611/#comment31255> vertexes -> vertices just for consistency giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveApiInputFormat.java <https://reviews.apache.org/r/8611/#comment31256> Missing Javadoc, Checkstyle should have complained :-) giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveApiOutputFormat.java <https://reviews.apache.org/r/8611/#comment31257> Missing Javadoc. giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveInputDescription.java <https://reviews.apache.org/r/8611/#comment31258> Missing Javadoc. giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveOutputDescription.java <https://reviews.apache.org/r/8611/#comment31259> Javadoc. giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveRecord.java <https://reviews.apache.org/r/8611/#comment31260> Javadoc giraph-hive/src/main/java/org/apache/hadoop/hive/api/common/ProgressReporter.java <https://reviews.apache.org/r/8611/#comment31261> Isn't this redundant, since TaskAttemptContext is a Progressable? - Alessandro Presta On Dec. 15, 2012, 2:02 a.m., Nitay Joffe wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/8611/ > ----------------------------------------------------------- > > (Updated Dec. 15, 2012, 2:02 a.m.) > > > Review request for giraph. > > > Description > ------- > > For now this is only the Input side of things. One particular thing I added > was the concept of "profiles", allowing for easily reading from multiple > tables. This should remove a lot of the cruft around the GiraphHCat* classes. > > Note in the diff I separated the code so that there would be a > Giraph-unrelated Hive-only portion (under package org.apache.hadoop.hive). > Things under this package (and its children) do not touch any Giraph code, > and so can be contributed as an IOFormat back to Hive itself. > > Also note the new (I think improved) interface: Users do not need to actually > implement an XInputFormat anymore. They just create a class the implements > the HiveVertexCreator interface, plug that in, and use HiveVertexInputFormat. > Should make user code much cleaner. > > > This addresses bug GIRAPH-453. > https://issues.apache.org/jira/browse/GIRAPH-453 > > > Diffs > ----- > > giraph-formats-contrib/pom.xml 9326c28420be2749fa724ebd86d9555ae209a2ee > > giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java > PRE-CREATION > > giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java > 7a7c2f87bc50eaebe879e2eff2da661b504096b0 > giraph-hive/pom.xml PRE-CREATION > giraph-hive/src/main/assembly/compile.xml PRE-CREATION > giraph-hive/src/main/java/org/apache/giraph/hive/HiveConstants.java > PRE-CREATION > giraph-hive/src/main/java/org/apache/giraph/hive/HiveEdgeInputFormat.java > PRE-CREATION > giraph-hive/src/main/java/org/apache/giraph/hive/HiveEdgeReader.java > PRE-CREATION > giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java > PRE-CREATION > giraph-hive/src/main/java/org/apache/giraph/hive/HiveUtils.java > PRE-CREATION > giraph-hive/src/main/java/org/apache/giraph/hive/HiveVertexCreator.java > PRE-CREATION > giraph-hive/src/main/java/org/apache/giraph/hive/HiveVertexInputFormat.java > PRE-CREATION > giraph-hive/src/main/java/org/apache/giraph/hive/HiveVertexReader.java > PRE-CREATION > giraph-hive/src/main/java/org/apache/giraph/hive/VertexCreator.java > PRE-CREATION > > giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveApiInputFormat.java > PRE-CREATION > > giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveApiOutputFormat.java > PRE-CREATION > > giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveInputDescription.java > PRE-CREATION > > giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveOutputDescription.java > PRE-CREATION > giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveRecord.java > PRE-CREATION > giraph-hive/src/main/java/org/apache/hadoop/hive/api/common/Classes.java > PRE-CREATION > > giraph-hive/src/main/java/org/apache/hadoop/hive/api/common/HadoopUtils.java > PRE-CREATION > > giraph-hive/src/main/java/org/apache/hadoop/hive/api/common/ProgressReporter.java > PRE-CREATION > giraph-hive/src/main/java/org/apache/hadoop/hive/api/common/SerDes.java > PRE-CREATION > giraph-hive/src/main/java/org/apache/hadoop/hive/api/common/Writables.java > PRE-CREATION > > giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/ApiInputSplit.java > PRE-CREATION > giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/ApiRecord.java > PRE-CREATION > > giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/ApiRecordReader.java > PRE-CREATION > giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/InputConf.java > PRE-CREATION > > giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/InputPartition.java > PRE-CREATION > > giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/InputPartitions.java > PRE-CREATION > > giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/InputSplitData.java > PRE-CREATION > > giraph-hive/src/main/java/org/apache/hadoop/hive/api/inspect/ListParser.java > PRE-CREATION > giraph-hive/src/main/java/org/apache/hadoop/hive/api/inspect/MapParser.java > PRE-CREATION > > giraph-hive/src/main/java/org/apache/hadoop/hive/api/inspect/ObjectParser.java > PRE-CREATION > giraph-hive/src/main/java/org/apache/hadoop/hive/api/inspect/Parsers.java > PRE-CREATION > > giraph-hive/src/main/java/org/apache/hadoop/hive/api/inspect/PrimitiveParser.java > PRE-CREATION > > giraph-hive/src/main/java/org/apache/hadoop/hive/api/inspect/StructParser.java > PRE-CREATION > > giraph-hive/src/main/java/org/apache/hadoop/hive/api/inspect/UnionParser.java > PRE-CREATION > pom.xml 47736351c45f4cdb55df762506069eaed7126a7e > > Diff: https://reviews.apache.org/r/8611/diff/ > > > Testing > ------- > > Ran on some production jobs and verified results were exactly the same. > > In terms of performance this is on par with our current HCatalog stuff. I ran > a few jobs and noticed at most a few seconds of difference between the input > supersteps. Sometimes it was less, so I think the difference is mostly noise. > > > Thanks, > > Nitay Joffe > >
