-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8611/#review14735
-----------------------------------------------------------


Two macro comments:
1) Why not keep this new input format in giraph-formats-contrib?
2) Some classes are completely undocumented (I only marked a few of those). 


giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
<https://reviews.apache.org/r/8611/#comment31241>

    Nice, one less hack (GiraphHCatInputFormat, I'm looking at you).



giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
<https://reviews.apache.org/r/8611/#comment31242>

    This sentence looks interrupted...



giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
<https://reviews.apache.org/r/8611/#comment31243>

    Is this a leftover?



giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
<https://reviews.apache.org/r/8611/#comment31250>

    Similarly here, shouldn't this be HiveVertexInputFormat?



giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
<https://reviews.apache.org/r/8611/#comment31249>

    Shouldn't this be the Hive format?
    I must be missing something...



giraph-hive/src/main/java/org/apache/giraph/hive/HiveVertexReader.java
<https://reviews.apache.org/r/8611/#comment31252>

    Should this be "setVertexCreator..."?



giraph-hive/src/main/java/org/apache/giraph/hive/HiveVertexReader.java
<https://reviews.apache.org/r/8611/#comment31254>

    I would call this "createVertexCreator" or "instantiateVertexCreator" if 
you don't like the repetition :P
    This, to differentiate it from the above method which sets the class, 
whereas this one instantiate it assuming it's been set.



giraph-hive/src/main/java/org/apache/giraph/hive/HiveVertexReader.java
<https://reviews.apache.org/r/8611/#comment31255>

    vertexes -> vertices just for consistency



giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveApiInputFormat.java
<https://reviews.apache.org/r/8611/#comment31256>

    Missing Javadoc, Checkstyle should have complained :-)



giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveApiOutputFormat.java
<https://reviews.apache.org/r/8611/#comment31257>

    Missing Javadoc.



giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveInputDescription.java
<https://reviews.apache.org/r/8611/#comment31258>

    Missing Javadoc.



giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveOutputDescription.java
<https://reviews.apache.org/r/8611/#comment31259>

    Javadoc.



giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveRecord.java
<https://reviews.apache.org/r/8611/#comment31260>

    Javadoc



giraph-hive/src/main/java/org/apache/hadoop/hive/api/common/ProgressReporter.java
<https://reviews.apache.org/r/8611/#comment31261>

    Isn't this redundant, since TaskAttemptContext is a Progressable?


- Alessandro Presta


On Dec. 15, 2012, 2:02 a.m., Nitay Joffe wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8611/
> -----------------------------------------------------------
> 
> (Updated Dec. 15, 2012, 2:02 a.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> For now this is only the Input side of things. One particular thing I added 
> was the concept of "profiles", allowing for easily reading from multiple 
> tables. This should remove a lot of the cruft around the GiraphHCat* classes.
> 
> Note in the diff I separated the code so that there would be a 
> Giraph-unrelated Hive-only portion (under package org.apache.hadoop.hive). 
> Things under this package (and its children) do not touch any Giraph code, 
> and so can be contributed as an IOFormat back to Hive itself.
> 
> Also note the new (I think improved) interface: Users do not need to actually 
> implement an XInputFormat anymore. They just create a class the implements 
> the HiveVertexCreator interface, plug that in, and use HiveVertexInputFormat. 
> Should make user code much cleaner.
> 
> 
> This addresses bug GIRAPH-453.
>     https://issues.apache.org/jira/browse/GIRAPH-453
> 
> 
> Diffs
> -----
> 
>   giraph-formats-contrib/pom.xml 9326c28420be2749fa724ebd86d9555ae209a2ee 
>   
> giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java
>  PRE-CREATION 
>   
> giraph-formats-contrib/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java
>  7a7c2f87bc50eaebe879e2eff2da661b504096b0 
>   giraph-hive/pom.xml PRE-CREATION 
>   giraph-hive/src/main/assembly/compile.xml PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveConstants.java 
> PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveEdgeInputFormat.java 
> PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveEdgeReader.java 
> PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java 
> PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveUtils.java 
> PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveVertexCreator.java 
> PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveVertexInputFormat.java 
> PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveVertexReader.java 
> PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/VertexCreator.java 
> PRE-CREATION 
>   
> giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveApiInputFormat.java 
> PRE-CREATION 
>   
> giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveApiOutputFormat.java 
> PRE-CREATION 
>   
> giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveInputDescription.java
>  PRE-CREATION 
>   
> giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveOutputDescription.java
>  PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/hadoop/hive/api/HiveRecord.java 
> PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/hadoop/hive/api/common/Classes.java 
> PRE-CREATION 
>   
> giraph-hive/src/main/java/org/apache/hadoop/hive/api/common/HadoopUtils.java 
> PRE-CREATION 
>   
> giraph-hive/src/main/java/org/apache/hadoop/hive/api/common/ProgressReporter.java
>  PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/hadoop/hive/api/common/SerDes.java 
> PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/hadoop/hive/api/common/Writables.java 
> PRE-CREATION 
>   
> giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/ApiInputSplit.java 
> PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/ApiRecord.java 
> PRE-CREATION 
>   
> giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/ApiRecordReader.java
>  PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/InputConf.java 
> PRE-CREATION 
>   
> giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/InputPartition.java
>  PRE-CREATION 
>   
> giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/InputPartitions.java
>  PRE-CREATION 
>   
> giraph-hive/src/main/java/org/apache/hadoop/hive/api/input/InputSplitData.java
>  PRE-CREATION 
>   
> giraph-hive/src/main/java/org/apache/hadoop/hive/api/inspect/ListParser.java 
> PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/hadoop/hive/api/inspect/MapParser.java 
> PRE-CREATION 
>   
> giraph-hive/src/main/java/org/apache/hadoop/hive/api/inspect/ObjectParser.java
>  PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/hadoop/hive/api/inspect/Parsers.java 
> PRE-CREATION 
>   
> giraph-hive/src/main/java/org/apache/hadoop/hive/api/inspect/PrimitiveParser.java
>  PRE-CREATION 
>   
> giraph-hive/src/main/java/org/apache/hadoop/hive/api/inspect/StructParser.java
>  PRE-CREATION 
>   
> giraph-hive/src/main/java/org/apache/hadoop/hive/api/inspect/UnionParser.java 
> PRE-CREATION 
>   pom.xml 47736351c45f4cdb55df762506069eaed7126a7e 
> 
> Diff: https://reviews.apache.org/r/8611/diff/
> 
> 
> Testing
> -------
> 
> Ran on some production jobs and verified results were exactly the same.
> 
> In terms of performance this is on par with our current HCatalog stuff. I ran 
> a few jobs and noticed at most a few seconds of difference between the input 
> supersteps. Sometimes it was less, so I think the difference is mostly noise.
> 
> 
> Thanks,
> 
> Nitay Joffe
> 
>

Reply via email to