> On Feb. 21, 2013, 6:45 p.m., Maja Kabiljo wrote:
> > This is a lot of great work, Nitay, thanks! I really like that the user 
> > doesn't have to extend the whole Input/Output format anymore; that was a 
> > lot of code duplication every time.
> > 
> > Is it possible to provide some examples/tests for this?

Opened https://issues.apache.org/jira/browse/GIRAPH-534 so that we can create 
examples / tests.
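Until those land, here is a minimal, self-contained sketch of the plug-in pattern from the review request (implement a converter such as HiveToVertex and plug it into a generic input format). All names below (RowToVertex, SimpleVertex, GenericVertexReader, UserRowToVertex) are hypothetical stand-ins, not the actual Giraph interfaces or signatures, and a Map<String, Object> stands in for a Hive record:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical user-facing converter, in the spirit of HiveToVertex. */
interface RowToVertex<V> {
  /** Turns one table row (column name -> value) into a vertex. */
  V toVertex(Map<String, Object> row);
}

/** Hypothetical vertex type used only for this illustration. */
final class SimpleVertex {
  final long id;
  final double value;

  SimpleVertex(long id, double value) {
    this.id = id;
    this.value = value;
  }
}

/**
 * Hypothetical generic reader: owns the table-reading plumbing once and
 * delegates only the row-to-vertex conversion to user code.
 */
final class GenericVertexReader<V> {
  private final RowToVertex<V> converter;

  GenericVertexReader(RowToVertex<V> converter) {
    this.converter = converter;
  }

  /** Converts every row with the plugged-in converter, preserving order. */
  List<V> readAll(List<Map<String, Object>> rows) {
    List<V> vertices = new ArrayList<>();
    for (Map<String, Object> row : rows) {
      vertices.add(converter.toVertex(row));
    }
    return vertices;
  }
}

/** User code: in this sketch, the only class a user has to write. */
final class UserRowToVertex implements RowToVertex<SimpleVertex> {
  @Override
  public SimpleVertex toVertex(Map<String, Object> row) {
    // Assumes the row has a Long "id" column and a Double "score" column.
    return new SimpleVertex((Long) row.get("id"), (Double) row.get("score"));
  }
}
```

The point of the design is that the reading logic lives in one generic class, so each new table only needs a small converter instead of a full InputFormat subclass.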


> On Feb. 21, 2013, 6:45 p.m., Maja Kabiljo wrote:
> > giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java, lines 29-31
> > <https://reviews.apache.org/r/8611/diff/6/?file=260732#file260732line29>
> >
> >     What is this for? (on some other places too)

It allows using multiple tables at the same time. To do that you need some 
namespacing for Configuration keys, and these profiles are my way of providing 
it. I have a cleaner solution in mind that I will put in a separate diff, which 
should clean up some of this.
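A minimal sketch of the namespacing idea, assuming a plain Map in place of Hadoop's Configuration. The ProfileConf class and the "hive.profile." key prefix are hypothetical illustrations, not what HiveProfiles actually does:

```java
import java.util.Map;

/**
 * Hypothetical sketch of profile-based key namespacing. A plain Map stands
 * in for org.apache.hadoop.conf.Configuration; the "hive.profile." prefix
 * is an illustrative assumption, not HiveProfiles' actual key format.
 */
final class ProfileConf {
  /** Shared key/value store, one per job. */
  private final Map<String, String> conf;
  /** Profile ID that namespaces this table's keys, e.g. "edge_input". */
  private final String profileId;

  ProfileConf(Map<String, String> conf, String profileId) {
    this.conf = conf;
    this.profileId = profileId;
  }

  /** Prefixes a setting name with the profile, e.g. "hive.profile.edge_input.table". */
  private String key(String name) {
    return "hive.profile." + profileId + "." + name;
  }

  /** Stores a setting under this profile's namespace. */
  void set(String name, String value) {
    conf.put(key(name), value);
  }

  /** Reads a setting from this profile's namespace; returns null if unset. */
  String get(String name) {
    return conf.get(key(name));
  }
}
```

With two profiles (say "vertex_input" and "edge_input") sharing one map, each can set its own "table" key without clobbering the other.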


> On Feb. 21, 2013, 6:45 p.m., Maja Kabiljo wrote:
> > giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java, lines 35-36
> > <https://reviews.apache.org/r/8611/diff/6/?file=260732#file260732line35>
> >
> >     Out of curiosity - why do we do this? (why isn't it private)

Sometimes I want to allow inheritance, but in this case there is no need; private it is.


> On Feb. 21, 2013, 6:45 p.m., Maja Kabiljo wrote:
> > giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java, line 154
> > <https://reviews.apache.org/r/8611/diff/6/?file=260735#file260735line154>
> >
> >     Could we have an option to reuse edge objects here?

Good call
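One way such an option could look, as a hypothetical sketch (MutableEdge and ReusingEdgeReader are illustrative names, not Giraph's Edge or HiveEdgeReader API). Each row is assumed to be a long[] of {source, target, value}; when reuse is enabled, the reader refills a single instance per row instead of allocating a new one:

```java
import java.util.Iterator;

/** Hypothetical mutable edge used for illustration; not Giraph's Edge API. */
final class MutableEdge {
  long sourceId;
  long targetId;
  long value;

  /** Overwrites all fields in place and returns this instance. */
  MutableEdge set(long sourceId, long targetId, long value) {
    this.sourceId = sourceId;
    this.targetId = targetId;
    this.value = value;
    return this;
  }
}

/**
 * Hypothetical reader over rows of {source, target, value}. With reuseEdges
 * enabled, next() refills and returns the same MutableEdge on every call.
 */
final class ReusingEdgeReader {
  private final Iterator<long[]> rows;
  private final boolean reuseEdges;
  /** Single instance handed out on every call when reuse is on. */
  private final MutableEdge reusable = new MutableEdge();

  ReusingEdgeReader(Iterator<long[]> rows, boolean reuseEdges) {
    this.rows = rows;
    this.reuseEdges = reuseEdges;
  }

  boolean hasNext() {
    return rows.hasNext();
  }

  /** Converts the next row; with reuse on, callers must copy any edge they retain. */
  MutableEdge next() {
    long[] row = rows.next();
    MutableEdge edge = reuseEdges ? reusable : new MutableEdge();
    return edge.set(row[0], row[1], row[2]);
  }
}
```

The trade-off is fewer allocations and less GC pressure in exchange for a contract that the returned edge is only valid until the next call.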


- Nitay


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8611/#review16867
-----------------------------------------------------------


On Feb. 21, 2013, 6:17 p.m., Nitay Joffe wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8611/
> -----------------------------------------------------------
> 
> (Updated Feb. 21, 2013, 6:17 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> One particular thing I added was the concept of "profiles", which makes it 
> easy to read from / write to multiple tables. This should remove a lot of 
> the cruft around the GiraphHCat* classes.
> 
> Note that in the diff I separated the code so that there is a Hive-only 
> portion unrelated to Giraph (under package org.apache.hadoop.hive). Code 
> under this package (and its subpackages) does not touch any Giraph code, 
> so it can be contributed back to Hive itself as an IOFormat.
> 
> Also note the new (and, I think, improved) interface: users no longer need to 
> implement an XInputFormat. They just create a class that implements the 
> HiveToVertex (or HiveToEdge, VertexToHive) interface, plug it in, and use 
> HiveVertexInputFormat. This should make user code much cleaner.
> 
> 
> This addresses bug GIRAPH-453.
>     https://issues.apache.org/jira/browse/GIRAPH-453
> 
> 
> Diffs
> -----
> 
>   giraph-accumulo/pom.xml cb9fbc02e6fc8adcb0ec41e0c6aeff75b1ef3f06
>   giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java 89ef87fea7a370354156fb7be02ef4249e0a6111
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java ddeaeb769b548eb1002ccf8c18ffe048eb096f8d
>   giraph-hbase/pom.xml 7bbbd98c0b3db6878aee4be21eecd821448da7ef
>   giraph-hcatalog/pom.xml 019f02083012704a997ffe715cefe3adeb153dd9
>   giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HCatGiraphRunner.java PRE-CREATION
>   giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveGiraphRunner.java 313bab04c50ed6be7143254de80e36a4ba291516
>   giraph-hcatalog/src/main/java/org/apache/giraph/io/hcatalog/HiveUtils.java c1f76f1a46d1fc9af489a916256884520c138cb4
>   giraph-hive/pom.xml PRE-CREATION
>   giraph-hive/src/main/assembly/compile.xml PRE-CREATION
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java PRE-CREATION
>   giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveProfiles.java PRE-CREATION
>   giraph-hive/src/main/java/org/apache/giraph/hive/common/package-info.java PRE-CREATION
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeInputFormat.java PRE-CREATION
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveEdgeReader.java PRE-CREATION
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/HiveToEdge.java PRE-CREATION
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/edge/package-info.java PRE-CREATION
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/package-info.java PRE-CREATION
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveToVertex.java PRE-CREATION
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexInputFormat.java PRE-CREATION
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/HiveVertexReader.java PRE-CREATION
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/vertex/package-info.java PRE-CREATION
>   giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexOutputFormat.java PRE-CREATION
>   giraph-hive/src/main/java/org/apache/giraph/hive/output/HiveVertexWriter.java PRE-CREATION
>   giraph-hive/src/main/java/org/apache/giraph/hive/output/VertexToHive.java PRE-CREATION
>   giraph-hive/src/main/java/org/apache/giraph/hive/output/package-info.java PRE-CREATION
>   giraph-hive/src/main/java/org/apache/giraph/hive/package-info.java PRE-CREATION
>   pom.xml c075762cddd7a698c92aaad4017cd74915160e41
> 
> Diff: https://reviews.apache.org/r/8611/diff/
> 
> 
> Testing
> -------
> 
> Ran on some production jobs and verified that the results were exactly the same.
> 
> Here's a comparison of performance on real workloads ("base" is HCatalog, 
> "mine" is Hive):
> https://gist.github.com/nitay/880d8fb20d2ac86015d4/raw/6b297fcb287bf8d3dc8175bad217aa86544b4f18/high+school
> 
> We see only a slight improvement, which is expected because I haven't done 
> much performance work yet. A few performance improvement ideas are coming; 
> this is just the first working version.
> 
> 
> Thanks,
> 
> Nitay Joffe
> 
>
