-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6402/#review10101
-----------------------------------------------------------


Hey Jaeho, great work here. This was a lot and could not have been easy to wade 
through. I love your attention to detail and fantastic comments that can lead 
the way for others to figure out how to put together IO formats. The small 
things I noticed were:

- in AdjacencyListVertexOutputFormat, maybe call writeLine() writeVertex() like 
it was, since it takes a vertex and is writing at the unit of one vertex per 
line. The other formats consume input to write at this same granularity and it 
keeps the processing units consistent throughout the code. Its just a detail 
that we are writing to text lines in this family of OutputFormats, many folks 
here will be using other formats for IO that do not involve text (including 
myself most of the time.) You might rename the enclosing class to indicate 
lines and not just text or vertices are being written out here? Or just add 
some nice javadoc comments to make it clear to users what is going to happen 
here.

- in TextVertexInputFormat some comments mix web-escaped sequences of 
greater-than and less-than chars with the real angle-brackets.

- in VertexReaderFromEachLine, the getId(), getValue(), and getEdge() are a 
clean solution, but in their Text input they force the user to reparse the same 
line over and over. One pass for parsing is nice and easier not to mess up so 
I'm not sure this saved the users any time. In the version that is preprocessed 
that takes a generic type, you could pass a list of tokens, but then the user 
must be sure the number of tokens read is consistent with what they found on 
the line or those methods will trusting the sequence in the token list. This is 
not bad, but again it doesn't seem any easier for a user to debug and for the 
parsing of single lines when the user is aware of the assumed formatting and 
can work with tokens that don't make sense to the parser "in context" of the 
line. consuming each line in one place as a stream is familiar to programmers 
in general, and they can break the work into methods as they see fit. The place 
where this makes the code cleaner is the vertex creation piece that puts all 
the parsed values together, but you've already written that for them, so it 
doesn't really give the user less code to write. In the end, many users will 
and do use non-text IO so I think just having access to the line information to 
parse might be enough here.

In some cases, while reading the old and new code in the diff, I got the 
feeling a mix of both would be the best and tightest solution, mixed with a lot 
better commenting of the sort you put into you new code to light the way for 
them.  How can we tweak this code to focus on making the unfamiliar 
Giraph/Hadoop boilerplate go away without adding too many components for a new 
user to remember to compose and wire up? I think the solution that would make 
me do backflips would be the one that solves that puzzle, and I feel like 
you're already most of the way there with this patch. 

I would be curious, have you run this code while mixing and matching these IO 
formats and some vertices, and perhaps mixing in the wrong one sometimes to see 
if it breaks as it should? I would like to verify that shifting some of the 
generic syntax around does not confuse ReflectionUtils at any point in the 
game, this is why we have so many generic defs all over the code right now, and 
its delicate. the utils have to successfully walk the inheritance chain all the 
way to where the parameters are concretely set without missing a link in the 
chain or they loose the type info. I think we're probably in the clear here, 
but I'd like to try a dry run just to see what happens if you haven't already.

On the whole, I like this. It adds more clarity than it takes away. I think we 
could go a step simpler here and it would be a win, but again thats just me, 
and with this in place maybe the additional simplification would be the work of 
someone else in another JIRA.


- Eli Reisman


On Aug. 6, 2012, 9:21 p.m., Jaeho Shin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6402/
> -----------------------------------------------------------
> 
> (Updated Aug. 6, 2012, 9:21 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> (copied from GIRAPH-277)
> 
> The current way of implementing VertexInputFormat and VertexReader had bad 
> smell. It required users to understand how these two classes are glued 
> together, and forced similar codes to be duplicated in every new input 
> format. (Similarly for the VertexOutputFormat and VertexWriter.) Anyone who 
> wants to create a new format should create an underlying record reader or 
> writer at the right moment and delegate some calls to it, which seemed 
> unnecessary detail being exposed. Besides, type parameters had to appear all 
> over every new format code, which was extremely annoying for both reading 
> existing code and writing a new one. I was very frustrated writing my first 
> format code especially when I compared it to writing a new vertex code. I 
> thought writing a new input/output format should be as simple as vertex.
> So, I have refactored TextVertexInputFormat and OutputFormat into new forms 
> that have no difference in their interfaces, but remove a lot of burden for 
> subclassing. Instead of providing static VertexReader base classes, I made it 
> a non-static inner-class of its format class, which helps eliminate the 
> repeated code for gluing these two, already tightly coupled classes. This has 
> additional advantage of eliminating all the Generics type variables on the 
> VertexReader side, which makes overall code much more concise. I added 
> several useful TextVertexReader base classes that can save efforts for 
> implementing line-oriented formats.
> 
> 
> This addresses bug GIRAPH-277.
>     https://issues.apache.org/jira/browse/GIRAPH-277
> 
> 
> Diffs
> -----
> 
>   trunk/checkstyle.xml 1369914 
>   
> trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java
>  1369914 
>   trunk/src/main/java/org/apache/giraph/examples/SimplePageRankVertex.java 
> 1369914 
>   trunk/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java 
> 1369914 
>   
> trunk/src/main/java/org/apache/giraph/examples/SimpleTextVertexOutputFormat.java
>  1369914 
>   
> trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java
>  1369914 
>   trunk/src/main/java/org/apache/giraph/graph/VertexWriter.java 1369914 
>   
> trunk/src/main/java/org/apache/giraph/lib/AdjacencyListTextVertexInputFormat.java
>  PRE-CREATION 
>   
> trunk/src/main/java/org/apache/giraph/lib/AdjacencyListTextVertexOutputFormat.java
>  1369914 
>   trunk/src/main/java/org/apache/giraph/lib/AdjacencyListVertexReader.java 
> 1369914 
>   trunk/src/main/java/org/apache/giraph/lib/IdWithValueTextOutputFormat.java 
> 1369914 
>   trunk/src/main/java/org/apache/giraph/lib/JsonBase64VertexInputFormat.java 
> 1369914 
>   trunk/src/main/java/org/apache/giraph/lib/JsonBase64VertexOutputFormat.java 
> 1369914 
>   
> trunk/src/main/java/org/apache/giraph/lib/JsonLongDoubleFloatDoubleVertexInputFormat.java
>  1369914 
>   
> trunk/src/main/java/org/apache/giraph/lib/JsonLongDoubleFloatDoubleVertexOutputFormat.java
>  1369914 
>   
> trunk/src/main/java/org/apache/giraph/lib/LongDoubleDoubleAdjacencyListVertexInputFormat.java
>  1369914 
>   
> trunk/src/main/java/org/apache/giraph/lib/TextDoubleDoubleAdjacencyListVertexInputFormat.java
>  1369914 
>   trunk/src/main/java/org/apache/giraph/lib/TextVertexInputFormat.java 
> 1369914 
>   trunk/src/main/java/org/apache/giraph/lib/TextVertexOutputFormat.java 
> 1369914 
>   
> trunk/src/test/java/org/apache/giraph/lib/TestAdjacencyListTextVertexOutputFormat.java
>  1369914 
>   
> trunk/src/test/java/org/apache/giraph/lib/TestIdWithValueTextOutputFormat.java
>  1369914 
>   
> trunk/src/test/java/org/apache/giraph/lib/TestLongDoubleDoubleAdjacencyListVertexInputFormat.java
>  1369914 
>   
> trunk/src/test/java/org/apache/giraph/lib/TestTextDoubleDoubleAdjacencyListVertexInputFormat.java
>  1369914 
> 
> Diff: https://reviews.apache.org/r/6402/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jaeho Shin
> 
>

Reply via email to