[ 
https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931182#action_12931182
 ] 

Jakob Homan commented on HDFS-1448:
-----------------------------------

Patch review for HDFS-1448-0.22-2.patch.

* BinaryTokenizer.java
** Providing a constructor that takes a stream rather than a String could aid 
in testing (my goal is for all testing to be able to be done via streams 
without going to the file system).
* EditsLoader.java
** printStatistics - does this method add value that couldn't otherwise be 
achieved as a separate viewer? I'm still not sold on providing this information 
for every run.  The vast majority of oev instances won't be interested in it, 
but will still have to pay the penalty of compiling the statistics. Conversely, 
the information could be of interest specifically (ie, tell me about this edits 
log), and then the user will have to run some other viewer just get it.  This 
same information can be gathered as a separate visitor, as mentioned in the 
first review.
* EditsLoaderCurrent.java
** Comments on 157-160 should be moved one line down so they apply to the check 
they're describing.
** The prior review had asked for the various switch statements to be moved 
into separate functions to aid in readability, testing and code maintability.  
Does the new code, with its individual functors and the extra code necessary to 
implement this scheme provide any functionality not given by the original 
suggestion?  If not, it seems to be a large amount of extra code and wiring 
without any benefit.  The goal of the comment in the original review was to 
reduce complexity and improve readability, which I'm not sure this new approach 
accomplishes.
* EditsOpCode.java
** The content of this file duplicates the constants created in FSEditLog.java. 
 While it would be best to avoid all duplication, that may not be possible in 
this patch, as discussed above.  However, to minimize duplication, I suggest 
refactoring out the constants from FSEditsLog into a separate class and 
referring to those constants in the enum definition.  Bonus points would be to 
have the Enum in the same file as the constants.
EditsVisitor.java
** What's the use case for the getTokenizer() method? It doesn't seem to be 
called anywhere.  Unused methods should be removed.
* EditsVisitorFactory.java
** Does the three lines of regular expression parsing necessary to determine if 
a file ends in .xml provide any extra benefit than simply using 
filename.endWith(".xml"), as was proposed in the original review?  If not, we 
should prefer the shorter, simpler code.
** It may be good to support .XML, .Xml, etc., and therefore call toLower on 
the string before checking for the file extension.
** Typo: "different implementatios"
* OfflineEditsViewer.java
** The only method calling public setEditsLoader() is OfflineEditsViewer is 
go().  As such, it should be made private.
* Tokenizer.java
** Spacing between fields and methods does not follow our coding standard.
** Consensus on naming convention for tokens of Foo appears to be FooToken, 
rather than TokenFoo (see: 
[http://www.google.com/codesearch?hl=en&sa=N&q=file:^.*Token*.java]), as well 
as our own 'Token'y classes.  We should follow that here. 
* TokenizerFactory.java
** Same comments for as for EditsFactory.java
* XmlTokenizer.java
** A quick survey of our exception handling shows that it is preferable to nest 
exceptions rather than taking the message from one, swallowing it and throwing 
a new exception: http://dl.dropbox.com/u/565949/exceptions.txt  We should do 
the same here.
** Do we need to handle all the empty cases in the switch? At the very least, 
there are a lot of empty comments that should be returned.
** {{public Token read(Token t) throws IOException {}} this method returns the 
same Token that it accepts, which has a bit of code smell.  I wonder if there's 
a way to avoid the confusion of mutating the parameter and then returning it, 
or, if not, explicitly documenting this behavior.
OfflineEditsViewerHelper.java
** As an aside: Whatever form elements of the edits file eventually take, it 
would be nice if they were self-testing and could provide this information 
automatically, rather than needing to call each one here, decoupled from the 
implementation.
** As noted in the first review, it is unnecessary and inefficient to shell out 
to copy the edits file.  This file is used a source for the test; you can find 
where it is (as is done in order to accomplish the copying) and explicitly 
clean it up after the test has completed.
I suggest refactoring
  {noformat}public void generateEdits(String dfsDir, String 
editsFilename){noformat}
to 
  {noformat}public String generateEdits(String dfsDir) // return path to 
edits{noformat}
and providing a shutdown method on OfflineEditsViewer that cleans up the 
cluster when the unit test has completed.
** 137: No need for IOException to be on a separate line
** Nit: 208: Void type is preferable to Object for doAs blocks
** 218: Same comment about nesting exceptions (as well as anywhere else I may 
have missed).
* TestOfflineEditsViewer.java
** Remove commented-out lines 27,28
** Nit: Line 174, save a few characters with a while loop
** 134: Exception should be converted to an assert since this is a unit test

> Create multi-format parser for edits logs file, support binary and XML 
> formats initially
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-1448
>                 URL: https://issues.apache.org/jira/browse/HDFS-1448
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: tools
>    Affects Versions: 0.22.0
>            Reporter: Erik Steffl
>            Assignee: Erik Steffl
>             Fix For: 0.22.0
>
>         Attachments: editsStored, HDFS-1448-0.22-1.patch, 
> HDFS-1448-0.22-2.patch, HDFS-1448-0.22.patch, Viewer hierarchy.pdf
>
>
> Create multi-format parser for edits logs file, support binary and XML 
> formats initially.
> Parsing should work from any supported format to any other supported format 
> (e.g. from binary to XML and from XML to binary).
> The binary format is the format used by FSEditLog class to read/write edits 
> file.
> Primary reason to develop this tool is to help with troubleshooting, the 
> binary format is hard to read and edit (for human troubleshooters).
> Longer term it could be used to clean up and minimize parsers for fsimage and 
> edits files. Edits parser OfflineEditsViewer is written in a very similar 
> fashion to OfflineImageViewer. Next step would be to merge OfflineImageViewer 
> and OfflineEditsViewer and use the result in both FSImage and FSEditLog. This 
> is subject to change, specifically depending on adoption of avro (which would 
> completely change how objects are serialized as well as provide ways to 
> convert files to different formats).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to