-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9119/#review52809
-----------------------------------------------------------


Not sure why I was added to this review, but I figured I could review it anyway :)


./trunk/src/java/org/apache/nutch/tools/FileDumper.java
<https://reviews.apache.org/r/9119/#comment91979>

    this should be in the scope of the main method.
    If you wanted to write unit tests, it would be inconvenient, as calling main
    more than once would accumulate the stats.
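
    A rough sketch of what I mean, assuming the stats are simple per-type
    counters. The class and method names are hypothetical and not taken from
    the patch:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the names below are illustrative, not from FileDumper.java.
class DumperStatsSketch {

  // Each call builds and returns its own counters, so repeated calls
  // (for example from unit tests) do not accumulate into shared static state.
  static Map<String, Integer> dump(String[] mimeTypes) {
    Map<String, Integer> typeCounts = new HashMap<String, Integer>();
    for (String type : mimeTypes) {
      Integer previous = typeCounts.get(type);
      typeCounts.put(type, previous == null ? 1 : previous + 1);
    }
    return typeCounts;
  }

  public static void main(String[] args) {
    // The counters live only for the duration of this call.
    System.out.println(dump(args));
  }
}
```

    With the counters local to the call, a second invocation starts from
    zero instead of inheriting the totals of the first one.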



./trunk/src/java/org/apache/nutch/tools/FileDumper.java
<https://reviews.apache.org/r/9119/#comment91974>

    you might want to throw an exception if this returns false
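
    Something along these lines, assuming the call in question is a
    boolean-returning one such as File.mkdirs() (that is my assumption; the
    helper name is hypothetical):

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper; assumes the call under review is something like File.mkdirs().
class OutputDirSketch {

  // Creates the directory if needed and fails loudly instead of ignoring a false return.
  static File ensureDir(String path) throws IOException {
    File dir = new File(path);
    // mkdirs() also returns false when the directory already exists,
    // so check isDirectory() before treating false as an error.
    if (!dir.mkdirs() && !dir.isDirectory()) {
      throw new IOException("Could not create output directory: " + dir.getAbsolutePath());
    }
    return dir;
  }
}
```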



./trunk/src/java/org/apache/nutch/tools/FileDumper.java
<https://reviews.apache.org/r/9119/#comment91975>

    as this is all working from the local file system, using File objects all
    the way and converting to Path when needed seems more natural:
    new Path(file.toURI()), for example.



./trunk/src/java/org/apache/nutch/tools/FileDumper.java
<https://reviews.apache.org/r/9119/#comment91978>

    usually the FileSystem object needs to be retrieved from the path:
    path.getFileSystem(conf)



./trunk/src/java/org/apache/nutch/tools/FileDumper.java
<https://reviews.apache.org/r/9119/#comment91976>

    does content close the stream?
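
    If it does not, one option is to let try-with-resources own both
    streams. A minimal sketch, assuming Java 7+ and a plain InputStream
    source (the class and method names are hypothetical, not from the patch):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch; not the code from the patch.
class StreamCloseSketch {

  // Copies all bytes from in to target. Both streams are closed by
  // try-with-resources, even if the copy throws part-way through.
  static long copyAndClose(InputStream in, File target) throws IOException {
    long total = 0;
    try (InputStream source = in;
         OutputStream out = new FileOutputStream(target)) {
      byte[] buffer = new byte[8192];
      int read;
      while ((read = source.read(buffer)) != -1) {
        out.write(buffer, 0, read);
        total += read;
      }
    }
    return total;
  }
}
```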



./trunk/src/java/org/apache/nutch/tools/FileDumper.java
<https://reviews.apache.org/r/9119/#comment91977>

    we create new File(outputFullPath) twice.


- Julien Le Dem


On Sept. 6, 2014, 4:57 a.m., Chris Mattmann wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9119/
> -----------------------------------------------------------
> 
> (Updated Sept. 6, 2014, 4:57 a.m.)
> 
> 
> Review request for nutch and Julien Le Dem.
> 
> 
> Bugs: NUTCH-1526
>     https://issues.apache.org/jira/browse/NUTCH-1526
> 
> 
> Repository: nutch
> 
> 
> Description
> -------
> 
> Will contain the patch for the SegmentContentDumperTool described in NUTCH-1526:
> 
> ./bin/nutch org.apache.nutch.tools.SegmentContentDumper [options]
>    -segmentRootDir full file path to the root segment directory, e.g., crawl/segments
>    -regexUrlPattern a regex URL pattern to select URL keys to dump from the content DB in each segment
>    -outputDir The output directory to write file names to.
>    -metadata --key=value where key is a Content Metadata key and value is a value to check.
> 
> 
> Diffs
> -----
> 
>   ./trunk/src/java/org/apache/nutch/tools/FileDumper.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/9119/diff/
> 
> 
> Testing
> -------
> 
> Testing it on DARPA XDATA XNET.
> 
> 
> Thanks,
> 
> Chris Mattmann
> 
>
