-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9119/#review52796
-----------------------------------------------------------



./trunk/src/java/org/apache/nutch/tools/FileDumper.java
<https://reviews.apache.org/r/9119/#comment91927>

    This should read 
    
    FileDumper <output directory> <segments dir>



./trunk/src/java/org/apache/nutch/tools/FileDumper.java
<https://reviews.apache.org/r/9119/#comment91928>

    If I invoke this tool without ANY arguments, I get the following
    
    lmcgibbn@LMC-032857 /usr/local/trunk/runtime/local(master) $ ./bin/nutch org.apache.nutch.tools.FileDumper
    2014-09-09 15:57:19.045 java[3866:1903] Unable to load realm info from SCDynamicStore
    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
        at org.apache.nutch.tools.FileDumper.main(FileDumper.java:53)
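
    One way to avoid the ArrayIndexOutOfBoundsException would be to check the
    arguments before touching args[0] and print the usage string instead. The
    following is only a rough sketch, not the code in the patch: the class name
    FileDumperArgsSketch and the validate() helper are hypothetical, and the
    argument order is assumed to be the one suggested in the comment above
    (output directory first, segments dir second).

```java
import java.io.File;

// Hypothetical stand-in for the argument handling in FileDumper.main();
// none of these names come from the patch under review.
class FileDumperArgsSketch {

    static final String USAGE =
        "Usage: FileDumper <output directory> <segments dir>";

    /**
     * Returns null when the arguments look usable, otherwise a message
     * that should be shown to the user before exiting.
     */
    static String validate(String[] args) {
        // Covers the "no arguments at all" case that currently throws
        // ArrayIndexOutOfBoundsException.
        if (args == null || args.length != 2) {
            return USAGE;
        }
        // Assumption: the second argument must be an existing directory.
        File segmentsDir = new File(args[1]);
        if (!segmentsDir.isDirectory()) {
            return "Not a directory: " + args[1]
                + System.lineSeparator() + USAGE;
        }
        return null;
    }

    public static void main(String[] args) {
        String error = validate(args);
        if (error != null) {
            System.err.println(error);
            System.exit(1);
        }
        System.out.println("Output directory: " + args[0]);
        System.out.println("Segments directory: " + args[1]);
    }
}
```

    With a check like this, running the tool with no arguments would print the
    usage line and exit with a non-zero status rather than a stack trace.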



./trunk/src/java/org/apache/nutch/tools/FileDumper.java
<https://reviews.apache.org/r/9119/#comment91929>

    When I invoke this tool as follows
    
    lmcgibbn@LMC-032857 /usr/local/trunk/runtime/local(master) $ ./bin/nutch org.apache.nutch.tools.FileDumper . /usr/local/trunk/src/testresources/testcrawl/segments/
    2014-09-09 15:59:06.185 java[3883:1903] Unable to load realm info from SCDynamicStore
    Sep 09, 2014 3:59:06 PM org.apache.nutch.tools.FileDumper main
    INFO: Processing segment: [/usr/local/trunk/src/testresources/testcrawl/segments/20060919213635]
    Exception in thread "main" java.io.IOException: wrong key class: org.apache.hadoop.io.Text is not class org.apache.hadoop.io.UTF8
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1886)
        at org.apache.nutch.tools.FileDumper.main(FileDumper.java:99)



./trunk/src/java/org/apache/nutch/tools/FileDumper.java
<https://reviews.apache.org/r/9119/#comment91931>

    When I change the Text() class to use the UTF8() class, I get the following
    
    lmcgibbn@LMC-032857 /usr/local/trunk/runtime/local(master) $ ./bin/nutch org.apache.nutch.tools.FileDumper . /usr/local/trunk/src/testresources/testcrawl/segments/
    2014-09-09 16:02:21.339 java[3942:1903] Unable to load realm info from SCDynamicStore
    Sep 09, 2014 4:02:21 PM org.apache.nutch.tools.FileDumper main
    INFO: Processing segment: [/usr/local/trunk/src/testresources/testcrawl/segments/20060919213635]
    Exception in thread "main" java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:197)
        at java.io.DataInputStream.readFully(DataInputStream.java:169)
        at org.apache.nutch.protocol.Content.readFieldsCompressed(Content.java:99)
        at org.apache.nutch.protocol.Content.readFields(Content.java:154)
        at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1813)
        at org.apache.nutch.tools.FileDumper.main(FileDumper.java:101)
        
    UTF8 is of course deprecated now, so we need to stick with Text and implement the correct code.


- Lewis McGibbney


On Sept. 6, 2014, 4:57 a.m., Chris Mattmann wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9119/
> -----------------------------------------------------------
> 
> (Updated Sept. 6, 2014, 4:57 a.m.)
> 
> 
> Review request for nutch and Julien Le Dem.
> 
> 
> Bugs: NUTCH-1526
>     https://issues.apache.org/jira/browse/NUTCH-1526
> 
> 
> Repository: nutch
> 
> 
> Description
> -------
> 
> Will contain the patch for the SegmentContentDumperTool described in NUTCH-1526:
> 
> ./bin/nutch org.apache.nutch.tools.SegmentContentDumper [options]
>    -segmentRootDir full file path to the root segment directory, e.g., crawl/segments
>    -regexUrlPattern a regex URL pattern to select URL keys to dump from the content DB in each segment
>    -outputDir The output directory to write file names to.
>    -metadata --key=value where key is a Content Metadata key and value is a value to check.
> 
> 
> Diffs
> -----
> 
>   ./trunk/src/java/org/apache/nutch/tools/FileDumper.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/9119/diff/
> 
> 
> Testing
> -------
> 
> Testing it on DARPA XDATA XNET.
> 
> 
> Thanks,
> 
> Chris Mattmann
> 
>
