-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9119/
-----------------------------------------------------------
Review request for nutch and Julien Le Dem.
Description
-------
Will contain the patch for the SegmentContentDumperTool described in NUTCH-1526:
./bin/nutch org.apache.nutch.tools.SegmentContentDumper [options]
  -segmentRootDir   full file path to the root segment directory, e.g.,
                    crawl/segments
  -regexUrlPattern  a regex URL pattern to select URL keys to dump from the
                    content DB in each segment
  -outputDir        the output directory to write file names to
  -metadata         --key=value, where key is a Content Metadata key and
                    value is a value to check
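For reviewers, here is a rough sketch of how the -regexUrlPattern and -metadata options could combine to select a record. It is not taken from the patch: the class SegmentDumpFilterSketch and the method shouldDump are hypothetical names, and it assumes the regex must match the whole URL key and that the metadata check is an exact string comparison. It uses only the JDK, not the Nutch API.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the code in the NUTCH-1526 patch.
public class SegmentDumpFilterSketch {

    // Returns true when a record should be dumped.
    // Assumption: the URL key must fully match the regex.
    // Assumption: the metadata value is compared with String.equals.
    static boolean shouldDump(String url, Map<String, String> metadata,
                              Pattern urlPattern, String metaKey, String metaValue) {
        if (!urlPattern.matcher(url).matches()) {
            return false;
        }
        if (metaKey == null) {
            // No -metadata filter given: the URL match alone decides.
            return true;
        }
        return metaValue != null && metaValue.equals(metadata.get(metaKey));
    }

    public static void main(String[] args) {
        Pattern pdfUrls = Pattern.compile(".*\\.pdf");
        Map<String, String> meta =
            Collections.singletonMap("Content-Type", "application/pdf");

        // URL and metadata both match the filters.
        System.out.println(shouldDump("http://example.com/a.pdf", meta,
            pdfUrls, "Content-Type", "application/pdf"));
        // URL does not match the regex, so the record is skipped.
        System.out.println(shouldDump("http://example.com/a.html", meta,
            pdfUrls, "Content-Type", "application/pdf"));
    }
}
```

Whether the actual tool uses a full match or a partial find, and how it treats multi-valued metadata keys, should be checked against the diff.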
This addresses bug NUTCH-1526.
https://issues.apache.org/jira/browse/NUTCH-1526
Diffs
-----
Diff: https://reviews.apache.org/r/9119/diff/
Testing
-------
Testing it on DARPA XDATA XNET.
Thanks,
Chris Mattmann