-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9119/
-----------------------------------------------------------

Review request for nutch and Julien Le Dem.


Description
-------

Will contain the patch for the SegmentContentDumperTool described in NUTCH-1526:

./bin/nutch org.apache.nutch.tools.SegmentContentDumper [options]
  -segmentRootDir   full file path to the root segment directory, e.g., crawl/segments
  -regexUrlPattern  a regex URL pattern to select URL keys to dump from the content DB in each segment
  -outputDir        the output directory to write file names to
  -metadata         --key=value, where key is a Content Metadata key and value is a value to check


This addresses bug NUTCH-1526.
    https://issues.apache.org/jira/browse/NUTCH-1526


Diffs
-----

Diff: https://reviews.apache.org/r/9119/diff/


Testing
-------

Testing it on DARPA XDATA XNET.


Thanks,

Chris Mattmann
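
Note for reviewers: a minimal sketch of how the -regexUrlPattern and -metadata options might combine to select entries. This is not the code in the patch. The class name DumpSelectionSketch, the use of a plain Map for Content Metadata, the "find anywhere in the URL" regex semantics, and the exact-match check on the metadata value are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not the class from the NUTCH-1526 patch. */
final class DumpSelectionSketch {
    private final Pattern urlPattern;
    private final String metaKey;   // null when no -metadata filter was given
    private final String metaValue;

    DumpSelectionSketch(String regexUrlPattern, String metadataArg) {
        this.urlPattern = Pattern.compile(regexUrlPattern);
        if (metadataArg == null) {
            this.metaKey = null;
            this.metaValue = null;
        } else {
            // Assumed argument form "--key=value": drop leading dashes,
            // then split on the first '='.
            String kv = metadataArg.replaceFirst("^-+", "");
            int eq = kv.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value: " + metadataArg);
            }
            this.metaKey = kv.substring(0, eq);
            this.metaValue = kv.substring(eq + 1);
        }
    }

    /** True when the URL key matches the regex and, if set, the metadata check passes. */
    boolean selects(String url, Map<String, String> contentMetadata) {
        if (!urlPattern.matcher(url).find()) {
            return false;
        }
        if (metaKey == null) {
            return true; // no metadata filter: the URL match alone decides
        }
        return metaValue.equals(contentMetadata.get(metaKey));
    }

    public static void main(String[] args) {
        DumpSelectionSketch s =
            new DumpSelectionSketch("\\.pdf$", "--Content-Type=application/pdf");
        System.out.println(
            s.selects("http://example.org/a.pdf", Map.of("Content-Type", "application/pdf")));
        System.out.println(
            s.selects("http://example.org/a.html", Map.of("Content-Type", "text/html")));
    }
}
```

In this sketch an entry is dumped only when both checks pass; whether the actual tool treats the two options as a conjunction is for the diff to confirm.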