Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by RobPettengill: http://wiki.apache.org/nutch/bin/nutch_segslice

New page:

segslice is an alias for net.nutch.segment.!SegmentSlicer

This class reads data from one or more input segments and writes it to one or more output segments, optionally deleting the input segments when it has finished. Data is read sequentially from the input segments and appended to an output segment until that segment reaches the target count of entries, at which point the next output segment is created, and so on.

NOTE 1: this tool does NOT de-duplicate data - use SegmentMergeTool for that.

NOTE 2: this tool does NOT copy indexes. It is currently impossible to slice Lucene indexes. The proper procedure is first to create slices, and then to index them.

NOTE 3: if one or more input segments are in non-parsed format, the output segments will also use non-parsed format. This means that any parseData and parseText data from input segments will NOT be copied to the output segments.

Usage: bin/nutch net.nutch.segment.!SegmentSlicer (-local | -ndfs <namenode:port>) -o outputDir [-max count] [-fix] [-nocontent] [-noparsedata] [-noparsetext] (-dir segments | seg1 seg2 ...)[[BR]]
NOTE: at least one segment directory name or the '-dir' option is required. outputDir is always required.[[BR]]
 -o outputDir[[BR]]
  output directory for segments[[BR]]
 -max count[[BR]]
  (optional) output multiple segments, each with maximum 'count' entries[[BR]]
 -fix[[BR]]
  (optional) automatically fix corrupted segments[[BR]]
 -nocontent[[BR]]
  (optional) ignore content data[[BR]]
 -noparsedata[[BR]]
  (optional) ignore parse_data data[[BR]]
 -noparsetext[[BR]]
  (optional) ignore parse_text data[[BR]]
 -dir segments[[BR]]
  directory containing multiple segments[[BR]]
 seg1 seg2 ...[[BR]]
  segment directories[[BR]]

[CommandLineOptions]
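
The sequential read-and-append behaviour described above, with '-max count' as the target size, can be illustrated with a small sketch. This is not the SegmentSlicer code: it is a stand-alone Python model in which each segment is assumed to be a plain list of entries, and the name slice_segments is hypothetical.

```python
from typing import Iterable, Iterator, List


def slice_segments(segments: Iterable[Iterable[str]], max_count: int) -> Iterator[List[str]]:
    """Yield output slices holding at most max_count entries each.

    Hypothetical model only: entries are read sequentially across all
    input segments and appended to the current output slice. Nothing is
    de-duplicated, which mirrors NOTE 1.
    """
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    current: List[str] = []
    for segment in segments:
        for entry in segment:
            current.append(entry)
            # Target count reached: close this slice and start the next one.
            if len(current) == max_count:
                yield current
                current = []
    # The final slice may hold fewer than max_count entries.
    if current:
        yield current


# Two input segments that share the entry "c"; the duplicate is kept.
slices = list(slice_segments([["a", "b", "c"], ["c", "d"]], max_count=2))
```

In this model an output slice can span the boundary between two input segments, and the last slice is simply whatever remains once the inputs are exhausted.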
