Kim Whitehall created NUTCH-2100:
------------------------------------
Summary: Nutch dump command doesn't dump anything
Key: NUTCH-2100
URL: https://issues.apache.org/jira/browse/NUTCH-2100
Project: Nutch
Issue Type: Bug
Reporter: Kim Whitehall
When running the command
nutch dump -segment segment -outputDir dumpFolder -mimeStats
I receive the following output:
Dumper File Stats:
TOTAL Stats:
[
]
The log indicates that segments are being skipped.
Note that if I use nutch readseg -dump, I can see that the segment does contain content.
The log is shown below:
2015-09-15 20:10:56,142 INFO tools.FileDumper - Accepting all mimetypes.
2015-09-15 20:10:56,782 WARN util.NativeCodeLoader - Unable to load
native-hadoop library for your platform... using builtin-java classes where
applicable
2015-09-15 20:10:57,057 INFO tools.FileDumper - Processing segment:
[/.../segments/20150915195411/crawl_generate]
2015-09-15 20:10:57,057 WARN tools.FileDumper - Skipping segment:
[/.../segments/20150915195411/crawl_generate/content/part-00000/data]: no data
directory present
2015-09-15 20:10:57,057 INFO tools.FileDumper - Processing segment:
[/.../segments/20150915195411/crawl_fetch]
2015-09-15 20:10:57,057 WARN tools.FileDumper - Skipping segment:
[/.../segments/20150915195411/crawl_fetch/content/part-00000/data]: no data
directory present
2015-09-15 20:10:57,058 INFO tools.FileDumper - Processing segment:
[/.../segments/20150915195411/content]
2015-09-15 20:10:57,058 WARN tools.FileDumper - Skipping segment:
[/.../segments/20150915195411/content/content/part-00000/data]: no data
directory present
2015-09-15 20:10:57,058 INFO tools.FileDumper - Processing segment:
[/.../segments/20150915195411/parse_text]
2015-09-15 20:10:57,058 WARN tools.FileDumper - Skipping segment:
[/.../segments/20150915195411/parse_text/content/part-00000/data]: no data
directory present
2015-09-15 20:10:57,058 INFO tools.FileDumper - Processing segment:
[/.../segments/20150915195411/parse_data]
2015-09-15 20:10:57,058 WARN tools.FileDumper - Skipping segment:
[/.../segments/20150915195411/parse_data/content/part-00000/data]: no data
directory present
2015-09-15 20:10:57,058 INFO tools.FileDumper - Processing segment:
[/.../segments/20150915195411/crawl_parse]
2015-09-15 20:10:57,058 WARN tools.FileDumper - Skipping segment:
[/.../segments/20150915195411/crawl_parse/content/part-00000/data]: no data
directory present
2015-09-15 20:10:57,059 INFO tools.FileDumper - Dumper File Stats:
TOTAL Stats:
[
]
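
A minimal sketch of the lookup pattern the log above suggests. It is inferred only from the logged paths, not from the FileDumper source: each child directory of the -segment argument appears to be treated as a segment, and its data is looked for under <child>/content/part-00000/data. The function names and the demo layout are hypothetical and are there only to reproduce the paths seen in the log.

```python
import tempfile
from pathlib import Path

# Subdirectories of the single segment that appear in the log above.
SEGMENT_PARTS = ("crawl_generate", "crawl_fetch", "content",
                 "parse_text", "parse_data", "crawl_parse")


def check_segments(segment_arg):
    """Return (child name, expected data path, present?) per child directory.

    Assumption inferred from the log, not from the FileDumper source:
    every child of the -segment argument is handled as one segment and
    its data is expected at <child>/content/part-00000/data.
    """
    results = []
    for child in sorted(Path(segment_arg).iterdir()):
        if not child.is_dir():
            continue
        data = child / "content" / "part-00000" / "data"
        results.append((child.name, data, data.exists()))
    return results


def make_demo_layout(root):
    """Create a hypothetical segment tree shaped like the one in the log."""
    seg = Path(root) / "segments" / "20150915195411"
    for part in SEGMENT_PARTS:
        (seg / part).mkdir(parents=True)
    data = seg / "content" / "part-00000" / "data"
    data.parent.mkdir(parents=True)
    data.touch()  # stand-in for the real content data
    return seg


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        seg = make_demo_layout(tmp)
        # Pointing at the single segment: every child is checked and skipped.
        for name, path, present in check_segments(seg):
            print(name, "present" if present else "skipped", path)
        # Pointing at the parent directory of the segment instead.
        for name, path, present in check_segments(seg.parent):
            print(name, "present" if present else "skipped", path)
```

Under this assumption, pointing at the single segment directory yields six paths of the form .../<part>/content/part-00000/data, none of which exist, which matches the six "Skipping segment" warnings. Pointing at its parent directory yields one path that does exist. Whether FileDumper actually expects the parent directory is not confirmed here.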