Hi,
I've written a parse-exe plugin for downloading EXE files from crawled
pages, using parse-pdf as my template.
Although the plugin works (it downloads the exe for any related content
type, e.g. application/(x-exe|x-msdos|x-dosexec..)), I still get a
NullPointerException for parseData.
I don't fully understand the code at the end, and I might have missed
something. Can anyone help?
The getParse(Content content) I've written:
public Parse getParse(Content content) {
    String resultText = "No textual content available";
    String resultTitle = "No textual content available";
    Outlink[] outlinks = new Outlink[0];
    Metadata metadata = new Metadata();
    try {
        byte[] raw = content.getContent();
        String contentLength = content.getMetadata().get(
            Response.CONTENT_LENGTH);
        if (contentLength != null
                && raw.length != Integer.parseInt(contentLength)) {
            return new ParseStatus(ParseStatus.FAILED,
                ParseStatus.FAILED_TRUNCATED,
                "Content truncated at " + raw.length
                + " bytes. Parser can't handle incomplete exe file.")
                .getEmptyParse(getConf());
        }
        // download the file - separate method (doesn't affect the other vars)
        downloadContentType(content);
    } catch (Exception e) { // runtime exception
        if (LOG.isWarnEnabled()) {
            LOG.warn("General exception in EXE parser: " + e.getMessage());
            e.printStackTrace(LogUtil.getWarnStream(LOG));
        }
        return new ParseStatus(ParseStatus.FAILED,
            "Can't be handled as exe document. " + e)
            .getEmptyParse(getConf());
    }
    ParseData parseData = new ParseData(ParseStatus.STATUS_SUCCESS,
        resultTitle, outlinks,
        content.getMetadata());
    return new ParseImpl(resultText, parseData);
}
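For reference, the truncation check at the top of the method can be tried on its own. This is only a minimal standalone sketch with no Nutch classes involved; the class name TruncationCheck and the method isTruncated are made up for illustration, and it assumes the Content-Length header holds a plain decimal byte count.

```java
public class TruncationCheck {

    /**
     * Returns true when a Content-Length header value is present and
     * differs from the number of bytes actually received.
     * (Hypothetical helper, not part of Nutch.)
     */
    public static boolean isTruncated(String contentLength, int receivedBytes) {
        if (contentLength == null) {
            return false; // no header, nothing to compare against
        }
        try {
            return Integer.parseInt(contentLength.trim()) != receivedBytes;
        } catch (NumberFormatException e) {
            return false; // unparsable header: treat as unknown, not truncated
        }
    }

    public static void main(String[] args) {
        byte[] raw = new byte[] {0x4d, 0x5a, 0x00, 0x00}; // 4 bytes received
        System.out.println(isTruncated("4", raw.length));
        System.out.println(isTruncated("10", raw.length));
        System.out.println(isTruncated(null, raw.length));
    }
}
```

One difference from getParse above: an unparsable header is treated here as "unknown", whereas in getParse the NumberFormatException would end up in the general catch block.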
When running, I get this exception:
java.lang.NullPointerException
    at org.apache.nutch.parse.ParseData.write(ParseData.java:163)
    at org.apache.nutch.parse.ParseImpl.write(ParseImpl.java:55)
    at org.apache.nutch.fetcher.FetcherOutput.write(FetcherOutput.java:63)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:315)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:403)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:164)
fetch of http://www2.ati.com/misc/themes/ATI_ThemeManager_July2004.exe failed with:
java.lang.NullPointerException
Thanks,
Eyal.
--
Eyal Edri