The file src/java/org/apache/nutch/fetcher/Fetcher.java has the
following lines (260-263):
-------------------------------------------------------------------
if (status.isSuccess()) {
  outputPage(new FetcherOutput(fle, hash, protocolStatus),
             content, new ParseText(parse.getText()), parse.getData());
}
-------------------------------------------------------------------
where hash is computed as follows (lines 233-241):
-------------------------------------------------------------------
Content content = output.getContent();
MD5Hash hash = null;
String url = fle.getPage().getURL().toString();
if (content == null) {
  content = new Content(url, url, new byte[0], "", new Properties());
  hash = MD5Hash.digest(url);
} else {
  hash = MD5Hash.digest(content.getContent());
}
-------------------------------------------------------------------
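For anyone who wants to experiment outside Nutch, the selection logic above
can be mirrored with the JDK's java.security.MessageDigest. This is only a
sketch: FetcherHashSketch, md5Hex and contentHash are hypothetical names, and
encoding the URL as UTF-8 is an assumption about what MD5Hash.digest(String)
does internally.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FetcherHashSketch {

    // Hex-encoded MD5 of a byte array, using the JDK's MessageDigest
    // as a stand-in for Nutch's MD5Hash (not the same class).
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    // Mirrors the Fetcher logic: hash the URL when there is no content,
    // otherwise hash the raw fetched bytes.
    // Assumption: the URL string is encoded as UTF-8 before hashing.
    static String contentHash(String url, byte[] content) {
        if (content == null) {
            return md5Hex(url.getBytes(StandardCharsets.UTF_8));
        }
        return md5Hex(content);
    }

    public static void main(String[] args) {
        String url = "http://example.com/";
        System.out.println(contentHash(url, null));
        System.out.println(contentHash(url,
                "<html>hi</html>".getBytes(StandardCharsets.UTF_8)));
    }
}
```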
It's a little late right now, and perhaps I'm asking a naive question: if
the parse is successful on non-null content, what would be the side
effects of changing the content hash from

  hash = MD5Hash.digest(content.getContent());

to the MD5 digest of parse.getText()?

Thoughts?
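One quick way to see the main by-product is to compare the two hashes on
pages that differ only in markup. The sketch below assumes the Fetcher's hash
is what later drives duplicate detection (worth checking against the dedup
code). The page strings and their "extracted text" are made up for
illustration, no Nutch parser is called, and TextHashComparison and md5Hex are
hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class TextHashComparison {

    // Hex-encoded MD5 of a string's UTF-8 bytes (JDK MessageDigest, not Nutch's MD5Hash).
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(s.getBytes(StandardCharsets.UTF_8))) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two hypothetical fetches of the same article: identical visible text,
        // but the raw HTML differs (e.g. a rotating ad comment).
        String rawA = "<html><body><!-- ad 1 -->Hello world</body></html>";
        String rawB = "<html><body><!-- ad 2 -->Hello world</body></html>";
        String textA = "Hello world"; // assumed parse text for rawA
        String textB = "Hello world"; // assumed parse text for rawB

        // Hashing raw bytes: the two pages look different.
        System.out.println(md5Hex(rawA).equals(md5Hex(rawB)));   // false
        // Hashing parse text: the two pages look identical.
        System.out.println(md5Hex(textA).equals(md5Hex(textB))); // true

        // Every page whose parse yields empty text would share this one hash.
        System.out.println(md5Hex(""));  // d41d8cd98f00b204e9800998ecf8427e
    }
}
```

If that assumption about dedup holds, hashing parse.getText() would collapse
pages that differ only in markup (often what you want), but every page whose
parse yields no text would share a single hash, and the hash would no longer
change when only non-text content changes.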