[ https://issues.apache.org/jira/browse/NUTCH-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045415#comment-16045415 ]
Sebastian Nagel commented on NUTCH-2393:
----------------------------------------
Thanks [~kaidul], for taking care of 2.x!
I don't know the gory details of Gora. But in case the array backing the
ByteBuffer returned by page.getContent() holds more than the content of a
single page: wouldn't it be better to check for {{buf.remaining() == 0}}
instead of {{buf.array().length == 0}}? The same value (variable {{cb}}) is
passed as the {{len}} parameter to the digest method:
{code}
ByteBuffer buf = page.getContent();
...
if (buf == null || buf.remaining() == 0) {
  // use URL instead of empty content
} else {
  data = buf.array();
  of = buf.arrayOffset() + buf.position();
  cb = buf.remaining();
}
return MD5Hash.digest(data, of, cb).getDigest();
{code}
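To illustrate the difference between the two checks, here is a minimal, self-contained sketch. It uses java.security.MessageDigest as a stand-in for MD5Hash.digest(data, of, cb) so that it runs without Hadoop on the classpath. The class name BufferDigestSketch, the method signature() and the example URL are made up for illustration and are not taken from the patch. It also assumes the buffer is array-backed (buf.hasArray() is true), which buf.array() requires.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Nutch: shows the remaining()-based check.
class BufferDigestSketch {

  /** MD5 over the readable window of buf, or over the URL if that window is empty. */
  static byte[] signature(ByteBuffer buf, String url) {
    byte[] data;
    int of;
    int cb;
    if (buf == null || buf.remaining() == 0) {
      // use URL instead of empty content
      data = url.getBytes(StandardCharsets.UTF_8);
      of = 0;
      cb = data.length;
    } else {
      // assumption: heap buffer, so array() is available
      data = buf.array();
      of = buf.arrayOffset() + buf.position();
      cb = buf.remaining();
    }
    try {
      // stand-in for MD5Hash.digest(data, of, cb).getDigest()
      MessageDigest md = MessageDigest.getInstance("MD5");
      md.update(data, of, cb);
      return md.digest();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  public static void main(String[] args) {
    // One backing array shared by two "pages": AAAA and BBBB.
    byte[] shared = "AAAABBBB".getBytes(StandardCharsets.UTF_8);

    // A zero-length window into the shared array: the content is empty,
    // but the backing array is not.
    ByteBuffer empty = ByteBuffer.wrap(shared, 4, 0);
    System.out.println(empty.array().length == 0); // false
    System.out.println(empty.remaining() == 0);    // true
  }
}
```

With a shared backing array, the {{buf.array().length == 0}} check would miss the empty window and hash unrelated bytes, or nothing at all, while the {{buf.remaining() == 0}} check falls back to the URL as intended.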
> 2.x patch for MD5 duplication issue addressed in NUTCH-2391
> -----------------------------------------------------------
>
> Key: NUTCH-2393
> URL: https://issues.apache.org/jira/browse/NUTCH-2393
> Project: Nutch
> Issue Type: Bug
> Components: commoncrawl
> Affects Versions: 2.3.1
> Reporter: Kaidul Islam
> Assignee: Kaidul Islam
> Priority: Minor
> Fix For: 2.4
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Equivalent patch for 2.x for issue addressed in NUTCH-2391