Hi Andrzej,
I tried your new code.
1) Since "Page page = value.getPage();" is declared
inside the while loop, the page variable is out of scope
after the loop, so the next couple of lines
("page.setNextFetchTime..") fail to compile.
So I moved "Page page = value.getPage();" to before the
while loop.
Is that change OK with you?
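To illustrate the scoping point, here is a minimal standalone sketch. It is not Nutch code: the class ScopeSketch, the method lastVisited, and the use of plain strings in place of Page are all made up for illustration. It declares the variable before the loop but still assigns it inside the loop, which may be safer than calling getPage() once before the first reader.next().

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the fetch-list loop; strings play the role of Page.
public class ScopeSketch {

    // Returns the last item the loop visited, or null if the loop never ran.
    public static String lastVisited(List<String> items, int topN) {
        Iterator<String> reader = items.iterator();
        String page = null;          // declared before the loop, so it stays in scope
        while (topN > 0 && reader.hasNext()) {
            page = reader.next();    // assigned on every iteration
            topN--;
        }
        return page;                 // still visible after the loop
    }

    public static void main(String[] args) {
        // Visits "a" and "b" only, because topN is 2.
        System.out.println(lastVisited(Arrays.asList("a", "b", "c"), 2)); // prints "b"
    }
}
```

Note that the variable can be null after the loop if no entry was read, so any code that uses it afterwards would need a null check.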
2) "forceRefetch" isn't visible from within FetchListTool,
so I just replaced it with "true" to get the code to
compile.
Any suggestions?
Thanks,
Michael
--- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Please try to replace this logic with the following:
>
> FetchListEntry value = new FetchListEntry();
> while (topN > 0 && reader.next(key, value)) {
>   Page page = value.getPage();
>   if (page != null) {
>     Page p = new Page();
>     p.set(page);
>     page = p;
>   }
>   if (forceRefetch) {
>     Page p = value.getPage();
>     // reset fetchTime and MD5, so that the content will
>     // always be new and unique.
>     p.setNextFetchTime(0L);
>     p.setMD5(MD5Hash.digest(p.getURL().toString()));
>   }
>   tables.append(value);
>   topN--;
>
> This patchset still needs a lot of thought and work. Even the part
> that avoids re-fetching unmodified content needs additional
> thinking - it's easy to end up in a state where Nutch cannot be
> forced to re-fetch the page, because every time you try, it remains
> unmodified - but you need to re-fetch the actual data because, e.g.,
> you lost that segment data...
>
> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>