Hi Andrzej,

I tried your new code and ran into two problems:

1) Since "Page page = value.getPage();" is declared
inside the while loop, the page variable can't be
accessed outside the loop, so the next couple of
lines ("page.setNextFetchTime(...)") fail to compile.

So I moved "Page page = value.getPage();" to
before the while loop.

Will that change be OK with you?
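The scoping rule behind point 1 can be shown in isolation. This is a minimal sketch, assuming a simplified stand-in Page class (hypothetical, not Nutch's real one) and a plain List in place of the fetchlist reader. The variable is declared before the loop and only assigned inside it, so it is still usable after the loop ends.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LoopScopeDemo {

    // Hypothetical stand-in for Nutch's Page; only the fields this demo needs.
    static class Page {
        final String url;
        long nextFetchTime;

        Page(String url, long nextFetchTime) {
            this.url = url;
            this.nextFetchTime = nextFetchTime;
        }
    }

    // Walks the input and resets the fetch time of the last page it saw.
    // Returns that page, or null if the input was empty.
    static Page resetLastPage(List<Page> input) {
        // Declared before the loop, so it is still in scope after the loop ends.
        Page page = null;
        Iterator<Page> it = input.iterator();
        while (it.hasNext()) {
            // Plain assignment: writing "Page page = ..." here would not compile,
            // because a local variable named page is already in scope.
            page = it.next();
        }
        // This line compiles only because page was declared outside the loop.
        if (page != null) {
            page.nextFetchTime = 0L;
        }
        return page;
    }

    public static void main(String[] args) {
        List<Page> pages = Arrays.asList(
                new Page("http://a.example/", 100L),
                new Page("http://b.example/", 200L));
        Page last = resetLastPage(pages);
        // prints "http://b.example/ 0"
        System.out.println(last.url + " " + last.nextFetchTime);
    }
}
```

Had page been declared inside the while body instead, the page.nextFetchTime line after the loop would fail with a "cannot find symbol" compile error.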

2) "forceRefetch" is not visible in FetchListTool,
so I just replaced it with "true" to get the code
to compile.
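Rather than hardcoding "true", one option is to hand the flag to the code that runs the loop, for example as a constructor argument. This is a minimal sketch under stated assumptions: FetchListSelector and the simplified Page class are hypothetical (neither is Nutch's API), and java.security.MessageDigest stands in for the quoted patch's MD5Hash.digest call.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

public class ForceRefetchDemo {

    // Hypothetical stand-in for Nutch's Page; only the fields this demo needs.
    static class Page {
        final String url;
        long nextFetchTime;
        String md5;

        Page(String url, long nextFetchTime, String md5) {
            this.url = url;
            this.nextFetchTime = nextFetchTime;
            this.md5 = md5;
        }
    }

    // Hypothetical selector: the flag is a constructor argument, so the loop
    // can see it without hardcoding "true".
    static class FetchListSelector {
        private final boolean forceRefetch;

        FetchListSelector(boolean forceRefetch) {
            this.forceRefetch = forceRefetch;
        }

        // Copies at most topN pages to the output, applying the
        // force-refetch reset from the quoted patch when the flag is set.
        List<Page> select(List<Page> input, int topN) {
            List<Page> out = new ArrayList<>();
            for (Page page : input) {
                if (topN <= 0) {
                    break;
                }
                if (forceRefetch) {
                    // Reset fetch time and replace the stored digest with a hash
                    // of the URL, in the spirit of the quoted patch.
                    page.nextFetchTime = 0L;
                    page.md5 = md5Hex(page.url);
                }
                out.add(page);
                topN--;
            }
            return out;
        }
    }

    // Lowercase hex MD5 of a UTF-8 string, using only the JDK.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(s.getBytes(StandardCharsets.UTF_8))) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        List<Page> input = new ArrayList<>();
        input.add(new Page("http://a.example/", 100L, "old"));
        input.add(new Page("http://b.example/", 200L, "old"));
        // Assumption: in real use the flag would come from an option or config.
        List<Page> out = new FetchListSelector(true).select(input, 1);
        // prints "1 0"
        System.out.println(out.size() + " " + out.get(0).nextFetchTime);
    }
}
```

Built with false instead, the selector passes pages through unchanged, so the same loop serves both the normal and the forced case.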

Any suggestions?

Thanks,

Michael

--- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:


 
> Please try to replace this logic with the following:
> 
> FetchListEntry value = new FetchListEntry();
> while (topN > 0 && reader.next(key, value)) {
>   Page page = value.getPage();
>   if (page != null) {
>     Page p = new Page();
>     p.set(page);
>     page = p;
>   }
>   if (forceRefetch) {
>     Page p = value.getPage();
>     // reset fetchTime and MD5, so that the content will
>     // always be new and unique.
>     p.setNextFetchTime(0L);
>     p.setMD5(MD5Hash.digest(p.getURL().toString()));
>   }
>   tables.append(value);
>   topN--;
> }
> 
> 
> This patchset still needs a lot of thought and work. Even the part
> that avoids re-fetching unmodified content needs additional
> thinking: it's easy to end up in a state where Nutch cannot be
> forced to re-fetch the page, because every time you try, it remains
> unmodified, yet you need to re-fetch the actual data because, e.g.,
> you lost that segment data...
> 
> -- 
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _  __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 

