Hi Kelvin:

I tried to implement controlled depth crawling based
on your Nutch-84 work and the discussion we had before.

1. In the DepthFLFilter class,

I made a small modification:
"
public synchronized int filter(FetchListScope.Input input) {
    // Use up one level of the parent's depth budget, then only
    // allow the URL while the budget is still non-negative.
    input.parent.decrementDepth();
    return input.parent.depth >= 0 ? ALLOW : REJECT;
  }
"

2. In the ScheduledURL class,

add one member variable and one member method:
"
// remaining crawl-depth budget for this URL
public int depth;

public void decrementDepth() {
    depth--;
  }
"

3. Then

we need an initial depth for each domain. For the
initial testing, I can set a default value of 5 for
every site in seeds.txt, and a value of 1 for each
outlink.

That way, fairly deep vertical crawling is done within
the on-site domains, while the homepages of outlinked
sites are still visible.
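
In case it helps, here is a rough sketch of how I
picture the initial depth being assigned. DepthAssigner,
seedHosts, and the two constants are just my
illustration of the 5-for-seeds / 1-for-outlinks idea
above; they are not part of your Nutch-84 code:
"
import java.net.URL;
import java.util.Set;

public class DepthAssigner {

  // Depth budget for sites listed in seeds.txt (assumed default of 5).
  private static final int SEED_DEPTH = 5;
  // Depth budget for URLs reached via an outlink to another site.
  private static final int OUTLINK_DEPTH = 1;

  private final Set<String> seedHosts;

  public DepthAssigner(Set<String> seedHosts) {
    this.seedHosts = seedHosts;
  }

  // Starting depth for a URL: 5 if its host came from seeds.txt, else 1.
  public int initialDepth(URL url) {
    return seedHosts.contains(url.getHost()) ? SEED_DEPTH : OUTLINK_DEPTH;
  }
}
"
The returned value would be stored into ScheduledURL.depth
when the URL is scheduled, so DepthFLFilter can count it
down as above.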

Furthermore, should we define a depth value for each
URL in seeds.txt?
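
If so, one possibility (just a thought, nothing that
exists in the current patch) is an optional per-URL
depth column in seeds.txt, e.g.
"http://lucene.apache.org/ 5", parsed roughly like this:
"
// Sketch only: read an optional per-URL depth from a seeds.txt line
// of the form "<url> <depth>"; fall back to 5 when no depth is given.
public static int parseSeedDepth(String line) {
    String[] parts = line.trim().split("\\s+");
    return (parts.length > 1) ? Integer.parseInt(parts[1]) : 5;
  }
"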

Am I on the right track?

Thanks,

Michael Ji


                