Re: Get all the URLs in Crawldb which has status db_fetched in Nutch 1.3

lewis john mcgibbney Mon, 24 Oct 2011 03:11:44 -0700

Hi Tri,

The status db_fetched means that these URLs have been fetched and exist
within your crawldb. Is this what you are after?

The CrawlDatum class [1] defines all of the possible states that a URL can
be in within your crawldb.

[1]
https://svn.apache.org/repos/asf/nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDatum.java
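If what you want is the list of URLs with that status, one option is to dump
the crawldb as text with "bin/nutch readdb <crawldb> -dump <out_dir>" and then
filter the dump. The sketch below is only an illustration, not part of Nutch.
It assumes that each record in the dump starts with a line
"<url><TAB>Version: <n>" and contains one line "Status: <code> (<name>)", for
example "Status: 2 (db_fetched)". That layout is an assumption, so check it
against your own dump before relying on it. The class name FetchedUrls is
hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class FetchedUrls {

    /**
     * Collects the URLs whose record in a "readdb -dump" text dump has the
     * status db_fetched. ASSUMPTION: each record starts with a line of the
     * form "<url>\tVersion: <n>" and contains one line "Status: <code> (<name>)".
     */
    static List<String> fetchedUrls(Iterable<String> lines) {
        List<String> result = new ArrayList<>();
        String currentUrl = null;
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab > 0 && line.startsWith("Version:", tab + 1)) {
                // First line of a new record: remember its URL.
                currentUrl = line.substring(0, tab);
            } else if (currentUrl != null && line.startsWith("Status:")) {
                // The status name is printed in parentheses, e.g. "Status: 2 (db_fetched)";
                // matching "(db_fetched)" keeps "(db_unfetched)" from matching.
                if (line.contains("(db_fetched)")) {
                    result.add(currentUrl);
                }
                currentUrl = null; // only one Status line per record
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FetchedUrls <out_dir>/part-00000");
            System.exit(1);
        }
        // Reads the whole part file into memory; fine for a small crawl.
        for (String url : fetchedUrls(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(url);
        }
    }
}
```

For example, with an illustrative crawldb path, run
"bin/nutch readdb crawl/crawldb -dump dumpdir" and then
"java FetchedUrls dumpdir/part-00000", repeating for each part file. If you
would rather read the crawldb directly from Java, the status constants
(such as CrawlDatum.STATUS_DB_FETCHED) are defined in the class linked above.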

On Mon, Oct 24, 2011 at 12:05 PM, Tri Nguyen <[email protected]> wrote:

> Dear Helpers,
>
>
> Could you please tell me how to do this task in Nutch 1.3?
>
>
> Thank you so much for your help,
>
> Regards,
>
> Tri Nguyen.
>

-- 
*Lewis*
