In the constructor of your URLFilter, why not pass in the Nutch
configuration (a Configuration created via NutchConfiguration) and
read the path to, e.g., the LinkDb from it? Then keep a private member
variable for the LinkDbReader (maybe statically initialized, so it is
only opened once) and use that in your filter() method.
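
Roughly what I have in mind, as a minimal sketch rather than working
Nutch code. Everything here is a hypothetical stand-in so that it
compiles without Nutch on the classpath: UrlFilterSketch plays the role
of the URLFilter interface, SignatureStore plays the role of a db reader
such as the LinkDbReader, java.util.Properties plays the role of the
configuration, and the config key and path are made up.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-in for the URLFilter plugin contract:
// return the URL to keep it, or null to drop it.
interface UrlFilterSketch {
    String filter(String url);
}

// Hypothetical stand-in for a db reader such as LinkDbReader.
interface SignatureStore {
    String signatureOf(String url); // null when the URL is unknown
}

// Toy in-memory store so the sketch runs without a crawl directory.
class InMemorySignatureStore implements SignatureStore {
    private final Map<String, String> signatures = new HashMap<>();

    void put(String url, String signature) { signatures.put(url, signature); }

    @Override
    public String signatureOf(String url) { return signatures.get(url); }
}

class DedupUrlFilter implements UrlFilterSketch {
    // Hypothetical config key; not a real Nutch property.
    static final String DB_PATH_KEY = "example.dedup.db.path";

    private final SignatureStore store;            // opened once, reused per call
    private final Set<String> seenSignatures = new HashSet<>();

    DedupUrlFilter(Properties conf, Function<String, SignatureStore> opener) {
        // Read the db location from the config instead of hard-coding it.
        String path = conf.getProperty(DB_PATH_KEY);
        if (path == null) {
            throw new IllegalArgumentException("missing config key " + DB_PATH_KEY);
        }
        this.store = opener.apply(path);
    }

    @Override
    public String filter(String url) {
        String signature = store.signatureOf(url);
        if (signature == null) {
            return url; // nothing known about this URL: let it through
        }
        // Set.add returns false when the signature was already seen.
        return seenSignatures.add(signature) ? url : null;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(DB_PATH_KEY, "crawl/linkdb"); // hypothetical path
        InMemorySignatureStore store = new InMemorySignatureStore();
        store.put("http://example.com/a", "sig-1");
        store.put("http://example.com/a?copy", "sig-1");
        DedupUrlFilter f = new DedupUrlFilter(conf, path -> store);
        System.out.println(f.filter("http://example.com/a"));
        System.out.println(f.filter("http://example.com/a?copy"));
    }
}
```

The point is only the shape: the filter still receives nothing but the
URL, but it can look up whatever else it needs through a reader that was
opened once from a path in the config.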

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Renxia Wang <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Sunday, February 22, 2015 at 3:36 PM
To: "[email protected]" <[email protected]>
Subject: How to read metadata/content of an URL in URLFilter?

>Hi,
>
>I want to develop a URLFilter that takes a URL, looks up its metadata or
>even its fetched content, and then uses a duplicate-detection algorithm to
>determine whether it is a duplicate of any URL in the batch. However, the
>only parameter passed into the URLFilter is the URL. Is it possible to get
>the data I want for that input URL inside the URLFilter?
>
>Thanks,
>
>Zhique
