In the constructor of your URLFilter, why not pass in a NutchConfiguration object and read the path to, e.g., the LinkDb from the config? Then keep a private member variable for the LinkDbReader (maybe statically initialized for efficiency) and use that in your interface method.
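As a rough illustration of that pattern, here is a minimal, self-contained Java sketch. It deliberately does not use Nutch classes: `UrlFilter`, `SignatureReader`, `DedupUrlFilter`, the config key `dedup.db.path`, and the plain `Map` standing in for the configuration object are all hypothetical stand-ins, and the "signature" lookup is an assumed placeholder for whatever metadata or content comparison you choose. A real plugin would implement Nutch's own URLFilter interface and open a LinkDbReader (or similar) at the configured path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class DedupFilterSketch {

    /** Stand-in for the filter contract: return the URL to keep it, or null to reject it. */
    interface UrlFilter {
        String filter(String url);
    }

    /** Hypothetical reader over previously stored crawl data (not a Nutch class). */
    interface SignatureReader {
        /** Returns a stored content signature for the URL, or null if none is stored. */
        String signatureOf(String url);
    }

    static class DedupUrlFilter implements UrlFilter {
        /** Hypothetical configuration key holding the path of the data store. */
        static final String DB_PATH_KEY = "dedup.db.path";

        // One reader per path, shared by all filter instances and opened on first use.
        private static final Map<String, SignatureReader> READERS = new ConcurrentHashMap<>();

        private final SignatureReader reader;
        // Maps each signature to the first URL seen with that signature.
        private final Map<String, String> firstUrlBySignature = new HashMap<>();

        DedupUrlFilter(Map<String, String> conf, Function<String, SignatureReader> opener) {
            String path = conf.get(DB_PATH_KEY);
            if (path == null) {
                throw new IllegalArgumentException("missing config key: " + DB_PATH_KEY);
            }
            // Reuse an already opened reader for this path instead of opening a new one.
            this.reader = READERS.computeIfAbsent(path, opener);
        }

        @Override
        public String filter(String url) {
            String signature = reader.signatureOf(url);
            if (signature == null) {
                return url; // nothing stored for this URL, so let it through
            }
            String first = firstUrlBySignature.putIfAbsent(signature, url);
            // Keep the URL if it is the first (or the same) one with this signature.
            return (first == null || first.equals(url)) ? url : null;
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for data that would normally come from the crawl store.
        Map<String, String> stored = new HashMap<>();
        stored.put("http://a.example/1", "sig-x");
        stored.put("http://a.example/1?ref=2", "sig-x");
        stored.put("http://a.example/2", "sig-y");

        Map<String, String> conf = new HashMap<>();
        conf.put(DedupUrlFilter.DB_PATH_KEY, "crawl/linkdb");

        UrlFilter filter = new DedupUrlFilter(conf, path -> stored::get);
        for (String url : stored.keySet()) {
            System.out.println(url + " -> " + filter.filter(url));
        }
    }
}
```

The shared static map is one way to read "static initialized for efficiency": the store is opened once per path rather than once per filter instance. Whether that is safe depends on how the real reader handles concurrent access, which this sketch does not address.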
Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Renxia Wang <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Sunday, February 22, 2015 at 3:36 PM
To: "[email protected]" <[email protected]>
Subject: How to read metadata/content of an URL in URLFilter?

>Hi,
>
>I want to develop a URLFilter which takes a URL, reads its metadata or
>even the fetched content, then uses a duplicate detection algorithm to
>determine whether it is a duplicate of any URL in the batch. However, the
>only parameter passed into the URLFilter is the URL. Is it possible to get
>the data I want for that input URL inside the URLFilter?
>
>Thanks,
>
>Zhique

