Hi Gail,

Check out:

http://wiki.apache.org/nutch/ParserFactoryImprovementProposal/

That's the way that the parser factory currently works. Also added, but
not described in that proposal, is the ability to call a parser by its
id, which is a method present in ParseUtil.java.

G'luck!

Cheers,
  Chris

______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group
_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop: 171-246
_______________________________________________________

Disclaimer: The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.

> -----Original Message-----
> From: Gal Nitzan (JIRA) [mailto:[EMAIL PROTECTED]
> Sent: Sunday, January 15, 2006 4:10 PM
> To: [email protected]
> Subject: [jira] Updated: (NUTCH-179) Proposition: Enable Nutch to use a
> parser plugin not just based on content type
>
>      [ http://issues.apache.org/jira/browse/NUTCH-179?page=all ]
>
> Gal Nitzan updated NUTCH-179:
> -----------------------------
>
>     Description:
> Sorry, please close this issue.
>
> I figured that if I set my parse plugin first, I can always be called
> first and then decide if I want to parse or not.
>
>   was:
> Sometimes there are requirements of the "real world" (usually your boss)
> where a special parse is required for a certain site. Though the content
> type is text/html, a specialized parser is needed.
>
> Sample: I am required to crawl certain sites, some of which are
> partners' sites. When fetching from a partner's site I need to look for
> certain entries in the text and boost the score.
>
> Currently the ParserFactory looks for a plugin based only on the content
> type.
>
> Facing this issue myself, I noticed that it would give a very easy
> implementation for others if ParserFactory could use NutchConf to check
> for certain properties and, if matched, use the correct plugin based on
> the url and not just the content type.
>
> The implementation shouldn't be too complicated.
>
> Looking to hear more ideas.
>
>
> > Proposition: Enable Nutch to use a parser plugin not just based on content type
> > -------------------------------------------------------------------------------
> >
> >          Key: NUTCH-179
> >          URL: http://issues.apache.org/jira/browse/NUTCH-179
> >      Project: Nutch
> >         Type: Improvement
> >   Components: fetcher
> >     Versions: 0.8-dev
> >     Reporter: Gal Nitzan
> >
> >
> Sorry, please close this issue.
>
> I figured that if I set my parse plugin first, I can always be called
> first and then decide if I want to parse or not.
>
> --
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the administrators:
>    http://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see:
>    http://www.atlassian.com/software/jira
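
For anyone reading this thread later, here is a rough sketch of the workaround Gal describes in the quoted issue: put the site-specific parser first in the order, let it look at the URL, and have it decline so that the next parser takes over. Everything in it is an assumption made for illustration. SketchParser, PartnerSiteParser, DefaultHtmlParser, parseWithChain and the example hosts are hypothetical names, not Nutch's Parser, ParserFactory or ParseUtil API, and the real plugin ordering is configured in Nutch rather than built by hand like this.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ParserChainSketch {

    /** Hypothetical stand-in for a parse plugin; NOT Nutch's Parser interface. */
    interface SketchParser {
        /** Returns a label for the parse it produced, or empty to decline the page. */
        Optional<String> parse(String url, String contentType, String content);
    }

    /** Site-specific parser: accepts only text/html pages from one assumed host. */
    static final class PartnerSiteParser implements SketchParser {
        private final String partnerHost;

        PartnerSiteParser(String partnerHost) {
            this.partnerHost = partnerHost;
        }

        @Override
        public Optional<String> parse(String url, String contentType, String content) {
            String host = URI.create(url).getHost(); // null if the URL has no host
            boolean mine = "text/html".equals(contentType) && partnerHost.equals(host);
            // Declining (empty) lets the next parser in the chain handle the page.
            return mine ? Optional.of("partner-site") : Optional.empty();
        }
    }

    /** Generic fallback that accepts any text/html page regardless of URL. */
    static final class DefaultHtmlParser implements SketchParser {
        @Override
        public Optional<String> parse(String url, String contentType, String content) {
            return "text/html".equals(contentType)
                    ? Optional.of("default-html")
                    : Optional.empty();
        }
    }

    /** Tries each parser in order and returns the first result that is not empty. */
    static Optional<String> parseWithChain(List<SketchParser> chain, String url,
                                           String contentType, String content) {
        for (SketchParser parser : chain) {
            Optional<String> result = parser.parse(url, contentType, content);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty(); // no parser in the chain accepted the page
    }

    public static void main(String[] args) {
        // The site-specific parser is listed first so it always gets the first look.
        List<SketchParser> chain = Arrays.asList(
                new PartnerSiteParser("partner.example.com"),
                new DefaultHtmlParser());

        System.out.println(parseWithChain(
                chain, "http://partner.example.com/page", "text/html", "<html/>"));
        System.out.println(parseWithChain(
                chain, "http://other.example.org/page", "text/html", "<html/>"));
    }
}
```

The point of the ordering is that the content-type lookup stays untouched: both parsers are registered for text/html, and the URL check lives entirely inside the first one.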
