Hi Gal,

Check out:

http://wiki.apache.org/nutch/ParserFactoryImprovementProposal/

That's how the parser factory currently works. Also added, but not
described in that proposal, is the ability to call a parser by its id,
through a method in ParseUtil.java.
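
To make the difference between the two lookup routes concrete, here is a
rough, self-contained sketch. It is not the Nutch code: the class, the
Parser interface and every method name below are made up for illustration,
and "parse-partner" is a hypothetical plugin id. It only shows the idea of
picking a parser by content type versus naming it directly by id.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: none of these names come from Nutch itself.
public class ParserLookupSketch {

    /** Hypothetical stand-in for a parser plugin. */
    interface Parser {
        String parse(String content);
    }

    // Parsers keyed by their plugin id.
    private final Map<String, Parser> parsersById = new LinkedHashMap<>();
    // Ordered list of preferred parser ids for each content type.
    private final Map<String, List<String>> idsByContentType = new LinkedHashMap<>();

    void register(String id, Parser parser) {
        parsersById.put(id, parser);
    }

    void prefer(String contentType, List<String> ids) {
        idsByContentType.put(contentType, ids);
    }

    /** Content-type route: the first registered parser listed for the type wins. */
    String parse(String contentType, String content) {
        List<String> ids = idsByContentType.getOrDefault(contentType, Collections.emptyList());
        for (String id : ids) {
            Parser parser = parsersById.get(id);
            if (parser != null) {
                return parser.parse(content);
            }
        }
        throw new IllegalArgumentException("no parser for " + contentType);
    }

    /** Id route: skip the content-type mapping and name the parser directly. */
    String parseById(String id, String content) {
        Parser parser = parsersById.get(id);
        if (parser == null) {
            throw new IllegalArgumentException("unknown parser id " + id);
        }
        return parser.parse(content);
    }

    public static void main(String[] args) {
        ParserLookupSketch sketch = new ParserLookupSketch();
        sketch.register("parse-html", c -> "html:" + c);
        sketch.register("parse-partner", c -> "partner:" + c); // hypothetical id
        sketch.prefer("text/html", Arrays.asList("parse-html"));

        // Same content type, two different parsers depending on the route.
        System.out.println(sketch.parse("text/html", "page"));
        System.out.println(sketch.parseById("parse-partner", "page"));
    }
}
```

With something along these lines, a caller that already knows which
plugin it wants (say, for a partner site) can ask for it by id even though
the content type is still text/html.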

G'luck!

Cheers,
  Chris


______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED] 
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                        Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.


> -----Original Message-----
> From: Gal Nitzan (JIRA) [mailto:[EMAIL PROTECTED]
> Sent: Sunday, January 15, 2006 4:10 PM
> To: [email protected]
> Subject: [jira] Updated: (NUTCH-179) Proposition: Enable Nutch to use a
> parser plugin not just based on content type
> 
>      [ http://issues.apache.org/jira/browse/NUTCH-179?page=all ]
> 
> Gal Nitzan updated NUTCH-179:
> -----------------------------
> 
>     Description:
> Sorry, please close this issue.
> 
> I figured out that if I set my parse plugin first, it will always be
> called first and can then decide whether to parse or not.
> 
>   was:
> Sometimes there are requirements of the "real world" (usually your boss)
> where a special parse is required for a certain site. Though the content
> type is text/html, a specialized parser is needed.
> 
> Example: I am required to crawl certain sites, some of which are
> partner sites. When fetching from a partner site I need to look for
> certain entries in the text and boost the score.
> 
> Currently the ParserFactory looks for a plugin based only on the content
> type.
> 
> Having faced this issue myself, I think it would make things much easier
> for others if ParserFactory could check NutchConf for certain properties
> and, when they match, choose the plugin based on the URL and not just
> the content type.
> 
> The implementation shouldn't be too complicated.
> 
> Looking to hear more ideas.
> 
> 
> > Proposition: Enable Nutch to use a parser plugin not just based on
> > content type
> > ------------------------------------------------------------------------
> >
> >          Key: NUTCH-179
> >          URL: http://issues.apache.org/jira/browse/NUTCH-179
> >      Project: Nutch
> >         Type: Improvement
> >   Components: fetcher
> >     Versions: 0.8-dev
> >     Reporter: Gal Nitzan
> 
> >
> > Sorry, please close this issue.
> > I figured out that if I set my parse plugin first, it will always be
> > called first and can then decide whether to parse or not.
> 
> --
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the administrators:
>    http://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see:
>    http://www.atlassian.com/software/jira
