[ http://issues.apache.org/jira/browse/NUTCH-34?page=comments#action_62996 
]
     
Andrzej Bialecki  commented on NUTCH-34:
----------------------------------------

Such a "registry" already exists: it is built and maintained by 
PluginRepository.

So, it seems to me that the only change required here would be to add 
attributes to each plugin's config file (and to the plugin interface) that 
tell all plugin users the following:

* a boolean indicating whether the plugin can handle incomplete files.

* an int setting the content size limit.
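
A rough sketch of how these two attributes might look from the fetcher's 
side. Every name below (ParserLimitsSketch, ParserInfo, bytesToFetch, 
NO_LIMIT) is made up for illustration and is not part of PluginRepository 
or any other Nutch class. It also assumes the limit is counted in bytes, and 
that a plugin which cannot handle incomplete files needs the whole document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are part of the actual Nutch API.
class ParserLimitsSketch {

    /** The two proposed per-plugin attributes. */
    static final class ParserInfo {
        final boolean handlesIncomplete; // can the parser cope with truncated content?
        final int contentLimit;          // content size limit (assumed to be in bytes)

        ParserInfo(boolean handlesIncomplete, int contentLimit) {
            this.handlesIncomplete = handlesIncomplete;
            this.contentLimit = contentLimit;
        }
    }

    /** Marker value meaning "fetch the whole document" (an assumption of this sketch). */
    static final int NO_LIMIT = -1;

    // Content type -> attributes of the active parser plugin for that type.
    private final Map<String, ParserInfo> byContentType = new HashMap<>();

    void register(String contentType, ParserInfo info) {
        byContentType.put(contentType, info);
    }

    /** A document type is wanted only if some active parser registered for it. */
    boolean isWanted(String contentType) {
        return byContentType.containsKey(contentType);
    }

    /**
     * How much content the fetcher should load for this type:
     * 0 if no parser wants it, NO_LIMIT if the parser needs complete
     * content, otherwise the parser's own content size limit.
     */
    int bytesToFetch(String contentType) {
        ParserInfo info = byContentType.get(contentType);
        if (info == null) {
            return 0;
        }
        if (!info.handlesIncomplete) {
            return NO_LIMIT;
        }
        return info.contentLimit;
    }
}
```

With this, registering "text/html" as (true, 65536) would let the fetcher 
stop after 65536 bytes, while a parser registered as (false, ...) would 
force a full download for its content type.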

> Parsing different content formats
> ---------------------------------
>
>          Key: NUTCH-34
>          URL: http://issues.apache.org/jira/browse/NUTCH-34
>      Project: Nutch
>         Type: Improvement
>   Components: fetcher
>     Reporter: Stephan Strittmatter
>     Priority: Trivial

>
> At the moment Nutch is set up to filter content via the XML config file.
> The number of bytes to load is also set globally there.
> I think it would be better to let the parser plugins "register" themselves in 
> some registry where every plugin could tell the fetcher that:
> 1. this document type is wanted (because the parser plugin is 
>    installed and activated)
> 2. how much of the content is required (some plugins need the whole 
>    content and some do not)
