Suggestion:
For consistency and ease of Nutch management, why not filter the
extensions based on the activated plugins?
By looking at the mime-types defined in the parse-plugins.xml file and the
activated plugins, we know which content-types will be parsed.
So, by getting the file extensions associated with each content-type, we can
build a list of file extensions to include (all others will be excluded) in
the fetch process.
No?
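
To make the idea concrete, here is a rough sketch in Python (not Nutch code, just an illustration). The XML sample is a simplified stand-in for parse-plugins.xml, and the content-type to extension table, the function names allowed_extensions and should_fetch, and the rule that URLs without an extension are always fetched are all assumptions of mine:

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Simplified stand-in for parse-plugins.xml (assumed layout, not the real file).
PARSE_PLUGINS_XML = """
<parse-plugins>
  <mimeType name="text/html"><plugin id="parse-html"/></mimeType>
  <mimeType name="application/pdf"><plugin id="parse-pdf"/></mimeType>
  <mimeType name="application/msword"><plugin id="parse-msword"/></mimeType>
</parse-plugins>
"""

# Hypothetical content-type -> file extensions table (illustrative only).
EXTENSIONS_BY_MIME = {
    "text/html": ["html", "htm"],
    "application/pdf": ["pdf"],
    "application/msword": ["doc"],
}


def allowed_extensions(xml_text, active_plugins, extensions_by_mime):
    """Collect the extensions of every content-type that an active plugin parses."""
    root = ET.fromstring(xml_text)
    allowed = set()
    for mime in root.iter("mimeType"):
        plugin_ids = {plugin.get("id") for plugin in mime.iter("plugin")}
        # Keep the content-type only if at least one of its plugins is activated.
        if plugin_ids & set(active_plugins):
            allowed.update(extensions_by_mime.get(mime.get("name"), []))
    return allowed


def should_fetch(url, allowed):
    """Accept URLs whose extension is in the include list.

    Assumption: URLs with no extension (e.g. directories) are always fetched.
    """
    ext = posixpath.splitext(urlparse(url).path)[1].lstrip(".").lower()
    return ext == "" or ext in allowed
```

With only parse-html and parse-pdf activated, the include list would hold html, htm and pdf, so a .doc URL would be skipped at fetch time without anyone editing a separate extension filter by hand.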

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
