Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by Paul Ruiz: http://wiki.apache.org/nutch/Features

------------------------------------------------------------------------------

   * HTML (parse-html)
   * XML (parse-xml) uses XPath and namespaces to map XML elements to Lucene fields.
   * JavaScript (for extracting links only?) (parse-js)
+  * OpenOffice.org ODF (parse-oo) parses OpenOffice.org and StarOffice documents.
   * Microsoft PowerPoint, the .ppt file (parse-mspowerpoint)
   * Microsoft Word, the .doc file (parse-msword)
   * Adobe PDF (parse-pdf)
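The parse-xml entry describes mapping XML elements to Lucene fields through XPath expressions and namespaces. The sketch below only illustrates that idea. It is not the parse-xml plugin's code or its configuration format. It uses Python's xml.etree.ElementTree, which supports only a subset of XPath. The field names, the Dublin Core prefix mapping, and the helper map_xml_to_fields are hypothetical, and "fields" are modeled as a plain dict rather than Lucene Document objects.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix -> namespace URI table used to resolve "dc:" in paths.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical field mapping: field name -> ElementTree path expression.
# ElementTree implements only a limited XPath subset (e.g. ".//prefix:tag").
FIELD_MAPPING = {
    "title": ".//dc:title",
    "creator": ".//dc:creator",
}


def map_xml_to_fields(xml_text, mapping, namespaces):
    """Return {field name: [text values]} for every path that matches.

    Elements are matched by namespace URI, not by the literal prefix, so a
    document may bind the Dublin Core URI to any prefix it likes.
    """
    root = ET.fromstring(xml_text)
    fields = {}
    for field, path in mapping.items():
        values = [
            el.text.strip()
            for el in root.findall(path, namespaces)
            if el.text and el.text.strip()
        ]
        if values:  # omit fields with no matching, non-empty elements
            fields[field] = values
    return fields


SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Nutch Features</dc:title>
  <dc:creator>Alice</dc:creator>
  <dc:creator>Bob</dc:creator>
</record>"""

if __name__ == "__main__":
    # prints {'title': ['Nutch Features'], 'creator': ['Alice', 'Bob']}
    print(map_xml_to_fields(SAMPLE, FIELD_MAPPING, NAMESPACES))
```

An unprefixed `<title>` element would not populate the "title" field here, because the path selects the Dublin Core namespace. This is the reason a namespace-aware mapping matters when several vocabularies share element names.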