Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by JeffRitchie:
http://wiki.apache.org/nutch/nutch-0%2e8-dev/bin/nutch_parse

------------------------------------------------------------------------------
  None.

  === Caveats and Notes ===
- None.
+ The Parser depends upon a number of plugins to parse the various documents fetched from a crawl. The supported document types and the plugins they require are as follows:[[BR]][[BR]]
+
+ ||'''Content-type'''||'''Plugin'''||'''Notes'''||
+ ||'''text/html'''||parse-html||Parses HTML documents using NekoHTML or !TagSoup||
+ ||'''application/x-javascript'''||parse-js||Parses !JavaScript Documents (.js)||
+ ||'''audio/mpeg'''||parse-mp3||Parses MP3 Audio Documents (.mp3)||
+ ||'''application/vnd.ms-excel'''||parse-msexcel||Parses MSExcel Documents (.xls)||
+ ||'''application/vnd.ms-powerpoint'''||parse-mspowerpoint||Parses MSPower!Point Documents||
+ ||'''application/msword'''||parse-msword||Parses MSWord Documents||
+ ||'''application/rss+xml'''||parse-rss||Parses RSS Documents (.rss)||
+ ||'''application/rtf'''||parse-rtf||Parses RTF Documents (.rtf)||
+ ||'''application/pdf'''||parse-pdf||Parses PDF Documents||
+ ||'''application/x-shockwave-flash'''||parse-swf||Parses Flash Documents (.swf)||
+ ||'''text/plain'''||parse-text||Parses Text Documents (.txt)||
+ ||'''application/zip'''||parse-zip||Parses Zip Documents (.zip)||
+ ||'''other types'''||parse-ext||Parses Documents with external commands based upon content-type or pathSuffix||
+
+ By default, only the text, html and js parsers are enabled. The other plugins must be enabled in nutch-site.xml.

  DevelopmentCommandLineOptions
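
To illustrate the note about enabling the other parse plugins: in Nutch the active plugins are selected by the plugin.includes property, which can be overridden in conf/nutch-site.xml. The fragment below is only a sketch that adds parse-pdf and parse-msword to the default text, html and js parsers. The non-parser entries in the value are an assumption modelled on a typical nutch-default.xml; copy the actual value from your own nutch-default.xml and change only the parse-(...) group.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- Assumption: every entry except the parse group mirrors a typical
         default and may differ in your release; check nutch-default.xml.
         The parse group enables the text, html, js, pdf and msword parsers. -->
    <value>protocol-http|urlfilter-regex|parse-(text|html|js|pdf|msword)|index-basic|query-(basic|site|url)</value>
  </property>
</configuration>
```

The value is a regular expression matched against plugin names, so any other plugin from the table can be enabled by adding its suffix to the parse-(...) alternation.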
_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs