The following page has been changed by JeffRitchie:
http://wiki.apache.org/nutch/nutch-0%2e8-dev/bin/nutch_parse

------------------------------------------------------------------------------
   None.
  
  === Caveats and Notes ===
-  None.
+  The Parser depends upon a number of plugins to parse the various documents fetched from a crawl.  Document types supported and the plugins needed are as follows:[[BR]][[BR]]
+ 
+  ||'''Content-type'''||'''Plugin'''||'''Notes'''||
+  ||'''text/html'''||parse-html||Parses HTML documents using NekoHTML or !TagSoup||
+  ||'''application/x-javascript'''||parse-js||Parses !JavaScript Documents (.js).||
+  ||'''audio/mpeg'''||parse-mp3||Parses MP3 Audio Documents (.mp3).||
+  ||'''application/vnd.ms-excel'''||parse-msexcel||Parses MSExcel Documents (.xls).||
+  ||'''application/vnd.ms-powerpoint'''||parse-mspowerpoint||Parses MSPower!Point Documents||
+  ||'''application/msword'''||parse-msword||Parses MSWord Documents||
+  ||'''application/rss+xml'''||parse-rss||Parses RSS Documents (.rss)||
+  ||'''application/rtf'''||parse-rtf||Parses RTF Documents (.rtf)||
+  ||'''application/pdf'''||parse-pdf||Parses PDF Documents||
+  ||'''application/x-shockwave-flash'''||parse-swf||Parses Flash Documents (.swf)||
+  ||'''text/plain'''||parse-text||Parses Text Documents (.txt)||
+  ||'''application/zip'''||parse-zip||Parses Zip Documents (.zip)||
+  ||'''other types'''||parse-ext||Parses Documents with external commands based upon content-type or pathSuffix||
+ 
+ By default, only the text, html and js parsers (parse-text, parse-html and parse-js) are enabled.  The other plugins need to be enabled in nutch-site.xml.
  
  DevelopmentCommandLineOptions
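
As a rough illustration of the note about nutch-site.xml: plugins are normally switched on by overriding the plugin.includes property there. The fragment below is only a sketch. It assumes the default value in your nutch-default.xml looks roughly like the pattern shown; the non-parser plugin ids (protocol-http, urlfilter-regex, index-basic, query-*) are placeholders for illustration and should be copied from your own nutch-default.xml. The only intended change is adding pdf and msword to the parse-(...) group.

```xml
<!-- Sketch of an override placed inside nutch-site.xml. -->
<!-- Assumption: everything outside parse-(...) mirrors your nutch-default.xml. -->
<property>
  <name>plugin.includes</name>
  <!-- parse-(text|html|js) extended with pdf and msword -->
  <value>protocol-http|urlfilter-regex|parse-(text|html|js|pdf|msword)|index-basic|query-(basic|site|url)</value>
</property>
```

The value is a regular expression matched against plugin ids, so adding further names from the table above (for example rss or zip) inside the parse-(...) group should enable those parsers as well.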
  


_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs
