Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by KurosakaTeruhiko:
http://wiki.apache.org/nutch/Features

------------------------------------------------------------------------------
  
   *How does the search engine handle punctuation and special characters? (and 
what's configurable?)
   *Which document formats are supported?
-   * Guessing from the names of the available parser plugins, this is probably 
it:
+   * Guessing from the names of the available parser plugins, this is probably 
the full list.  However, only the plain text and HTML parsers are enabled by 
default.  Edit conf/nutch-site.xml and change the value of the plugin.includes 
property to include the plugins for the document types that you want Nutch to 
handle:
-    *Plain Text (in a fixed preconfigured charset only)
+    * Plain Text (in a fixed preconfigured charset only) (plugin: parse-text)
-    * HTML (in most any charsets)
+    * HTML (in almost any charset) (parse-html)
-    * JavaScript (for extracting links only?)
+    * JavaScript (for extracting links only?) (parse-js)
-    * Microsoft Power Point, the .ppt file
+    * Microsoft PowerPoint, the .ppt file (parse-mspowerpoint)
-    * Microsoft Word, the .doc file
+    * Microsoft Word, the .doc file (parse-msword)
-    * Adobe PDF
-    * RSS
-    * RTF
+    * Adobe PDF (parse-pdf)
+    * RSS (parse-rss)
+    * RTF (parse-rtf)
-    * MP3 (?) Is there any text in MP3?
+    * MP3 (?) Is there any text in MP3? (parse-mp3)
-    * ZIP (?) This seems to expand the zip of plain text files and return the 
concatenated text.
+    * ZIP (?) This seems to expand the zip of plain text files and return the 
concatenated text. (parse-zip)
  
   *What post-coordination options are available? (hey Karen, what does this 
mean?)
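
For illustration, the plugin.includes change described in the diff above might 
look roughly like the fragment below.  This is only a sketch: the parse-(...) 
group uses plugin names from the list above (here enabling PDF and Word in 
addition to the defaults), while the other entries in the value are assumed 
placeholders and should be copied from the default configuration shipped with 
your own Nutch version.  The property goes inside the existing root element 
of conf/nutch-site.xml.

```xml
<!-- Sketch of an override in conf/nutch-site.xml.
     Only the parse-(...) group is taken from the plugin list above;
     the remaining entries are assumptions, so copy them from your own
     default configuration rather than from this example. -->
<property>
  <name>plugin.includes</name>
  <!-- The value is a regular expression matched against plugin names. -->
  <value>protocol-http|urlfilter-regex|parse-(text|html|pdf|msword)|index-basic|query-(basic|site|url)</value>
</property>
```

To handle another format, add its plugin name to the parse-(...) group, for 
example parse-(text|html|pdf|msword|rtf) for RTF.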
  


_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs
