Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by KurosakaTeruhiko:
http://wiki.apache.org/nutch/Features

------------------------------------------------------------------------------
  
   *How does the search engine handle punctuation and special characters? (and 
what's configurable?)
   *Which document formats are supported?
+   * Guessing from the names of the available parser plugins, this is probably 
it:
+    * Plain Text (in a fixed, preconfigured charset only)
+    * HTML (in almost any charset)
+    * JavaScript (for extracting links only?)
+    * Microsoft PowerPoint (.ppt files)
+    * Microsoft Word (.doc files)
+    * Adobe PDF
+    * RSS
+    * RTF
+    * MP3 (?) Is there any text in MP3?
+    * ZIP (?) This seems to expand a zip archive of plain-text files and return 
the concatenated text.
+ 
   *What post-coordination options are available? (hey Karen, what does this 
mean?)
  
   *How easy is Nutch to configure?
