Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by KurosakaTeruhiko:
http://wiki.apache.org/nutch/Features

------------------------------------------------------------------------------
  Missing from the current Nutch documentation (Tutorial, FAQ) is a list of 
features. This wiki page could help, if someone who knows the answers can edit 
it.
  
   *What kind of searches does Nutch support? (quoted, nested, truncation, 
wildcarding [and where], Boolean)
+     * "...." (phrase search?), + (what is this for?), - (negation) and 
fieldname:term.  There are no "AND" or "OR" operators; AND logic is implied.
   *Is stemming an option?
      * According to the [http://www.lucenebook.com/ Lucene in Action] book: 
"Nutch does not use stemming or term aliasing of any kind.  Search engines have 
not historically done much stemming, but it is a question that comes up 
regularly." -- page 329
   *What kind of stemming does Nutch use? (and can you add exceptions/changes?)
      * See previous answer :)
   *Does Nutch support Boolean operators? (can you use Google-like plus or 
minus or are you stuck with 1990s terms?)
+     * No
   *Does Nutch support weighted field searching, synonym support?
   *What kinds of indexes does Nutch build? (multi-format indexing, incremental 
indexing, spell-check support, thesauri support, fielded searching,  
rank-by-reputation?)
  
   *How does the search engine handle punctuation and special characters? (and 
what's configurable?)
+     * They are treated like a space.
   *Which document formats are supported?
    * Guessing from the names of the available parser plugins, this is probably 
the full list.  However, only plain text and HTML are enabled by default.  To 
handle other document types, edit conf/nutch-site.xml and add their plugins to 
the value of the plugin.includes property:
     * Plain Text (plugin: parse-text)
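
The plugin.includes value is a regular expression over plugin names, so enabling another format usually means widening the parse-(...) alternation. The sketch below is only an illustration of that selection logic, not Nutch code: it assumes the expression must match a plugin name in full, and apart from parse-text the plugin names and both pattern values are examples rather than anything confirmed on this page.

```python
import re


def enabled_plugins(plugin_includes, available):
    """Return the plugin names that the expression matches in full."""
    pattern = re.compile(plugin_includes)
    return [name for name in available if pattern.fullmatch(name)]


# Hypothetical plugin names; only parse-text is taken from the list above.
available = ["parse-text", "parse-html", "parse-pdf", "index-basic"]

# Assumed "before" value: only plain text and HTML parsing are selected.
before = enabled_plugins(r"parse-(text|html)|index-basic", available)

# Assumed "after" value: the alternation is widened to include parse-pdf.
after = enabled_plugins(r"parse-(text|html|pdf)|index-basic", available)

print(before)
print(after)
```

Check the plugins directory of your own installation for the real names before editing the property, and extend the value that is already configured rather than replacing it.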

