Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by MarkyGoldstein:
http://wiki.apache.org/nutch/Features

------------------------------------------------------------------------------
  (Please reformat this text and divide into feature lists, questions and questions & answers).
- ==Features==
+ == Features ==
- ==Questions and Answers==
+ == Questions and Answers ==
-
- ==Questions==
-
  *What kind of searches does Nutch support? (quoted, nested, truncation, wildcarding [and where], Boolean)
  * "...." (phrase search?), + (what is this for?), - (negation) and fieldname:term. No "AND" or "OR". The and-logic is implied.
+
  *Is stemming an option?
- * According to the [http://www.lucenebook.com/ Lucene in Action] book: "Nutch does not use stemming or term aliasing of any kind. Search engines have not historically done much stemming, but it is a question that comes up regularly." -- page 329
+ * According to the [http://www.lucenebook.com/ Lucene in Action] book: "Nutch does not use stemming or term aliasing of any kind. Search engines have not historically done much
+ stemming, but it is a question that comes up regularly." -- page 329
+
  *What kind of stemming does Nutch use? (and can you add exceptions/changes?)
  * See previous answer :)
+
  *Does Nutch support Boolean operators? (can you use Google-like plus or minus or are you stuck with 1990s terms?)
  * No
- *Does Nutch support weighted field searching, synonym support?
- *What kinds of indexes does Nutch build? (multi-format indexing, incremental indexing, spell-check support, thesauri support, fielded searching, rank-by-reputation?)
  *How does the search engine handle punctuation and special characters? (and what's configurable?)
  * They are treated like a space.
+
  *Which document formats are supported?
  * Guessing from the names of the available parser plugins, this is probably it. However, only the plain text and HTML are enabled by default.
  Edit conf/nutch-site.xml and change the value of the plugin.includes property to include the plugins for the document types that you want Nutch to handle:
  * Plain Text (plugin: parse-text)
@@ -38, +38 @@
  title, artist, album, comments, etc. The useful information needed to search mp3s)
  * ZIP (?) This seems to expand the zip of plain text files and return the concatenated text. (parse-zip)
+
+ == Questions without Answers ==
+
+ *Does Nutch support weighted field searching, synonym support?
+
+ *What kinds of indexes does Nutch build? (multi-format indexing, incremental indexing, spell-check support, thesauri support, fielded searching, rank-by-reputation?)
+
  *What post-coordination options are available? (hey Karen, what does this mean?)
  *How easy is Nutch to configure?
+
  *How transparent is its configuration to a working organization: does it require geeky command line stuff, or can a knowledgeable manager enter a web or software interface to view or modify settings?
  * How are results sorted?
+ * Does Nutch support deduping?
+ * Can one tinker with relevance algorithms?
+ * Are there ranking overrides?

_______________________________________________
Nutch-cvs mailing list
Nutch-cvs@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-cvs
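The first answer in the diff lists the query syntax: "...." for phrases, a leading + or -, fieldname:term, an implied AND, and punctuation treated like a space. The toy Python parser and matcher below illustrate those rules on in-memory strings. It is not Nutch's own query parser: the regular expression, the lowercasing, the default field name "content", and the treatment of a leading + as a no-op (every clause is already required under the implied AND) are assumptions made for this sketch.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clause:
    terms: List[str]             # one term, or several for a phrase / punctuated word
    field: Optional[str] = None  # set for fieldname:term clauses
    prohibited: bool = False     # True for a leading "-" (negation)


# Assumed grammar: optional +/- sign, optional "field:" prefix,
# then either a "quoted phrase" or a run of non-space characters.
_CLAUSE = re.compile(r'([+-]?)(?:(\w+):)?(?:"([^"]*)"|(\S+))')


def terms_of(text: str) -> List[str]:
    # Punctuation and special characters act like spaces: keep only word runs.
    # Lowercasing is an assumption of this sketch.
    return re.findall(r"\w+", text.lower())


def parse_query(query: str) -> List[Clause]:
    clauses = []
    for sign, field, phrase, word in _CLAUSE.findall(query):
        terms = terms_of(phrase if phrase else word)
        if not terms:  # e.g. a lone "-" is just punctuation
            continue
        # A leading "+" is accepted but ignored: AND is already implied.
        clauses.append(Clause(terms, field or None, sign == "-"))
    return clauses


def matches(clauses: List[Clause], doc: dict) -> bool:
    """doc maps field name -> text; "content" is the hypothetical default field."""
    for c in clauses:
        tokens = terms_of(doc.get(c.field or "content", ""))
        n = len(c.terms)
        # A clause is found if its terms occur consecutively, in order.
        found = any(tokens[i:i + n] == c.terms for i in range(len(tokens) - n + 1))
        # Implied AND: fail if a required clause is missing or a negated one is present.
        if found == c.prohibited:
            return False
    return True
```

For example, `matches(parse_query("nutch -lucene"), {"content": "Apache Nutch"})` returns `True`. Because there is no AND or OR operator, typing `AND` in this sketch simply adds the ordinary term "and" to the query.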
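The plugin.includes answer can be tried out before touching a real installation. The Python sketch below writes a minimal conf/nutch-site.xml that overrides only that property, and checks which plugin ids a given value would enable. It assumes that plugin.includes holds a regular expression that has to match a plugin's whole id. The concrete value (adding parse-pdf and parse-msword next to parse-text and parse-html) and the root element name are hypothetical, so copy both from your own conf/nutch-default.xml.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical value: enables the PDF and MS Word parsers next to text/HTML.
# The non-parser entries are an assumption; start from the plugin.includes
# value in your own conf/nutch-default.xml and only extend the parse-(...) part.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(text|html|pdf|msword)"
    "|index-basic|query-(basic|site|url)"
)


def nutch_site_xml(plugin_includes: str, root: str = "configuration") -> str:
    """Return a minimal nutch-site.xml that overrides only plugin.includes.

    The root element name is an assumption; use the one your
    conf/nutch-default.xml uses.
    """
    conf = ET.Element(root)
    prop = ET.SubElement(conf, "property")
    ET.SubElement(prop, "name").text = "plugin.includes"
    ET.SubElement(prop, "value").text = plugin_includes
    return '<?xml version="1.0"?>\n' + ET.tostring(conf, encoding="unicode")


def is_enabled(plugin_id: str, plugin_includes: str) -> bool:
    """Assumes the value is a regex that must match the whole plugin id."""
    return re.fullmatch(plugin_includes, plugin_id) is not None


if __name__ == "__main__":
    print(nutch_site_xml(PLUGIN_INCLUDES))
    for pid in ("parse-text", "parse-html", "parse-pdf", "parse-zip"):
        print(pid, is_enabled(pid, PLUGIN_INCLUDES))
```

Under this assumption, parse-zip stays disabled until it is added to the parse-(...) alternation, and a longer id such as parse-html-extra would not be enabled by accident, because the whole id has to match.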