I believe stock Nutch does not use Lucene's stemming or its PrefixQuery. Is that because such queries are too expensive? Or are they simply not useful, in that 99% of Nutch users don't need them?
I see that Lucene has stemming modules for English and German. Does English stemming just drop right into a Nutch build? Alternatively, I could modify the basic query filter so that it parses for a * at the end of words. Since Nutch ultimately hands Lucene a BooleanQuery, could I just OR in an extra PrefixQuery?

These slides mention such advanced derived queries, but only in passing:
http://66.102.7.104/search?q=cache:MYweVmrEgV4J:www.wgrosso.com/Archives/Presentations_SDForumEmergingTechSig/SDForumETSIG_NutchAndLucene_Dec2003.pdf+nutch+n-grams&hl=en

Steve

----
...from an earlier post in nutch-general:

"Wildcards search in a public search engine is probably not a good idea, because such queries are very expensive. If you need them in an intranet search, it is possible to extend the query syntax to support it - but you would need to write that part of query parsing..."

Found by searching through:
http://www.mail-archive.com/[email protected]/
http://www.mail-archive.com/[email protected]/

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
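For what it's worth, the parsing half of the trailing-* idea from the question could be sketched roughly as below. This is only a sketch under stated assumptions, not how Nutch's query filter actually works: the class TrailingWildcardParser, its Clause type, and the MIN_PREFIX_LENGTH guard are all hypothetical names made up for illustration, and the code deliberately uses no Nutch or Lucene calls. The thought is that each clause flagged as a prefix would then be turned into a Lucene PrefixQuery and OR'ed into the BooleanQuery as an optional clause, while the rest stay ordinary term clauses. The minimum-length guard is my own assumption, added because a very short prefix expands to many index terms, which is where the cost mentioned in the quoted post comes from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Nutch or Lucene.
final class TrailingWildcardParser {

    /** One parsed clause: the bare term text and whether it should be treated as a prefix. */
    static final class Clause {
        final String text;
        final boolean prefix;

        Clause(String text, boolean prefix) {
            this.text = text;
            this.prefix = prefix;
        }
    }

    // Assumed guard: require this many characters before '*' so that
    // a query like "a*" does not expand to a huge number of terms.
    static final int MIN_PREFIX_LENGTH = 3;

    /** Splits a query on whitespace and flags words that end in '*'. */
    static List<Clause> parse(String query) {
        List<Clause> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty query string
            }
            String lower = token.toLowerCase(Locale.ROOT);
            boolean endsWithStar = lower.endsWith("*");
            // Strip every trailing '*' to get the bare term text.
            String text = lower.replaceAll("\\*+$", "");
            if (text.isEmpty()) {
                continue; // token was nothing but '*'
            }
            // Too-short prefixes fall back to a plain term clause.
            boolean prefix = endsWithStar && text.length() >= MIN_PREFIX_LENGTH;
            clauses.add(new Clause(text, prefix));
        }
        return clauses;
    }

    public static void main(String[] args) {
        for (Clause c : parse("nutch stem* a*")) {
            System.out.println(c.text + " -> " + (c.prefix ? "prefix clause" : "term clause"));
        }
    }
}
```

Falling back to a plain term for short prefixes is just one possible policy; an intranet deployment might prefer to reject such queries or lower the threshold.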
