Hi Steve,

Nutch's analyzer will strip off the '*' at the end of the search term if you pass the search term through it. You might want to look at net/nutch/analysis/NutchAnalysis.jj to see the rules for the analyzer.
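Because the analyzer drops the trailing '*', the wildcard has to be spotted on the raw query text before any analysis happens. The following is only a rough sketch of that detection step in plain Java. The class `TrailingWildcard`, its `Clause` type, and the `parse` method are made up for illustration and are not part of Nutch or Lucene, and the lowercasing assumes that indexed terms are lowercased.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not a Nutch or Lucene class.
public class TrailingWildcard {

    /** One parsed clause: the text to look up and whether it is a prefix. */
    public static final class Clause {
        public final String text;
        public final boolean prefix;

        Clause(String text, boolean prefix) {
            this.text = text;
            this.prefix = prefix;
        }
    }

    /** Splits the raw query on whitespace and flags words ending in '*'. */
    public static List<Clause> parse(String rawQuery) {
        List<Clause> clauses = new ArrayList<>();
        for (String word : rawQuery.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty query yields a single empty token
            }
            // A lone "*" is not treated as a prefix: there is nothing to match on.
            boolean prefix = word.length() > 1 && word.endsWith("*");
            String text = prefix ? word.substring(0, word.length() - 1) : word;
            // Assumption: indexed terms are lowercased, so the lookup text is too.
            clauses.add(new Clause(text.toLowerCase(Locale.ROOT), prefix));
        }
        return clauses;
    }

    public static void main(String[] args) {
        for (Clause c : parse("Nutch craw* index")) {
            System.out.println((c.prefix ? "prefix: " : "term:   ") + c.text);
        }
    }
}
```

In a real query filter, each clause flagged as a prefix would become a Lucene PrefixQuery added to the BooleanQuery, and every other clause would go through the analyzer as usual.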
To actually query the index for partial matches, you can add PrefixQuery terms to the BooleanQuery that Nutch passes to Lucene. Implement such rules in the query filter plugin that you add for your specific application. You can look at the implementation of query-basic and modify it to suit your needs.

HTH,
Praveen.

Steve Follmer <[EMAIL PROTECTED]> said:
>
> I believe the stock Nutch does not employ the stemming or the
> prefixquery from Lucene. Is this because such queries are too expensive?
> Or is it that they are just not useful, that 99% of Nutch users just
> don't need them?
>
> Lucene has some stemming modules for English and German I see. Does
> English stemming just drop right into a Nutch build?
>
> Or, I could modify the basic query filter so that it parses for * at the
> end of words and then... Nutch ultimately hands Lucene a BooleanQuery
> and I could just OR on an extra PrefixQuery?
>
> I notice in these slides that these advanced derived queries are
> mentioned
> http://66.102.7.104/search?q=cache:MYweVmrEgV4J:www.wgrosso.com/Archives/Presentations_SDForumEmergingTechSig/SDForumETSIG_NutchAndLucene_Dec2003.pdf+nutch+n-grams&hl=en
> but the slides simply mention them in passing.
>
> Steve
>
> ----
>
> ...from an earlier post in nutch-general:
>
> "Wildcards search in a public search engine is probably not a good idea,
> because such queries are very expensive. If you need them in an intranet
> search, it is possible to extend the query syntax to support it - but
> you would need to write that part of query parsing...
>
> Found by searching through:
>
> http://www.mail-archive.com/[email protected]/
> http://www.mail-archive.com/[email protected]/
>
> -------------------------------------------------------
> The SF.Net email is sponsored by: Beat the post-holiday blues
> Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
> It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
> _______________________________________________
> Nutch-developers mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/nutch-developers
