I believe stock Nutch does not use Lucene's stemming or its
PrefixQuery. Is this because such queries are too expensive? Or are
they simply not useful enough, in that 99% of Nutch users just don't
need them?

I see that Lucene has stemming modules for English and German. Would
English stemming just drop right into a Nutch build?

Alternatively, I could modify the basic query filter so that it looks
for a * at the end of a word. Since Nutch ultimately hands Lucene a
BooleanQuery, could I just OR in an extra PrefixQuery for such words?
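
A minimal sketch of the parsing half of this idea, in plain Java with no
Nutch or Lucene dependency. It splits a query on whitespace and flags
words that end in *. The names TrailingStarParser, Term and parse are
hypothetical and not part of either project; a real query filter would
presumably turn each flagged term into a Lucene PrefixQuery and add it
as an optional clause of the BooleanQuery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not Nutch or Lucene code.
final class TrailingStarParser {

    /** One parsed query term: its text, and whether it ended in '*'. */
    static final class Term {
        final String text;
        final boolean prefix;

        Term(String text, boolean prefix) {
            this.text = text;
            this.prefix = prefix;
        }
    }

    /** Splits a query on whitespace and strips a single trailing '*'. */
    static List<Term> parse(String query) {
        List<Term> terms = new ArrayList<>();
        for (String token : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank query
            }
            // A lone "*" is kept as an ordinary term rather than an empty prefix.
            boolean prefix = token.length() > 1 && token.endsWith("*");
            String text = prefix ? token.substring(0, token.length() - 1) : token;
            terms.add(new Term(text, prefix));
        }
        return terms;
    }

    public static void main(String[] args) {
        for (Term t : parse("nutch crawl* index")) {
            // Terms marked "prefix" are the ones that would become a PrefixQuery.
            System.out.println(t.text + " -> " + (t.prefix ? "prefix" : "exact"));
        }
    }
}
```

Keeping the syntax detection separate from the query construction means
the same parse step could be reused whichever Lucene version the build
uses.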

I notice that these slides mention such advanced derived queries, but
only in passing:
http://66.102.7.104/search?q=cache:MYweVmrEgV4J:www.wgrosso.com/Archives/Presentations_SDForumEmergingTechSig/SDForumETSIG_NutchAndLucene_Dec2003.pdf+nutch+n-grams&hl=en

Steve

----

...from an earlier post in nutch-general:

"Wildcards search in a public search engine is probably not a good idea,
because such queries are very expensive. If you need them in an intranet
search, it is possible to extend the query syntax to support it - but
you would need to write that part of query parsing..."

Found by searching through:

http://www.mail-archive.com/[email protected]/
http://www.mail-archive.com/[email protected]/
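
To illustrate why the quoted post calls such queries expensive: as far
as I understand it, a prefix query has to be expanded against the term
dictionary, so its cost grows with the number of indexed terms that
share the prefix. The sketch below models that with a sorted set of
strings standing in for the index's terms. PrefixExpansion and expand
are hypothetical names, and this is an assumption-laden model, not
Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical model of prefix expansion; not Lucene code.
final class PrefixExpansion {

    /** Returns every dictionary term that starts with the given prefix. */
    static List<String> expand(NavigableSet<String> dictionary, String prefix) {
        List<String> matches = new ArrayList<>();
        // In sorted order, all terms sharing a prefix are contiguous,
        // beginning at the first term >= the prefix itself.
        for (String term : dictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // walked past the last matching term
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        NavigableSet<String> dictionary = new TreeSet<>(
                Arrays.asList("crawl", "crawler", "crawling", "cream", "index"));
        // Each match is one more term whose postings must be consulted.
        System.out.println(expand(dictionary, "crawl"));
    }
}
```

On a web-scale index a short prefix can match a very large number of
terms, which is the concern for a public search engine; on a small
intranet index the expansion stays modest.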


