Hi all, (This is the most up-to-date post; sorry for posting several times, I don't think I described the problem well before.)
The current situation is as the title says: NutchBean and the webapp fail, but Luke succeeds, with my own analyzer plugin. That is, only Luke can search the index generated by the crawl (after I manually select the correct analyzer); the other two return very few or no results.

I am using the Nutch 1.0 dev version with Lucene 2.2.0.
- The analyzer I wrap in the plugin is org.apache.lucene.analysis.cjk.CJKAnalyzer, for the zh locale. I followed the Nutch wiki page on multilingual support: http://wiki.apache.org/nutch/MultiLingualSupport
- Since the language identifier does not support the zh locale, I applied the hack mentioned in another Nutch-User post, "Change of analyzer for specific language": http://www.nabble.com/Change-of-analyzer-for-specific-language-tp16065385p16067807.html

However, searching still fails; only Luke can search my index.
- The target analyzer is loaded and the detected lang matches my expectation.
- But only English words and single CJK characters can be found. When I click "explain", I see only single-character terms, whereas what CJKAnalyzer generates in the index is bigrams. So I think the query tokenization done by NutchAnalyzer is not correct.
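To make the suspected mismatch concrete, here is a minimal, self-contained Java sketch. It is not the CJKAnalyzer or Nutch code: the class and method names (BigramMismatchSketch, bigrams, unigrams, allTermsIndexed) are made up for illustration. It assumes a run of CJK characters is indexed as overlapping two-character terms, and that a run of length one is indexed as that single character. The sample text is the four characters 中文分词, written as Unicode escapes so the source compiles under any encoding.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: hypothetical helper class, not part of Nutch or Lucene.
final class BigramMismatchSketch {

    // Overlapping two-character terms for one run of CJK characters.
    // Assumption: a run of length 1 yields that single character as a term.
    // Works on Java chars, so it only handles BMP characters correctly.
    static List<String> bigrams(String run) {
        List<String> terms = new ArrayList<>();
        if (run.length() == 1) {
            terms.add(run);
            return terms;
        }
        for (int i = 0; i + 1 < run.length(); i++) {
            terms.add(run.substring(i, i + 2));
        }
        return terms;
    }

    // One term per character, which is what the "explain" page suggests
    // the query side is producing.
    static List<String> unigrams(String run) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < run.length(); i++) {
            terms.add(String.valueOf(run.charAt(i)));
        }
        return terms;
    }

    // A conjunctive query can only match if every query term exists in the index.
    static boolean allTermsIndexed(Set<String> indexTerms, List<String> queryTerms) {
        return indexTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587\u5206\u8bcd"; // four CJK characters
        Set<String> index = new HashSet<>(bigrams(text));

        // Same tokenization on both sides: every query term is in the index.
        System.out.println(allTermsIndexed(index, bigrams(text)));  // prints "true"

        // Bigram index, unigram query: none of the query terms are in the index.
        System.out.println(allTermsIndexed(index, unigrams(text))); // prints "false"
    }
}
```

Under these assumptions a one-character query still matches a one-character run in the index, which would be consistent with single CJK characters being found while longer queries return nothing.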
Plugin.xml:

<?xml version="1.0" encoding="UTF-8"?>
<plugin id="analysis-zh" name="Chinese Analysis Plug-in" version="1.0.0" provider-name="org.apache.nutch">
  <runtime>
    <library name="analysis-zh.jar">
      <export name="*"/>
    </library>
  </runtime>
  <requires>
    <import plugin="nutch-extensionpoints"/>
    <import plugin="lib-lucene-analyzers"/>
  </requires>
  <extension id="org.apache.nutch.analysis.zh" name="ChineseAnalyzer" point="org.apache.nutch.analysis.NutchAnalyzer">
    <implementation id="org.apache.nutch.analysis.zh.ChineseAnalyzer" class="org.apache.nutch.analysis.zh.ChineseAnalyzer">
      <parameter name="lang" value="zh"/>
    </implementation>
  </extension>
</plugin>

Java class:

package org.apache.nutch.analysis.zh;

// JDK imports
import java.io.Reader;

// Lucene imports
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;

// Nutch imports
import org.apache.nutch.analysis.NutchAnalyzer;

public class ChineseAnalyzer extends NutchAnalyzer {

  // Shared CJKAnalyzer instance that does the actual (bigram) tokenization
  private final static Analyzer ANALYZER = new org.apache.lucene.analysis.cjk.CJKAnalyzer();

  /** Creates a new instance of ChineseAnalyzer */
  public ChineseAnalyzer() { }

  // Delegate tokenization of every field to the wrapped CJKAnalyzer
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return ANALYZER.tokenStream(fieldName, reader);
  }
}

plugin.includes setting:

<property>
  <name>plugin.includes</name>
  <value>analysis-(zh)|language-identifier|protocol-http|urlfilter-regex|parse-(text|html|js|rss)|feed|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>

Thank you for reading this long post. I looked at the classes and added tracing to the log, but the results only confused me more. It looks like the fault is in NutchBean or NutchAnalyzer, but I can't figure out where exactly the point of failure is.

--
View this message in context: http://www.nabble.com/%28nutch-1.0%29-Query-processing-problem%3A-NutchBeans-and-webapps-search-fail%2C-but-Luke-sucess-tp16078400p16078400.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.