Hi, I added some log tracing so I can see more detail. Findings so far:
- Both NutchBean and the web app fail; only Luke succeeds (when I manually select the correct analyzer).
- The analyzer I wrap in the plugin is org.apache.lucene.analysis.cjk.CJKAnalyzer, for the zh locale. I followed the Nutch wiki page on multilingual support, http://wiki.apache.org/nutch/MultiLingualSupport, but since the language identifier does not support the zh locale, I used the hack mentioned in another post on Nutch-User.
- The analyzer is loaded, and the detected language matches my expectation.
- But only English terms and single words can be found. When I click "explain", I see only single words, not the terms CJKAnalyzer should produce.

Plugin.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <plugin id="analysis-zh"
            name="Chinese Analysis Plug-in"
            version="1.0.0"
            provider-name="org.apache.nutch">
      <runtime>
        <library name="analysis-zh.jar">
          <export name="*"/>
        </library>
      </runtime>
      <requires>
        <import plugin="nutch-extensionpoints"/>
        <import plugin="lib-lucene-analyzers"/>
      </requires>
      <extension id="org.apache.nutch.analysis.zh"
                 name="ChineseAnalyzer"
                 point="org.apache.nutch.analysis.NutchAnalyzer">
        <implementation id="org.apache.nutch.analysis.zh.ChineseAnalyzer"
                        class="org.apache.nutch.analysis.zh.ChineseAnalyzer">
          <parameter name="lang" value="zh"/>
        </implementation>
      </extension>
    </plugin>

Java class:

    package org.apache.nutch.analysis.zh;

    // JDK imports
    import java.io.Reader;

    // Lucene imports
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;

    // Nutch imports
    import org.apache.nutch.analysis.NutchAnalyzer;

    public class ChineseAnalyzer extends NutchAnalyzer {

      // Shared CJKAnalyzer instance that does the actual tokenization.
      private static final Analyzer ANALYZER =
          new org.apache.lucene.analysis.cjk.CJKAnalyzer();

      /** Creates a new instance of ChineseAnalyzer */
      public ChineseAnalyzer() { }

      // Delegates every field to CJKAnalyzer.
      public TokenStream tokenStream(String fieldName, Reader reader) {
        return ANALYZER.tokenStream(fieldName, reader);
      }
    }

Configuration (plugin.includes):

    <property>
      <name>plugin.includes</name>
      <value>analysis-(zh)|language-identifier|protocol-http|urlfilter-regex|parse-(text|html|js|rss)|feed|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
    </property>

Can anyone give me more hints about how query processing works, so I can find which part is failing? Sorry for posting repeatedly; I really don't know where the failure point is.

Vinci wrote:
>
> Hi all,
>
> I have changed Nutch's analyzer and made it work with the Lucene
> sandbox analyzer.
> I used Luke to check the language and the query, and they look fine.
> However, the method posted on the wiki does not work for me, and most
> posts only explain how to make indexing work, not how to handle the
> query when the plugin is in use.
>
> Looking at the Catalina log, I can see that it knows the language of
> the query:
>
> <timestamp> lang:zh
>
> But the results are not correct. When I trace the explain output, I
> find the query is cut in a keyword-analyzer manner, not by the
> analyzer I configured for zh through the plugin.
>
> Can anybody help me? Thanks a lot.
>

--
View this message in context: http://www.nabble.com/Chnage-the-Analyzer-by-plugin---how-to-dealing-with-the-query--tp16077090p16077954.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.
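To illustrate why single-word query terms could find nothing in an index built with CJKAnalyzer, here is a minimal, self-contained sketch. It does not call Lucene or Nutch; `CjkTokenDemo`, `bigrams`, `unigrams` and `sharedTerms` are hypothetical helpers. It assumes (1) that CJKAnalyzer indexes a run of CJK characters as overlapping two-character tokens, and (2) that the query side splits the same run into one term per character, as the explain output suggests. Under those assumptions the two term sets share nothing, so only English and single-word queries would match.

```java
import java.util.ArrayList;
import java.util.List;

public class CjkTokenDemo {

    // Hypothetical helper: overlapping two-character tokens, approximating
    // what CJKAnalyzer emits at index time for a run of CJK characters.
    // A run of length 1 yields the single character itself.
    static List<String> bigrams(String run) {
        List<String> out = new ArrayList<>();
        if (run.length() == 1) {
            out.add(run);
            return out;
        }
        for (int i = 0; i + 1 < run.length(); i++) {
            out.add(run.substring(i, i + 2));
        }
        return out;
    }

    // Hypothetical helper: one term per character, approximating the
    // single-word terms seen in the explain output at query time.
    static List<String> unigrams(String run) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < run.length(); i++) {
            out.add(run.substring(i, i + 1));
        }
        return out;
    }

    // Query terms that also appear among the index terms.
    static List<String> sharedTerms(List<String> queryTerms, List<String> indexTerms) {
        List<String> shared = new ArrayList<>(queryTerms);
        shared.retainAll(indexTerms);
        return shared;
    }

    public static void main(String[] args) {
        String text = "中文分析";
        List<String> indexTerms = bigrams(text);   // [中文, 文分, 分析]
        List<String> queryTerms = unigrams(text);  // [中, 文, 分, 析]
        System.out.println("index terms:  " + indexTerms);
        System.out.println("query terms:  " + queryTerms);
        System.out.println("shared terms: " + sharedTerms(queryTerms, indexTerms));
    }
}
```

If this is what happens, the mismatch is on the query side (the query text is not passed through the same analyzer used at indexing time) rather than in the plugin class itself; that is an assumption drawn from the explain output, not something verified against Nutch's query parser.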