Re: Change the Analyzer by plugin - how to deal with the query? Query always uses the default analyzer!

Vinci Sun, 16 Mar 2008 04:43:45 -0700

Hi,

I added some log tracing so I can see more detail.
Findings so far:
- Both NutchBean and the webapp fail; only Luke succeeds (when I manually
select the correct analyzer).
- The analyzer I wrap in the plugin is org.apache.lucene.analysis.cjk.CJKAnalyzer
for the zh locale. I followed the Nutch wiki page on multilingual support:
http://wiki.apache.org/nutch/MultiLingualSupport
Since the language identifier does not support the zh locale, I applied the
hack mentioned in another post on Nutch-User.

- The analyzer is loaded, and the detected language is what I expect.

- But only English terms and single words can be found. When I click explain,
I see only single words, not what the CJKAnalyzer should produce.
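
For reference, CJKAnalyzer is documented as splitting runs of CJK text into
overlapping two-character tokens, so a query such as "中文搜索" would be expected
to show up in explain as 中文, 文搜, 搜索. The sketch below only imitates that
behaviour with plain JDK classes so the two outputs can be compared side by
side; it is not Lucene's implementation. The class and method names
(CjkBigramSketch, bigrams, unigrams) are made up for illustration, and the
per-character split assumes that "single word" in the explain output means one
token per character.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: imitates overlapping-bigram tokenization for a run of
// CJK characters. Assumes every character fits in one Java char (BMP only).
class CjkBigramSketch {

    /** Overlapping two-character tokens: ABCD becomes AB, BC, CD. */
    static List<String> bigrams(String cjkRun) {
        List<String> tokens = new ArrayList<>();
        if (cjkRun.length() == 1) {
            tokens.add(cjkRun); // a lone character stays a single token
            return tokens;
        }
        for (int i = 0; i + 1 < cjkRun.length(); i++) {
            tokens.add(cjkRun.substring(i, i + 2));
        }
        return tokens;
    }

    /** One token per character, shown only for contrast. */
    static List<String> unigrams(String cjkRun) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < cjkRun.length(); i++) {
            tokens.add(String.valueOf(cjkRun.charAt(i)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The four characters of the sample query, written as Unicode escapes
        // so the source compiles under any file encoding.
        String query = "\u4e2d\u6587\u641c\u7d22";
        System.out.println("bigrams:  " + bigrams(query));  // three tokens
        System.out.println("unigrams: " + unigrams(query)); // four tokens
    }
}
```

If the terms in explain look like the second line rather than the first, the
query is most likely not passing through the CJK analyzer at all.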


Plugin.xml:
<?xml version="1.0" encoding="UTF-8"?>

<plugin
   id="analysis-zh"
   name="Chinese Analysis Plug-in"
   version="1.0.0"
   provider-name="org.apache.nutch">

   <runtime>
      <library name="analysis-zh.jar">
         <export name="*"/>
      </library>
   </runtime>

   <requires>
      <import plugin="nutch-extensionpoints"/>
      <import plugin="lib-lucene-analyzers"/>
   </requires>

   <extension id="org.apache.nutch.analysis.zh"
              name="ChineseAnalyzer"
              point="org.apache.nutch.analysis.NutchAnalyzer">

      <implementation id="org.apache.nutch.analysis.zh.ChineseAnalyzer"
                      class="org.apache.nutch.analysis.zh.ChineseAnalyzer">
        <parameter name="lang" value="zh"/>
      </implementation>

   </extension>

</plugin>

Java class:
package org.apache.nutch.analysis.zh;

// JDK imports
import java.io.Reader;

// Lucene imports
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;

// Nutch imports
import org.apache.nutch.analysis.NutchAnalyzer;

/**
 * Wraps Lucene's CJKAnalyzer so that Nutch can use it for the zh locale.
 */
public class ChineseAnalyzer extends NutchAnalyzer {
    
    private final static Analyzer ANALYZER = 
            new org.apache.lucene.analysis.cjk.CJKAnalyzer();

    
    /** Creates a new instance of ChineseAnalyzer */
    public ChineseAnalyzer() { }


    public TokenStream tokenStream(String fieldName, Reader reader) {
        return ANALYZER.tokenStream(fieldName, reader);
    }

}
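
To make the symptom easier to reason about: the plugin registers
ChineseAnalyzer under lang=zh, so some component has to look up an analyzer by
language and fall back to a default when nothing matches. The sketch below is a
toy model of that kind of lookup, not Nutch's code. AnalyzerRegistrySketch,
ToyAnalyzer, register and get are all hypothetical names introduced here for
illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a language-keyed analyzer lookup with a default fallback.
// Everything here is hypothetical; it does not call any Nutch or Lucene API.
class AnalyzerRegistrySketch {

    /** Stand-in for an analyzer; it only carries a name. */
    static final class ToyAnalyzer {
        final String name;

        ToyAnalyzer(String name) {
            this.name = name;
        }
    }

    private final Map<String, ToyAnalyzer> byLang = new HashMap<>();
    private final ToyAnalyzer fallback;

    AnalyzerRegistrySketch(ToyAnalyzer fallback) {
        this.fallback = fallback;
    }

    /** Registers an analyzer for a language code such as "zh". */
    void register(String lang, ToyAnalyzer analyzer) {
        byLang.put(lang, analyzer);
    }

    /** Returns the analyzer for lang, or the fallback if none is registered. */
    ToyAnalyzer get(String lang) {
        if (lang == null) {
            return fallback;
        }
        ToyAnalyzer analyzer = byLang.get(lang);
        return analyzer != null ? analyzer : fallback;
    }

    public static void main(String[] args) {
        AnalyzerRegistrySketch registry =
                new AnalyzerRegistrySketch(new ToyAnalyzer("default"));
        registry.register("zh", new ToyAnalyzer("cjk"));

        System.out.println(registry.get("zh").name);
        System.out.println(registry.get("en").name);
        System.out.println(registry.get(null).name);
    }
}
```

In a model like this, a query path that never calls the lookup, or calls it
with a null language, would end up on the default analyzer even though the
language was detected correctly. Whether that is what happens in Nutch is the
open question of this post.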

<property>
  <name>plugin.includes</name>
  <value>analysis-(zh)|language-identifier|protocol-http|urlfilter-regex|parse-(text|html|js|rss)|feed|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
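
One quick sanity check is to see which plugin ids the plugin.includes value
above accepts. The sketch below assumes the value is applied as a regular
expression that must match the whole plugin id; the class name
PluginIncludesCheck and the method isIncluded are made up for this example.

```java
import java.util.regex.Pattern;

// Checks plugin ids against the plugin.includes expression from this post.
// Assumption: the whole id must match the expression (full match, not find).
class PluginIncludesCheck {

    // Value copied from the plugin.includes property, split for readability.
    static final Pattern INCLUDES = Pattern.compile(
            "analysis-(zh)|language-identifier|protocol-http|urlfilter-regex"
            + "|parse-(text|html|js|rss)|feed|index-basic|query-(basic|site|url)"
            + "|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)");

    static boolean isIncluded(String pluginId) {
        return INCLUDES.matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        String[] ids = {
            "analysis-zh", "language-identifier", "query-basic",
            "lib-lucene-analyzers", "nutch-extensionpoints"
        };
        for (String id : ids) {
            System.out.println(id + " -> " + isIncluded(id));
        }
    }
}
```

Under that assumption, analysis-zh is accepted, while lib-lucene-analyzers and
nutch-extensionpoints, which plugin.xml imports, are not matched by the
expression itself. That does not necessarily mean they are inactive, since they
may still be activated as dependencies; the plugin loading messages in the log
should show which plugins were actually registered.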

Can anyone give me more hints about how the query is processed, so I can find
out which part is failing?
Sorry for posting repeatedly; I really don't know where the point of failure
is.


Vinci wrote:
> 
> Hi all,
> 
> I have changed the analyzer used by Nutch and made it work with the Lucene
> sandbox analyzer. I used Luke to check the language and the query, and they
> look fine. However, the method posted on the wiki does not work for me, and
> most posts only mention how to make indexing work, not how to deal with
> the query when the plugin is in use.
> 
> Now, looking at the catalina log, I can see that it knows the language of
> the query:
> 
> <timestamp> lang:zh
> 
> But the result is not correct. When I trace the explain output, I find the
> query is split in the keyword-analyzer manner, not by the analyzer I set up
> for zh in the plugin.
> 
> Can anybody help me? Thanks a lot.
> 

-- 
View this message in context: 
http://www.nabble.com/Chnage-the-Analyzer-by-plugin---how-to-dealing-with-the-query--tp16077090p16077954.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.
