(nutch 1.0) Query processing problem: NutchBean and webapp search fail, but Luke succeeds

Vinci Sun, 16 Mar 2008 05:29:24 -0700

Hi all,

(This is the most up-to-date post. Sorry for posting several times; I don't
think I described the problem well before.)


The current situation is as the title says: with my own analyzer plugin,
searches through NutchBean and the webapp fail, but Luke succeeds. That is,
only Luke can search the index generated by the crawl (after I manually select
the correct analyzer); the other two return very few results or none at all.

I am using the Nutch 1.0 dev version with Lucene 2.2.0.

-The analyzer I wrap in the plugin is org.apache.lucene.analysis.cjk.CJKAnalyzer,
for the zh locale. I followed the Nutch wiki page on multilingual support,
http://wiki.apache.org/nutch/MultiLingualSupport
But since the language identifier does not support the zh locale, I used the
hack mentioned in another Nutch-User post, "Change of analyzer for specific
language":
http://www.nabble.com/Change-of-analyzer-for-specific-language-tp16065385p16067807.html
However, search still fails. Only Luke can search my index.

-The target analyzer is loaded, and the detected lang matches my expectation.

-But only English words and single CJK characters can be found. When I click
"explain", I see only single-character terms in the query, whereas what
CJKAnalyzer generates in the index is bi-grams. So I think the query
tokenization done by NutchAnalyzer is not correct.
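
To illustrate the mismatch I suspect, here is a minimal sketch in plain Java. It is not the CJKAnalyzer or NutchAnalyzer code; the class BigramDemo and its two methods are hypothetical stand-ins that only mimic overlapping bi-gram tokenization (assumed for the index side) and per-character tokenization (assumed for the query side). The sample text is the four characters 搜索引擎, written as Unicode escapes so the file compiles under any source encoding, and the sketch assumes all characters are in the Basic Multilingual Plane.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: mimics bi-gram vs. single-character
// tokenization to show why the two sides may share no terms.
final class BigramDemo {

    /** Overlapping two-character terms; a lone character is kept as-is. */
    static List<String> bigrams(String text) {
        List<String> terms = new ArrayList<String>();
        if (text.length() == 1) {
            terms.add(text);
            return terms;
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            terms.add(text.substring(i, i + 2));
        }
        return terms;
    }

    /** One term per character, as a per-character query tokenizer would emit. */
    static List<String> unigrams(String text) {
        List<String> terms = new ArrayList<String>();
        for (int i = 0; i < text.length(); i++) {
            terms.add(text.substring(i, i + 1));
        }
        return terms;
    }

    public static void main(String[] args) {
        String text = "\u641c\u7d22\u5f15\u64ce";
        List<String> indexed = bigrams(text);   // assumed index-side terms
        List<String> queried = unigrams(text);  // assumed query-side terms

        // Terms that appear on both sides; a term query can only hit these.
        List<String> shared = new ArrayList<String>(queried);
        shared.retainAll(indexed);

        System.out.println("index terms:  " + indexed);
        System.out.println("query terms:  " + queried);
        System.out.println("shared terms: " + shared);
    }
}
```

If the query side really produces single characters while the index holds bi-grams, the two term sets do not overlap for multi-character text, which would fit what I see: only isolated CJK characters (and English words) match.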


Plugin.xml:
<?xml version="1.0" encoding="UTF-8"?>

<plugin
   id="analysis-zh"
   name="Chinese Analysis Plug-in"
   version="1.0.0"
   provider-name="org.apache.nutch">

   <runtime>
      <library name="analysis-zh.jar">
         <export name="*"/>
      </library>
   </runtime>

   <requires>
      <import plugin="nutch-extensionpoints"/>
      <import plugin="lib-lucene-analyzers"/>
   </requires>

   <extension id="org.apache.nutch.analysis.zh"
              name="ChineseAnalyzer"
              point="org.apache.nutch.analysis.NutchAnalyzer">

      <implementation id="org.apache.nutch.analysis.zh.ChineseAnalyzer"
                      class="org.apache.nutch.analysis.zh.ChineseAnalyzer">
        <parameter name="lang" value="zh"/>
      </implementation>

   </extension>

</plugin>

Java class:
package org.apache.nutch.analysis.zh;

// JDK imports
import java.io.Reader;

// Lucene imports
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;

// Nutch imports
import org.apache.nutch.analysis.NutchAnalyzer;
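
/**
 * NutchAnalyzer plugin for the zh locale that delegates tokenization
 * to Lucene's CJKAnalyzer.
 */
public class ChineseAnalyzer extends NutchAnalyzer {

    /** Shared CJKAnalyzer instance that does the actual tokenization. */
    private final static Analyzer ANALYZER =
            new org.apache.lucene.analysis.cjk.CJKAnalyzer();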

   
    /** Creates a new instance of ChineseAnalyzer */
    public ChineseAnalyzer() { }


    public TokenStream tokenStream(String fieldName, Reader reader) {
        return ANALYZER.tokenStream(fieldName, reader);
    }

}

<property>
  <name>plugin.includes</name>
  <value>analysis-(zh)|language-identifier|protocol-http|urlfilter-regex|parse-(text|html|js|rss)|feed|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>


Thank you for reading this long post. I looked at the classes and added trace
output to the log, but the results only confused me more. It looks like the
fault is in NutchBean or NutchAnalyzer, but I can't figure out exactly where
the point of failure is.
-- 
View this message in context: 
http://www.nabble.com/%28nutch-1.0%29-Query-processing-problem%3A-NutchBeans-and-webapps-search-fail%2C-but-Luke-sucess-tp16078400p16078400.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.
