Re: [fw-general] How could Zend_Search be used to search word like "R&D"?

Alexander Veremyev Mon, 14 May 2007 10:18:07 -0700

Partout wrote:

Alexander,
My script section is below:
                ......
                Zend_Search_Lucene_Analysis_Analyzer::setDefault(
                    new Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive());
                Zend_Search_Lucene_Search_QueryParser::setDefaultOperator(
                    Zend_Search_Lucene_Search_QueryParser::B_AND);
                $search = "R\&D";   // I have tried "R\\&D", '"R\&D"', "R&D", '"R&D"'
                $query = Zend_Search_Lucene_Search_QueryParser::parse($search);
                echo $query->__toString() . "\n";
                $hits = $index->find($query);
                ......

Result output:
"R\&D"    --> +(+r +d)
"R\\&D"   --> +(+r +d)
'"R\&D"'  --> +("r d")
"R&D"     --> 'Zend_Search_Lucene_Search_QueryParserException' with message 'Two chars lexeme expected. Position is 2.'
'"R&D"'   --> +("r d")


That's the correct behavior. '"R&D"' should give you the result you need.

If you prefer to treat R&D as one word, write your own analyzer. Take Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive and Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum as examples.


Actually you need:
------------------------------------------------------------------
class MyAnalyzer extends Zend_Search_Lucene_Analysis_Analyzer_Common
{
    private $_position;

    const NONBROKE_CHARS = '&!|\/';

    public function __construct()
    {

        $this->addFilter(new Zend_Search_Lucene_Analysis_TokenFilter_LowerCase());

    }

    public function reset()
    {
        $this->_position = 0;

        if ($this->_input === null) {
            return;
        }

        // convert input into ASCII
        $this->_input = iconv($this->_encoding, 'ASCII//TRANSLIT', $this->_input);

        $this->_encoding = 'ASCII';
    }

    /**
     * Tokenization stream API
     * Get next token
     * Returns null at the end of stream
     *
     * @return Zend_Search_Lucene_Analysis_Token|null
     */
    public function nextToken()
    {
        if ($this->_input === null) {
            return null;
        }

        while ($this->_position < strlen($this->_input)) {
            // skip characters that cannot be part of a token
            while ($this->_position < strlen($this->_input) &&
                   !ctype_alnum( $this->_input[$this->_position] ) &&
                   (strpos( self::NONBROKE_CHARS, $this->_input[$this->_position] ) === false) ) {
                $this->_position++;
            }

            $termStartPosition = $this->_position;

            // read token
            while ($this->_position < strlen($this->_input) &&
                   (ctype_alnum( $this->_input[$this->_position] ) ||
                    (strpos( self::NONBROKE_CHARS, $this->_input[$this->_position] ) !== false)) ) {
                $this->_position++;
            }

            // Empty token, end of stream.
            if ($this->_position == $termStartPosition) {
                return null;
            }

            $token = new Zend_Search_Lucene_Analysis_Token(
                                      substr($this->_input,
                                             $termStartPosition,
                                             $this->_position - $termStartPosition),
                                      $termStartPosition,
                                      $this->_position);
            $token = $this->normalize($token);
            if ($token !== null) {
                return $token;
            }
            // Continue if token is skipped
        }

        return null;
    }
}
-----------------------------------------
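
To see the effect of NONBROKE_CHARS in isolation, here is a minimal standalone sketch of the same scanning loop without any Zend classes. The function names sketchTokenize() and sketchIsTokenChar() are hypothetical, made up for this illustration. The sketch lowercases tokens itself instead of using the LowerCase filter, and it skips the iconv step, so it only approximates what the analyzer above does.

```php
<?php
// Hypothetical helper, not part of Zend_Search_Lucene: mirrors the loop in
// MyAnalyzer::nextToken() and returns all lowercased tokens at once.
function sketchTokenize($input, $keepChars)
{
    $tokens   = array();
    $position = 0;
    $length   = strlen($input);

    while ($position < $length) {
        // skip characters that cannot be part of a token
        while ($position < $length && !sketchIsTokenChar($input[$position], $keepChars)) {
            $position++;
        }

        $start = $position;

        // read token
        while ($position < $length && sketchIsTokenChar($input[$position], $keepChars)) {
            $position++;
        }

        if ($position > $start) {
            $tokens[] = strtolower(substr($input, $start, $position - $start));
        }
    }

    return $tokens;
}

// A character belongs to a token if it is alphanumeric or explicitly kept.
function sketchIsTokenChar($char, $keepChars)
{
    return ctype_alnum($char) || ($keepChars !== '' && strpos($keepChars, $char) !== false);
}

// With no extra characters, "R&D" falls apart into "r" and "d".
print_r(sketchTokenize('R&D and J2EE', ''));

// With the NONBROKE_CHARS set from MyAnalyzer, "R&D" survives as one token.
print_r(sketchTokenize('R&D and J2EE', '&!|\/'));
```

Note that "J2EE" needs no special handling in either case, because it consists only of letters and digits.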

The same analyzer should be used for indexing and searching.
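
A rough sketch of what that could look like in practice follows. It assumes Zend Framework is on the include path and that the MyAnalyzer class above is already loaded; the index path '/tmp/rd-index' and the field name 'body' are hypothetical values chosen only for this example.

```php
<?php
require_once 'Zend/Search/Lucene.php';

// Set the analyzer once, before any indexing or query parsing happens,
// so that documents and queries are tokenized in the same way.
Zend_Search_Lucene_Analysis_Analyzer::setDefault(new MyAnalyzer());

// Indexing (hypothetical path and field name)
$index = Zend_Search_Lucene::create('/tmp/rd-index');

$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('body', 'Our R&D team works with J2EE'));
$index->addDocument($doc);
$index->commit();

// Searching: the single-quoted PHP literal passes the backslash through,
// so the query parser receives R\&D and treats "&" as part of the word.
$query = Zend_Search_Lucene_Search_QueryParser::parse('R\&D');
echo $query->__toString() . "\n";

foreach ($index->find($query) as $hit) {
    echo $hit->score . ' ' . $hit->body . "\n";
}
```

An index that was built earlier with the TextNum_CaseInsensitive analyzer contains the separate terms "r" and "d", so it would have to be rebuilt after switching analyzers.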

PS: This analyzer also makes it possible to search with Luke using the "R\&D" query.


With best regards,
   Alexander Veremyev.

Besides, I also built the index with the Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive() analyzer.

Is there any other way to solve my problem? Thank you very much.

Best regards,

David


Alexander Veremyev wrote:
Hi,
Are you sure that the escape sequence "\&" wasn't translated into '&' before it was sent to the query parser?
Please try "R\\&D".
In addition, you need a special analyzer to treat R&D as one word (the default analyzer translates it into the phrase "r d").
Take a look at the Zend_Search_Lucene_Analysis_Analyzer_Common_TextNum_CaseInsensitive class and the Zend_Search_Lucene_Analysis_Analyzer::setDefault() method.
http://framework.zend.com/manual/en/zend.search.extending.html#zend.search.extending.analysis
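
On the escaping question: in PHP itself, "\&" is not a recognized escape sequence in a double-quoted string, so the backslash is kept. A quick standalone check (plain PHP, no Zend classes involved) shows which literals actually hand a backslash to the query parser:

```php
<?php
// PHP leaves the backslash in place for unrecognized escapes such as "\&",
// so these three literals are the same 4-character string: R \ & D
$a = "R\&D";
$b = "R\\&D";
$c = 'R\&D';

var_dump($a === $b, $b === $c);  // both comparisons are true
var_dump(strlen($a));            // 4

// Without a backslash in the source, nothing escapes the "&" for the parser.
$d = "R&D";
var_dump(strlen($d));            // 3
```

This is consistent with "R\&D" and "R\\&D" producing identical parsed queries, as reported elsewhere in this thread.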


With best regards,
    Alexander Veremyev.



Partout wrote:
Hi All,

I am using Zend_Search, and I am glad to see that many new enhancements were added. Many thanks to all of you.

But I still have a question: I need to search for words like "R&D", "J2EE" .... Who can tell me how to do it? I have used "R\&D", but it throws the exception below:

Fatal error: Uncaught exception 'Zend_Search_Lucene_Search_QueryParserException' with message 'Two chars lexeme expected. Position is 4.' in /opt/system/Zend/Search/Lucene/Search/QueryLexer.php:397
Stack trace:
#0 /opt/system/Zend/Search/Lucene/FSMAction.php(62): Zend_Search_Lucene_Search_QueryLexer->addQuerySyntaxLexeme()
.....

Thanks in advance.

David
