Hi,

I'm feeling bad already because I promised to send something, but in the company world other things take priority unfortunately ...

Anyway, if I don't come back around next week, just go ahead, I'd say. I'll try to get something to the list before Wednesday, but I'd better not promise anything.

- Markus

Carl.Vondrick wrote:
Great!  I would love to see something like this slip into 1.5.


Markus Fischer-5 wrote:
Hello Carl,

I just wanted to let you know that at work we've created a similar solution. However, I won't be able to provide code snippets of our solution until next Tuesday, so we can discuss this further then (everyone's still on holiday here).

- Markus

Carl.Vondrick wrote:
Replying to myself... After thinking about it some more, I think it could become even more useful if it returned the actual token object instead of just the name. For example:

    public function getMatchedTokens($string)
    {
        $words = array();

        $matchExpression = '/^'
            . str_replace(array('\\?', '\\*'), array('.', '.*'), preg_quote($this->_pattern->text, '/'))
            . '$/';
        if (@preg_match('/\pL/u', 'a') == 1) {
            // PCRE unicode support is turned on
            // add Unicode modifier to the match expression
            $matchExpression .= 'u';
        }

        $tokens = Zend_Search_Lucene_Analysis_Analyzer::getDefault()->tokenize($string, 'UTF-8');
        foreach ($tokens as $token) {
            if (preg_match($matchExpression, $token->getTermText()) === 1) {
                $words[] = $token; // WAS $token->getTermText()
            }
        }

        return $words;
    }
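
To illustrate why token objects could be more useful than bare strings: a token carries its position in the source text, so a custom highlighter can wrap the exact matched span. The following is only a rough sketch. SketchToken and highlightByOffsets() are hypothetical names that are not part of ZSL; the stand-in class merely mirrors the getTermText(), getStartOffset() and getEndOffset() accessors that ZSL tokens expose. It also assumes the offsets are byte offsets into the original string, which may not hold for multi-byte input.

```php
<?php
// Hypothetical stand-in for a ZSL token; it only mirrors the accessors used below.
class SketchToken
{
    private $text;
    private $start;
    private $end;

    public function __construct($text, $start, $end)
    {
        $this->text  = $text;
        $this->start = $start;
        $this->end   = $end;
    }

    public function getTermText()    { return $this->text; }
    public function getStartOffset() { return $this->start; }
    public function getEndOffset()   { return $this->end; }
}

// Hypothetical helper: wrap each matched token in <b> tags, using the
// token offsets (assumed to be byte offsets, end exclusive).
function highlightByOffsets($string, array $tokens)
{
    // Work from the last token to the first so earlier offsets stay valid
    // after each insertion.
    usort($tokens, function ($a, $b) {
        return $b->getStartOffset() - $a->getStartOffset();
    });

    foreach ($tokens as $token) {
        $start  = $token->getStartOffset();
        $length = $token->getEndOffset() - $start;
        $match  = substr($string, $start, $length);
        $string = substr_replace($string, '<b>' . $match . '</b>', $start, $length);
    }

    return $string;
}

$text   = 'Hello, my name is Foobar';
$tokens = array(new SketchToken('foobar', 18, 24));
echo highlightByOffsets($text, $tokens), "\n";
```

Because the original span is copied out of the source string, the capitalisation of "Foobar" survives even though the analyzer lower-cased the term text.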

    Carl.Vondrick wrote:

    Looking through the Zend Search Lucene source code, I think there's
    a simple change that can make it possible to use a custom
    highlighting system with ZSL and at least take a step towards
    solving the highlighting extensibility problems.

    The primary issue with using a custom highlighter with ZSL is that
    it's currently difficult to get an array of words to be highlighted
    from a query. This has to be done outside of ZSL and adds
    unnecessary complexity. Throughout the various query objects, the
    ->highlightMatchesDOM() methods generate the array of words we are
    looking for, but then pass it straight on to the actual
    highlighting, so calling code can never access it.

    The quick and simple change is this: split the
    ->highlightMatchesDOM() method into ->getMatchedWords() and a
    slimmed-down ->highlightMatchesDOM(). So, for the Wildcard query, we have:

        public function getMatchedWords($string)
        {
            $words = array();

            $matchExpression = '/^'
                . str_replace(array('\\?', '\\*'), array('.', '.*'), preg_quote($this->_pattern->text, '/'))
                . '$/';
            if (@preg_match('/\pL/u', 'a') == 1) {
                // PCRE unicode support is turned on
                // add Unicode modifier to the match expression
                $matchExpression .= 'u';
            }

            $tokens = Zend_Search_Lucene_Analysis_Analyzer::getDefault()->tokenize($string, 'UTF-8');
            foreach ($tokens as $token) {
                if (preg_match($matchExpression, $token->getTermText()) === 1) {
                    $words[] = $token->getTermText();
                }
            }

            return $words;
        }

        public function highlightMatchesDOM(Zend_Search_Lucene_Document_Html $doc, &$colorIndex)
        {
            $doc->highlight(
                $this->getMatchedWords($doc->getFieldUtf8Value('body')),
                $this->_getHighlightColor($colorIndex)
            );
        }

    The only new code that needs to be written is in the boolean
    queries, which will need to iterate over their subqueries and
    array_merge() the words each subquery returns.
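
The boolean case described above might look roughly like the sketch below. All class names here (SketchQuery, SketchTermQuery, SketchBooleanQuery) are hypothetical stand-ins so that the example runs on its own; they are not ZSL classes. The sketch also ignores subquery signs, so a real implementation would probably want to skip prohibited subqueries.

```php
<?php
// Hypothetical interface; the ZSL query classes do not declare this.
interface SketchQuery
{
    public function getMatchedWords($string);
}

// Stand-in leaf query: matches whole words equal to one term, case-insensitively.
class SketchTermQuery implements SketchQuery
{
    private $term;

    public function __construct($term)
    {
        $this->term = strtolower($term);
    }

    public function getMatchedWords($string)
    {
        $words = array();
        // Crude tokenizer for the sketch only; ZSL would use its analyzer.
        foreach (preg_split('/\W+/', strtolower($string), -1, PREG_SPLIT_NO_EMPTY) as $word) {
            if ($word === $this->term) {
                $words[] = $word;
            }
        }
        return $words;
    }
}

// Boolean query: merges the words returned by each subquery.
class SketchBooleanQuery implements SketchQuery
{
    private $subqueries;

    public function __construct(array $subqueries)
    {
        $this->subqueries = $subqueries;
    }

    public function getMatchedWords($string)
    {
        $words = array();
        foreach ($this->subqueries as $subquery) {
            $words = array_merge($words, $subquery->getMatchedWords($string));
        }
        // Drop duplicates and reindex so the caller gets a plain list.
        return array_values(array_unique($words));
    }
}

$query = new SketchBooleanQuery(array(
    new SketchTermQuery('name'),
    new SketchTermQuery('query'),
));
print_r($query->getMatchedWords('Hello, my name is Foobar and I am not a query'));
```

Since a boolean query would only ever call getMatchedWords() on its children, nested boolean queries should work without extra code.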

    This makes it possible to get the matched words with one simple line:

        Zend_Search_Lucene_Search_QueryParser::parse('foo* query string')->getMatchedWords('Hello, my name is Foobar and I am not a query');

    and we're off to the races.
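
As a rough idea of what a custom highlighter could then do with the returned array, here is a minimal sketch. highlightWords() and the "hit" CSS class are made-up names for illustration, and the function assumes plain text input (it does not parse or escape HTML).

```php
<?php
// Hypothetical helper: wrap every occurrence of the given words in a <span>.
function highlightWords($text, array $words, $class = 'hit')
{
    if (count($words) === 0) {
        return $text;
    }

    // Quote each word so it is matched literally inside the pattern.
    $alternatives = array();
    foreach ($words as $word) {
        $alternatives[] = preg_quote($word, '/');
    }

    // \b keeps matches to whole words; the i modifier ignores case because
    // analyzers typically lower-case the term text.
    $pattern = '/\b(' . implode('|', $alternatives) . ')\b/i';

    return preg_replace($pattern, '<span class="' . $class . '">$1</span>', $text);
}

echo highlightWords(
    'Hello, my name is Foobar and I am not a query',
    array('foobar', 'query')
), "\n";
```

The word list is hard-coded here; in practice it would be whatever ->getMatchedWords() returns for the parsed query.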

    What do you say? I can offer a patch + unit tests if the community
    thinks this is worthwhile (though, IMO, this is a quick change).


------------------------------------------------------------------------
View this message in context: Re: Simple solution for Zend Search Lucene highlighting? <http://www.nabble.com/Simple-solution-for-Zend-Search-Lucene-highlighting--tp14545203s16154p14561466.html> Sent from the Zend Framework mailing list archive <http://www.nabble.com/Zend-Framework-f15440.html> at Nabble.com.
