Great!  I would love to see something like this slip into 1.5.
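
For the boolean case mentioned in the quoted proposal below, here is a rough sketch of how the merge might look. It is not ZSL code: the interface and both class names (MatchedWordsProvider, SketchTermQuery, SketchBooleanQuery) are made up for illustration, and the leaf query just splits on non-word characters instead of using the real analyzer.

```php
<?php
// Hypothetical stand-ins for illustration only; none of these are
// Zend_Search_Lucene classes.
interface MatchedWordsProvider
{
    public function getMatchedWords($string);
}

// Hypothetical leaf query: matches one literal word, case-insensitively.
class SketchTermQuery implements MatchedWordsProvider
{
    private $word;

    public function __construct($word)
    {
        $this->word = strtolower($word);
    }

    public function getMatchedWords($string)
    {
        $words = array();
        // Naive tokenization on non-word characters (assumption: the real
        // code would go through the default analyzer instead).
        foreach (preg_split('/\W+/', $string, -1, PREG_SPLIT_NO_EMPTY) as $candidate) {
            if (strtolower($candidate) === $this->word) {
                $words[] = $candidate;
            }
        }
        return $words;
    }
}

// Hypothetical boolean query: collects the words of every subquery.
class SketchBooleanQuery implements MatchedWordsProvider
{
    private $subqueries;

    public function __construct(array $subqueries)
    {
        $this->subqueries = $subqueries;
    }

    public function getMatchedWords($string)
    {
        $words = array();
        foreach ($this->subqueries as $subquery) {
            // array_merge() appends each subquery's matches in order.
            $words = array_merge($words, $subquery->getMatchedWords($string));
        }
        return $words;
    }
}

$query = new SketchBooleanQuery(array(
    new SketchTermQuery('foobar'),
    new SketchTermQuery('query'),
));
$matched = $query->getMatchedWords('Hello, my name is Foobar and I am not a query');
```

With that input, $matched should come out as array('Foobar', 'query'). Whether duplicates should be removed (e.g. with array_unique()) is a design choice I have left open here.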

Markus Fischer-5 wrote:
> 
> Hello Carl,
> 
> I just wanted to let you know that at work we've created a similar 
> solution. However, I won't be able to provide code snippets of our 
> solution for further discussion until next Tuesday (everyone's still 
> on holiday here).
> 
> - Markus
> 
> Carl.Vondrick wrote:
>> Replying to myself... After thinking about it some more, I think it 
>> could become even more useful if it returned the actual token objects 
>> instead of just their term text. For example:
>> 
>>     // Renamed from getMatchedWords(): returns token objects, not strings.
>>     public function getMatchedTokens($string)
>>     {
>>         $words = array();
>> 
>>         // Translate the wildcard pattern into an anchored regular expression.
>>         $matchExpression = '/^'
>>             . str_replace(array('\\?', '\\*'), array('.', '.*'),
>>                           preg_quote($this->_pattern->text, '/'))
>>             . '$/';
>>         if (@preg_match('/\pL/u', 'a') == 1) {
>>             // PCRE unicode support is turned on
>>             // add Unicode modifier to the match expression
>>             $matchExpression .= 'u';
>>         }
>> 
>>         $tokens = Zend_Search_Lucene_Analysis_Analyzer::getDefault()
>>             ->tokenize($string, 'UTF-8');
>>         foreach ($tokens as $token) {
>>             if (preg_match($matchExpression, $token->getTermText()) === 1) {
>>                 $words[] = $token; // WAS $token->getTermText()
>>             }
>>         }
>> 
>>         return $words;
>>     }
>> 
>>     Carl.Vondrick wrote:
>> 
>>     Looking through the Zend Search Lucene source code, I think there's
>>     a simple change that can make it possible to use a custom
>>     highlighting system with ZSL and at least take a step towards
>>     solving the highlighting extensibility problems.
>> 
>>     The primary issue with using a custom highlighter with ZSL is that
>>     it's currently difficult to get an array of words to be highlighted
>>     from a query. This has to be done outside of ZSL and adds
>>     unnecessary complexity. In each query object's
>>     ->highlightMatchesDOM() method, the array of words we are looking
>>     for is generated, but it is consumed immediately by the
>>     highlighting call and never exposed to the caller.
>> 
>>     The quick and simple change is this: split the
>>     ->highlightMatchesDOM() method into ->getMatchedWords() and
>>     ->highlightMatchesDOM(). So, for the Wildcard query, we have:
>> 
>>         public function getMatchedWords($string)
>>         {
>>             $words = array();
>> 
>>             $matchExpression = '/^'
>>                 . str_replace(array('\\?', '\\*'), array('.', '.*'),
>>                               preg_quote($this->_pattern->text, '/'))
>>                 . '$/';
>>             if (@preg_match('/\pL/u', 'a') == 1) {
>>                 // PCRE unicode support is turned on
>>                 // add Unicode modifier to the match expression
>>                 $matchExpression .= 'u';
>>             }
>> 
>>             $tokens = Zend_Search_Lucene_Analysis_Analyzer::getDefault()
>>                 ->tokenize($string, 'UTF-8');
>>             foreach ($tokens as $token) {
>>                 if (preg_match($matchExpression, $token->getTermText()) === 1) {
>>                     $words[] = $token->getTermText();
>>                 }
>>             }
>>             
>>             return $words;
>>         }
>> 
>>         public function highlightMatchesDOM(Zend_Search_Lucene_Document_Html $doc, &$colorIndex)
>>         {
>>             $doc->highlight(
>>                 $this->getMatchedWords($doc->getFieldUtf8Value('body')),
>>                 $this->_getHighlightColor($colorIndex)
>>             );
>>         }
>> 
>>     The only new code that needs to be written is in the boolean
>>     queries, each of which will need to iterate over its subqueries
>>     and array_merge() the words each subquery returns.
>> 
>>     This makes it possible to get the matched words with one simple line:
>> 
>>     Zend_Search_Lucene_Search_QueryParser::parse('foo* query string')
>>         ->getMatchedWords('Hello, my name is Foobar and I am not a query');
>> 
>>     and we're off to the races.
>> 
>>     What do you say? I can offer a patch + unit tests if the community
>>     thinks this is worthwhile (though, IMO, this is a quick change).
>> 
>> 
>> ------------------------------------------------------------------------
>> View this message in context: Re: Simple solution for Zend Search Lucene 
>> highlighting? 
>> <http://www.nabble.com/Simple-solution-for-Zend-Search-Lucene-highlighting--tp14545203s16154p14561466.html>
>> Sent from the Zend Framework mailing list archive 
>> <http://www.nabble.com/Zend-Framework-f15440.html> at Nabble.com.
> 

-- 
View this message in context: 
http://www.nabble.com/Simple-solution-for-Zend-Search-Lucene-highlighting--tp14545203s16154p14769145.html
Sent from the Zend Framework mailing list archive at Nabble.com.