Great! I would love to see something like this slip into 1.5.
Markus Fischer-5 wrote:
>
> Hello Carl,
>
> I just wanted to let you know that at work we've created a similar
> solution. However, I won't be able to provide code snippets of it
> for further discussion until next Tuesday (everyone's still on
> holiday here).
>
> - Markus
>
> Carl.Vondrick wrote:
>> Replying to myself... After thinking about it some more, I think it
>> could become even more useful if it returned the actual token object
>> instead of just its term text. For example:
>>
>> public function getMatchedTokens($string)
>> {
>>     $matchedTokens = array();
>>
>>     $matchExpression = '/^' . str_replace(array('\\?', '\\*'), array('.', '.*'),
>>                                           preg_quote($this->_pattern->text, '/')) . '$/';
>>     if (@preg_match('/\pL/u', 'a') == 1) {
>>         // PCRE unicode support is turned on
>>         // add Unicode modifier to the match expression
>>         $matchExpression .= 'u';
>>     }
>>
>>     $tokens = Zend_Search_Lucene_Analysis_Analyzer::getDefault()->tokenize($string, 'UTF-8');
>>     foreach ($tokens as $token) {
>>         if (preg_match($matchExpression, $token->getTermText()) === 1) {
>>             $matchedTokens[] = $token; // previously: $token->getTermText()
>>         }
>>     }
>>
>>     return $matchedTokens;
>> }
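A minimal sketch of why returning the token object helps: a token carries the start and end offsets of the original word, so a custom highlighter can wrap exact spans instead of searching the text for the words a second time. SketchToken and highlightByOffsets() are hypothetical names; SketchToken only mimics the getTermText()/getStartOffset()/getEndOffset() accessors that ZSL's Zend_Search_Lucene_Analysis_Token exposes, and the sketch assumes the offsets are byte offsets into the original string and that matched tokens never overlap.

```php
<?php
// Hypothetical stand-in for a token: its term text plus the start/end
// offsets of the original word in the analyzed string.
class SketchToken
{
    private $termText;
    private $startOffset;
    private $endOffset;

    public function __construct($termText, $startOffset, $endOffset)
    {
        $this->termText = $termText;
        $this->startOffset = $startOffset;
        $this->endOffset = $endOffset;
    }

    public function getTermText() { return $this->termText; }
    public function getStartOffset() { return $this->startOffset; }
    public function getEndOffset() { return $this->endOffset; }
}

// Wrap each token's span of $text in $open/$close.
// Assumes byte offsets and non-overlapping tokens.
function highlightByOffsets($text, array $tokens, $open = '<b>', $close = '</b>')
{
    // Work from the end of the string backwards so that inserting markup
    // never shifts the offsets of tokens that are still to be processed.
    usort($tokens, function ($a, $b) {
        return $b->getStartOffset() - $a->getStartOffset();
    });

    foreach ($tokens as $token) {
        $start  = $token->getStartOffset();
        $length = $token->getEndOffset() - $start;
        $text = substr($text, 0, $start)
              . $open . substr($text, $start, $length) . $close
              . substr($text, $start + $length);
    }

    return $text;
}
```

Because the markup wraps a slice of the original text rather than the term text, the original capitalization survives: a token with term text "foobar" at offsets 18-24 of "Hello, my name is Foobar" yields "Hello, my name is <b>Foobar</b>".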
>>
>> Carl.Vondrick wrote:
>>
>> Looking through the Zend Search Lucene source code, I think there's
>> a simple change that can make it possible to use a custom
>> highlighting system with ZSL and at least take a step towards
>> solving the highlighting extensibility problems.
>>
>> The primary issue with using a custom highlighter with ZSL is that
>> it's currently difficult to get an array of words to be highlighted
>> from a query. This has to be done outside of ZSL and adds
>> unnecessary complexity. Each query class already builds the array of
>> words we are looking for inside its ->highlightMatchesDOM() method,
>> but it then uses that array straight away to do the highlighting,
>> so the words are never accessible from the outside.
>>
>> The quick and simple change is this: split the
>> ->highlightMatchesDOM() method into a new ->getMatchedWords() method
>> and a slimmed-down ->highlightMatchesDOM(). So, for the Wildcard
>> query, we have:
>>
>> public function getMatchedWords($string)
>> {
>>     $words = array();
>>
>>     $matchExpression = '/^' . str_replace(array('\\?', '\\*'), array('.', '.*'),
>>                                           preg_quote($this->_pattern->text, '/')) . '$/';
>>     if (@preg_match('/\pL/u', 'a') == 1) {
>>         // PCRE unicode support is turned on
>>         // add Unicode modifier to the match expression
>>         $matchExpression .= 'u';
>>     }
>>
>>     $tokens = Zend_Search_Lucene_Analysis_Analyzer::getDefault()->tokenize($string, 'UTF-8');
>>     foreach ($tokens as $token) {
>>         if (preg_match($matchExpression, $token->getTermText()) === 1) {
>>             $words[] = $token->getTermText();
>>         }
>>     }
>>
>>     return $words;
>> }
>>
>> public function highlightMatchesDOM(Zend_Search_Lucene_Document_Html $doc, &$colorIndex)
>> {
>>     $doc->highlight($this->getMatchedWords($doc->getFieldUtf8Value('body')),
>>                     $this->_getHighlightColor($colorIndex));
>> }
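For experimenting outside ZSL, the wildcard branch of getMatchedWords() can be sketched as two plain functions. wildcardToRegex() reproduces the regex construction from the method above; the tokenizer, however, is a hypothetical stand-in for Zend_Search_Lucene_Analysis_Analyzer::getDefault()->tokenize() that only approximates a case-insensitive text analyzer (split on non-letters, lowercase with strtolower(), so ASCII only).

```php
<?php
// Build the same anchored regex the Wildcard query builds: escape the
// pattern, then turn the escaped '?' and '*' back into '.' and '.*'.
function wildcardToRegex($pattern)
{
    $regex = '/^' . str_replace(array('\\?', '\\*'), array('.', '.*'),
                                preg_quote($pattern, '/')) . '$/';
    if (@preg_match('/\pL/u', 'a') == 1) {
        $regex .= 'u'; // PCRE was built with Unicode support
    }
    return $regex;
}

// Standalone getMatchedWords(): returns every term of $string that the
// wildcard $pattern matches. The tokenizer is a hypothetical stand-in for
// the ZSL default analyzer: split on non-letters, lowercase (ASCII only).
function getMatchedWords($pattern, $string)
{
    $matchExpression = wildcardToRegex($pattern);
    $words = array();

    $terms = preg_split('/[^\pL]+/u', $string, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($terms as $term) {
        $term = strtolower($term);
        if (preg_match($matchExpression, $term) === 1) {
            $words[] = $term;
        }
    }

    return $words;
}
```

With this sketch, getMatchedWords('foo*', 'Hello, my name is Foobar and I am not a query') returns array('foobar'), and 'qu?ry' picks out 'query'.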
>>
>> The only new code that needs to be written is in the boolean query,
>> which will need to iterate over its subqueries and array_merge() the
>> words each subquery returns.
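A rough sketch of the boolean-query side, using hypothetical classes rather than ZSL's own: MatchedWordsProvider, SketchTermQuery and SketchBooleanQuery are illustrative names only. The $signs array is assumed to follow the ZSL boolean-query convention (true = required, null = optional, false = prohibited), and the sketch makes the design choice of skipping prohibited clauses, since words from a NOT clause should not be highlighted.

```php
<?php
// Hypothetical contract: a query that can report the words it matched.
interface MatchedWordsProvider
{
    public function getMatchedWords($string);
}

// Hypothetical leaf query that matches a fixed set of lowercase terms.
class SketchTermQuery implements MatchedWordsProvider
{
    private $terms;

    public function __construct(array $terms)
    {
        $this->terms = $terms;
    }

    public function getMatchedWords($string)
    {
        $words = array();
        $tokens = preg_split('/[^\pL]+/u', strtolower($string), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($tokens as $token) {
            if (in_array($token, $this->terms, true)) {
                $words[] = $token;
            }
        }
        return $words;
    }
}

// Boolean query: merge the words of every subquery that is not prohibited.
// Omitting $signs treats every subquery as optional.
class SketchBooleanQuery implements MatchedWordsProvider
{
    private $subqueries;
    private $signs;

    public function __construct(array $subqueries, $signs = null)
    {
        $this->subqueries = $subqueries;
        $this->signs = $signs;
    }

    public function getMatchedWords($string)
    {
        $words = array();
        foreach ($this->subqueries as $id => $subquery) {
            // A prohibited clause contributes nothing to highlight.
            if ($this->signs !== null && $this->signs[$id] === false) {
                continue;
            }
            $words = array_merge($words, $subquery->getMatchedWords($string));
        }
        // Remove duplicates and reindex so callers get a plain list.
        return array_values(array_unique($words));
    }
}
```

Since SketchBooleanQuery itself implements the interface, nested boolean queries recurse naturally; the array_unique() call keeps the merged list from repeating a word that several subqueries matched.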
>>
>> This makes it possible to get the matched words with one simple line:
>>
>> Zend_Search_Lucene_Search_QueryParser::parse('foo* query string')
>>     ->getMatchedWords('Hello, my name is Foobar and I am not a query');
>>
>> and we're off to the races.
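To illustrate the "off to the races" part, here is a minimal sketch of a custom highlighter that consumes a plain word list like the one returned by the getMatchedWords() one-liner above. highlightWords() and its <em> default markup are hypothetical; it matches whole words case-insensitively and keeps the original text's capitalization. One limitation to note: without PCRE's UCP option, \b only recognizes ASCII word characters, so non-ASCII text would need more care.

```php
<?php
// Hypothetical custom highlighter: wraps each whole-word, case-insensitive
// occurrence of the given words in $open/$close, keeping the original text.
function highlightWords($text, array $words, $open = '<em>', $close = '</em>')
{
    if (count($words) === 0) {
        return $text; // nothing matched, nothing to highlight
    }

    // Escape each word so regex metacharacters are matched literally.
    $alternatives = array();
    foreach ($words as $word) {
        $alternatives[] = preg_quote($word, '/');
    }

    $pattern = '/\b(' . implode('|', $alternatives) . ')\b/iu';
    return preg_replace($pattern, $open . '$1' . $close, $text);
}
```

For example, highlightWords('Hello, my name is Foobar and I am not a query', array('foobar')) wraps "Foobar" in <em> tags; HTML-escaping the input text is left to the caller.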
>>
>> What do you say? I can offer a patch + unit tests if the community
>> thinks this is worthwhile (though, IMO, this is a quick change).
>>
>>
>> ------------------------------------------------------------------------
>> View this message in context: Re: Simple solution for Zend Search Lucene
>> highlighting?
>> <http://www.nabble.com/Simple-solution-for-Zend-Search-Lucene-highlighting--tp14545203s16154p14561466.html>
>> Sent from the Zend Framework mailing list archive
>> <http://www.nabble.com/Zend-Framework-f15440.html> at Nabble.com.
>
--
View this message in context:
http://www.nabble.com/Simple-solution-for-Zend-Search-Lucene-highlighting--tp14545203s16154p14769145.html
Sent from the Zend Framework mailing list archive at Nabble.com.