Looking through the Zend Search Lucene source code, I think there's a simple
change that can make it possible to use a custom highlighting system with
ZSL and at least take a step towards solving the highlighting extensibility
problems.
The primary issue with using a custom highlighter with ZSL is that it's
currently difficult to get an array of words to be highlighted from a query.
This has to be done outside of ZSL and adds unnecessary complexity.
Throughout the various query objects, the ->highlightMatchesDOM() methods
already build the array of words we are looking for, but they pass it
straight to the highlighter, so it is never accessible from outside.
The quick and simple change is this: split each ->highlightMatchesDOM()
method into ->getMatchedWords() and ->highlightMatchesDOM(). So, for the
Wildcard query, we have:

    public function getMatchedWords($string)
    {
        $words = array();

        $matchExpression = '/^' . str_replace(array('\\?', '\\*'),
            array('.', '.*'), preg_quote($this->_pattern->text, '/')) . '$/';
        if (@preg_match('/\pL/u', 'a') == 1) {
            // PCRE unicode support is turned on
            // add Unicode modifier to the match expression
            $matchExpression .= 'u';
        }

        $tokens = Zend_Search_Lucene_Analysis_Analyzer::getDefault()
            ->tokenize($string, 'UTF-8');
        foreach ($tokens as $token) {
            if (preg_match($matchExpression, $token->getTermText()) === 1) {
                $words[] = $token->getTermText();
            }
        }

        return $words;
    }

    public function highlightMatchesDOM(Zend_Search_Lucene_Document_Html $doc,
        &$colorIndex)
    {
        $doc->highlight(
            $this->getMatchedWords($doc->getFieldUtf8Value('body')),
            $this->_getHighlightColor($colorIndex)
        );
    }

The only new code that needs to be written is in the boolean queries, each
of which will need to iterate over its subqueries and array_merge() the
words each subquery returns.
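
To make the boolean case concrete, here is a minimal, self-contained sketch of the merge step. It does not use the real ZSL classes: MatchedWordsQuery, SketchTermQuery and SketchBooleanQuery are hypothetical stand-ins I made up for illustration, and the crude preg_split() tokenizer only approximates what the default analyzer would do.

```php
<?php
// Hypothetical interface: any query that can report its matched words.
interface MatchedWordsQuery
{
    public function getMatchedWords($string);
}

// Hypothetical stand-in for a single-term query.
class SketchTermQuery implements MatchedWordsQuery
{
    private $_term;

    public function __construct($term)
    {
        $this->_term = strtolower($term);
    }

    public function getMatchedWords($string)
    {
        $words = array();
        // Crude tokenizer standing in for the ZSL analyzer (assumption).
        $tokens = preg_split('/\W+/', $string, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($tokens as $token) {
            $token = strtolower($token);
            if ($token === $this->_term) {
                $words[] = $token;
            }
        }
        return $words;
    }
}

// Hypothetical stand-in for a boolean query: it only merges what
// its subqueries return, as described above.
class SketchBooleanQuery implements MatchedWordsQuery
{
    private $_subqueries;

    public function __construct(array $subqueries)
    {
        $this->_subqueries = $subqueries;
    }

    public function getMatchedWords($string)
    {
        $words = array();
        foreach ($this->_subqueries as $subquery) {
            $words = array_merge($words, $subquery->getMatchedWords($string));
        }
        return $words;
    }
}

$query = new SketchBooleanQuery(array(
    new SketchTermQuery('foobar'),
    new SketchTermQuery('query'),
));
$matched = $query->getMatchedWords('Hello, my name is Foobar and I am not a query');
```

A real patch would probably also want to skip prohibited subqueries and perhaps de-duplicate the merged array, but I have left both out of the sketch to keep it to the one idea.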
This makes it possible to get the matched words with one simple line:

    Zend_Search_Lucene_Search_QueryParser::parse('foo* query string')
        ->getMatchedWords('Hello, my name is Foobar and I am not a query');

and we're off to the races.
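
As a rough idea of what a custom highlighter could then do with that array, here is a small sketch. sketchHighlight() is a hypothetical helper, not part of ZSL, and the $words array below is written by hand rather than taken from a real getMatchedWords() call.

```php
<?php
// Hypothetical helper: wrap every whole-word occurrence of the given
// words in $open/$close markers, ignoring case.
function sketchHighlight($text, array $words, $open = '<em>', $close = '</em>')
{
    $words = array_unique($words);
    if (count($words) === 0) {
        return $text;
    }

    $alternatives = array();
    foreach ($words as $word) {
        // Escape each word so it is matched literally.
        $alternatives[] = preg_quote($word, '/');
    }

    $pattern = '/\b(' . implode('|', $alternatives) . ')\b/i';

    // ${1} avoids ambiguity if $close ever starts with a digit.
    return preg_replace($pattern, $open . '${1}' . $close, $text);
}

// Hand-written stand-in for the array getMatchedWords() would return.
$words = array('foobar', 'query');
$highlighted = sketchHighlight('Hello, my name is Foobar and I am not a query', $words);
```

The point is only that the highlighting logic lives entirely outside ZSL once the matched words are exposed.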
What do you say? I can offer a patch + unit tests if the community thinks
this is worthwhile (though, IMO, this is a quick change).