DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG· RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://issues.apache.org/bugzilla/show_bug.cgi?id=33389>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND· INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=33389 ------- Additional Comments From [EMAIL PROTECTED] 2005-02-03 19:08 ------- (In reply to comment #3) > Not only do we break the Formatter interface, we also break the > behaviour for those upgrading. If you need to keep the Formatter interface stable then we can do this by adding another interface that extends Formatter. The Highlighter can adapt its behaviour depending on the actual instance supplied. Yes the patch does cause a behaviour change. The SimpleHTMLFormatter, GradientFormatter, and SpanGradientFormatter will HTML encode their output where previously they did not. If that behaviour is not acceptable then they can be changed to implement the encodeText method such that it just passes back the originalText, this would give exactly the same behaviour as with the current version. > If people have applications which already escape > the content (eg before or after calling highlighter), when they upgrade to the > proposed new version their content will be double-escaped. I could not find a reliable way of escaping the text if the original search text contained HTML content. If the text is HTML escaped before the TokenStream is built then the highlighter is parsing a potentially different stream to the original search. Also the TokenStream and the text must be kept in sync otherwise the TokenGroup indexes are off. The text cannot be escaped after highlighting because the original contains the same mark-up used to highlight the matching tokens. It would be possible to get the SimpleHTMLFormatter to use 2 specific GUIDs to wrap the highlight tokens as it is unlikely that these would also appear in the original text - these could then be replaced with the correct tags after HTML escaping. > They will also incur > the additional performance overhead for encoding where it may not be required > (Note: could initialize htmlEncode's StringBuffer to at least plainText.length()) . That would certainly help. If we are going to break the Formatter interface how about changing the methods to take a java.io.Writer or java.io.PrintWriter this would reduce the number of temporary strings created? > Will the htmlEncode function you've defined work for more exotic > character sets eg CKY? No, it is a 'brain dead' implementation. Commons Lang has an implementation that could be used: escapeHtml(String) in http://svn.apache.org/viewcvs.cgi/jakarta/commons/proper/lang/trunk/src/java/org/apache/commons/lang/StringEscapeUtils.java?rev=137958&view=markup Do you really want to include something like that in with the Highlighter? Is the Highlighter supposed to do HTML highlighting out of the box, or should it just be a toolkit that makes it easy to do so. Nicko -- Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]