Hello everybody,
I have been working on transformers for highlighting words in an XML document.
They wrap words within an XML document in parameterizable tags and attributes.
Currently, I have created two transformers:
1) Highlighting a single keyword specified with a parameter
2) Highlighting multiple keywords and automatically linking words according
to an extra file (the src attribute of the transformer, fetched through the
SourceResolver) containing keywords and links, like a dictionary or a
thesaurus.
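To give an idea of what the second transformer does, here is a minimal sketch of the lookup-and-wrap step. It is not the transformer code itself: the real component works on SAX events, while this works on plain strings and does no XML escaping. The class name KeywordWrapper, the method wrap and the in-memory Map standing in for the keywords file are all made up for this illustration. Wrapping without an href when a keyword has no link is my reading of "if_link_found".

```java
import java.util.Map;

// Hypothetical illustration only, not the actual transformer.
final class KeywordWrapper {

    // links maps keyword -> link; a null value means "keyword without a link".
    static String wrap(String text, Map<String, String> links) {
        StringBuilder out = new StringBuilder();
        // Words are separated by a single space, as in the current implementation.
        String[] words = text.split(" ", -1);
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            String word = words[i];
            if (links.containsKey(word)) {
                String link = links.get(word);
                out.append("<a");
                if (link != null) {
                    // Only add href when the keywords file supplied a link.
                    out.append(" href=\"").append(link).append('"');
                }
                out.append('>').append(word).append("</a>");
            } else {
                out.append(word);
            }
        }
        return out.toString();
    }
}
```

A HashMap lookup per word keeps the cost per word roughly constant, which is one plausible way to stay fast with large keyword files.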
The transformers are parameterizable and have default parameter values:
For example,
<map:transform type="highlightkeywordstransformer" src="keywords10000.xml"/>
would search the XML inside the "body" element and wrap the keywords it finds with
<a href="if_link_found">keyword</a>
But,
<map:transform type="highlightkeywordstransformer" src="keywords10000.xml">
<map:parameter name="containerElement" value="div"/>
<map:parameter name="containerElementId" value="maincontent"/>
<map:parameter name="wrapElement" value="a"/>
<map:parameter name="wrapAttributeClass" value="thisClass"/>
<map:parameter name="wrapAttributeStyle" value="color:green"/>
</map:transform>
would only highlight keywords found within containerElement(s) "div" with
id="maincontent". Each wrapAttributeXXX parameter with value="YYY" becomes an
attribute on the wrapElement, as <wrapElement XXX="YYY"> (thus
wrapAttributeClass is translated to the class attribute).
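The wrapAttributeXXX naming convention could be sketched roughly as follows. This is an illustration, not the transformer code: the class WrapAttributes and the method toAttributes are hypothetical names, the parameters are passed as a plain Map rather than Cocoon's Parameters object, and lowercasing the whole suffix is an assumption that happens to fit the Class and Style examples above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of the transformer itself.
final class WrapAttributes {

    private static final String PREFIX = "wrapAttribute";

    // Maps e.g. wrapAttributeClass=thisClass to the attribute class=thisClass.
    // Parameters without the prefix (wrapElement, containerElement, ...) are ignored.
    static Map<String, String> toAttributes(Map<String, String> parameters) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, String> parameter : parameters.entrySet()) {
            String name = parameter.getKey();
            if (name.startsWith(PREFIX) && name.length() > PREFIX.length()) {
                // Assumption: the attribute name is the lowercased suffix.
                String attribute = name.substring(PREFIX.length()).toLowerCase(Locale.ROOT);
                attributes.put(attribute, parameter.getValue());
            }
        }
        return attributes;
    }
}
```

With the parameters from the sitemap snippet above, this would yield class="thisClass" and style="color:green" for the wrapElement.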
Test results (Windows, 2.8 GHz, 1 GB memory):
1) Single keyword highlighting, 100 KB XML: ~35 ms
2) Multiple keyword highlighting:
   - 10 KB XML, 1,000 keywords with links: ~31 ms
   - 10 KB XML, 10,000 keywords with links: ~94 ms
   - 10 KB XML, 200,000 keywords with links: ~1.5 s
To do:
1) Implement CacheableProcessingComponent
2) Use java.text.BreakIterator for word splitting (at the moment I separate
words by " ", which is of course a dirty solution)
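For the second to-do item, word splitting with java.text.BreakIterator might look something like the sketch below. The class WordSplitter and the method words are hypothetical names, and the rule for dropping whitespace and punctuation segments (keep only segments containing a letter or digit) is one possible choice, not something the transformers do today.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of BreakIterator-based word splitting.
final class WordSplitter {

    static List<String> words(String text, Locale locale) {
        BreakIterator iterator = BreakIterator.getWordInstance(locale);
        iterator.setText(text);
        List<String> words = new ArrayList<>();
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            String segment = text.substring(start, end);
            // The iterator also returns whitespace and punctuation segments;
            // keep only those that contain at least one letter or digit.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                words.add(segment);
            }
        }
        return words;
    }
}
```

Unlike splitting on " ", this would treat "keyword," and "keyword" as the same word, since the trailing comma ends up in its own segment.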
I would like to donate these two transformers. Shall I send in a patch?
Ard Schrijvers
Hippo
Oosteinde 11
1017WT
Amsterdam
The Netherlands
Phone: +31(0)20-5224466
Fax: +31(0)20-5224467
-------------------------------------------------------------
[EMAIL PROTECTED] / www.hippo.nl
--------------------------------------------------------------