Hi Andrew,

You can use cts:entity-highlight to customize your markup. It is very similar to cts:highlight and gives you a lot of control over what to output. If you look at the code for entity:enrich, you will notice that it uses cts:entity-highlight. With cts:entity-highlight you can add whatever logic you want (for example, the logic you mentioned about dates) to decide whether to highlight a match and what to output for it.
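A minimal sketch of how that logic could address both of your questions. It is not tested against a live server, so treat it as a starting point. The entity-type strings are an assumption: the sketch matches on the substrings "URL" and "DATE" rather than exact names, so check the cts:entity-highlight apidoc or the entity:enrich source for the names your server actually reports. The sample sentence is made up. URL matches come back as plain text, and a date match is wrapped only when its text is castable as xs:date, so a match on "Thursday" would be left unmarked.

```xquery
xquery version "1.0-ml";
(: Sketch only. The entity-type tests match on substrings ("URL", "DATE")
   because the exact type names are an ASSUMPTION here; check the
   cts:entity-highlight apidoc or the entity:enrich source for real names.
   The sample sentence is made up for illustration. :)
let $myxml :=
  <node>The review is on 2010-01-21, a Thursday; the agenda is at http://example.com/agenda.</node>
return
  cts:entity-highlight($myxml,
    if (fn:ends-with($cts:entity-type, "URL")) then
      (: question 1: leave URL matches as plain, unmarked text :)
      text { $cts:text }
    else if (fn:contains($cts:entity-type, "DATE")
             and fn:not($cts:text castable as xs:date)) then
      (: question 2: text such as "Thursday" is not castable as xs:date,
         so it is left unmarked :)
      text { $cts:text }
    else
      (: everything else is wrapped as in the apidoc example :)
      element { fn:replace($cts:entity-type, ":", "-") } { $cts:text })
```

This filter also skips dates written in other forms, such as "January 21, 2010", because those are not castable as xs:date either. That keeps every wrapped date usable by an xs:date index, but those dates are no longer marked up at all.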
For example, take the simple example in the apidoc for cts:entity-highlight (http://developer.marklogic.com/pubs/4.1/apidocs/SearchBuiltins.html#cts:entity-highlight) and modify it a little. You can add logic similar to what you had in mind. This version tests the value of a Social Security number rather than the value of a date:

    xquery version "1.0-ml";
    let $myxml := <node>George Washington never visited Norway. If he had a Social Security number, it might be 000-00-0001.</node>
    return
      cts:entity-highlight($myxml,
        element { fn:replace($cts:entity-type, ":", "-") } {
          (: replace the text of Social Security numbers that begin with "000" :)
          if ($cts:entity-type = "IDENTIFIER:PERSONAL_ID_NUM") then
            if (fn:starts-with($cts:text, "000")) then "not a valid social security number"
            else $cts:text
          else $cts:text
        })

This returns:

    <node><PERSON>George Washington</PERSON> never visited <GPE>Norway</GPE>. If he had a Social Security number, it might be <IDENTIFIER-PERSONAL_ID_NUM>not a valid social security number</IDENTIFIER-PERSONAL_ID_NUM>.</node>

Does that help?

-Danny

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Andrew Welch
Sent: Thursday, January 21, 2010 7:11 AM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] Configuring entity enrichment

Hi,

Is there a way to configure what the entity enrichment marks up? Two things mainly:

1. Can I tell it not to mark up certain "types" that I don't need enriched, such as URLs? (looking for the right word there)

2. Given a type, such as <e:date>, can I configure what will match as a date? Currently things castable as an xs:date get marked up (such as 2010-01-21), which is what I need, but so do words like "Thursday"... (I'd like to put an xs:date based index on <e:date>)

What is considered the best approach here? Should I add a post-commit on-update trigger that post-processes the entity markup into what I need, or should I configure the process somehow so it does not mark it up in the first place?
thanks

--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
