Hi Andrew,

You can use cts:entity-highlight to customize your markup. It is
very similar to cts:highlight and gives you a lot of control over what to
output. If you look at the code for entity:enrich, you will notice that it uses
cts:entity-highlight. With cts:entity-highlight, you can add whatever logic
you want to your code (for example, the logic you mentioned about the dates) to
decide whether to highlight a match and what to output for it.

For example, if you take the simple example in the apidoc for 
cts:entity-highlight:

http://developer.marklogic.com/pubs/4.1/apidocs/SearchBuiltins.html#cts:entity-highlight

and modify it a little, you can add logic similar to what you had in mind
(this example tests the value of a Social Security number rather than the
value of a date):

xquery version "1.0-ml";
let $myxml := <node>George Washington never visited Norway.  
              If he had a Social Security number, 
              it might be 000-00-0001.</node>
return
cts:entity-highlight($myxml, 
   element { fn:replace($cts:entity-type, ":", "-") } { 
      if ($cts:entity-type = "IDENTIFIER:PERSONAL_ID_NUM" )
      then if (fn:starts-with($cts:text, "000" ) )
           then "not a valid social security number"
           else $cts:text
      else $cts:text })

returns:
<node><PERSON>George Washington</PERSON> never visited <GPE>Norway</GPE>.  
              If he had a Social Security number, 
              it might be <IDENTIFIER-PERSONAL_ID_NUM>not a valid social security number</IDENTIFIER-PERSONAL_ID_NUM>.</node>
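
The same pattern could be applied to the two cases you asked about. The following is an untested sketch, not a definitive implementation: the type names "IDENTIFIER:URL" and "TEMPORAL:DATE" are assumptions about what the entity extractor reports (check them against the types you actually see in your output), and the sample text is made up for illustration.

```xquery
xquery version "1.0-ml";
(: Hypothetical sample text, for illustration only. :)
let $myxml := <node>The meeting on Thursday was moved to 2010-01-21.
              See http://example.com for details.</node>
return
cts:entity-highlight($myxml,
   (: Assumed type name for URLs: emit the bare text, no wrapper element. :)
   if ($cts:entity-type = "IDENTIFIER:URL")
   then $cts:text
   (: Assumed type name for dates: wrap only text castable as xs:date,
      so words like "Thursday" are emitted as plain text. :)
   else if ($cts:entity-type = "TEMPORAL:DATE"
            and fn:not($cts:text castable as xs:date))
   then $cts:text
   (: Everything else is wrapped as in the apidoc example. :)
   else element { fn:replace($cts:entity-type, ":", "-") } { $cts:text })
```

Returning $cts:text on its own should put the matched text back without a wrapping element, so those matches are never marked up in the first place and no post-processing trigger is needed.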

Does that help?

-Danny

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Andrew Welch
Sent: Thursday, January 21, 2010 7:11 AM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] Configuring entity enrichment

Hi,

Is there a way to configure what the entity enrichment marks up?  Two
things mainly:

1. Can I tell it not to mark up certain "types" that I don't need
enriched, such as urls?  (looking for the right word there)

2. Given a type, such as <e:date>, can I configure what will match as
a date?  Currently things castable as an xs:date get marked up (such
as 2010-01-21) which is what I need, but also words like "Thursday"...
  (I'd like to put an xs:date based index on <e:date>)

What is considered the best approach here?  Should I add a post-commit
on-update trigger that post processes the entity markup to be what I
need, or should I configure the process somehow to not mark it up in
the first place?


thanks
-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general