Thanks.  If I did it this way, would I be replacing the built-in entity
enrichment pipeline (which calls enrich.xqy) with custom code?

I was wondering whether there is a way to configure the built-in
enrichment rather than replace it.  If not, is it considered OK to
modify the queries that ship with ML (i.e. edit enrich.xqy directly), or
is it better to leave the enrichment in its standard form and then write
a query to alter the XML afterwards?
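
For reference, here is a minimal sketch of how I read the custom-code
route for my two cases (leave URLs alone, keep only real dates).  The
type names "IDENTIFIER:URL" and "TEMPORAL:DATE" are my guesses by
analogy with "IDENTIFIER:PERSONAL_ID_NUM"; I haven't checked what the
server actually reports, and the sample text is made up:

```xquery
xquery version "1.0-ml";
(: Sketch only.  "IDENTIFIER:URL" and "TEMPORAL:DATE" are assumed type
   names; substitute whatever $cts:entity-type really contains. :)
let $myxml := <node>Posted on 2010-01-21, a Thursday,
             at http://example.com/.</node>
return
cts:entity-highlight($myxml,
  if ($cts:entity-type = "IDENTIFIER:URL")
  (: case 1: a type I don't want enriched, so emit the bare text :)
  then $cts:text
  else if ($cts:entity-type = "TEMPORAL:DATE")
  (: case 2: only wrap text that is castable as xs:date :)
  then if ($cts:text castable as xs:date)
       then element { fn:replace($cts:entity-type, ":", "-") } { $cts:text }
       else $cts:text
  (: everything else is marked up as in the apidoc example :)
  else element { fn:replace($cts:entity-type, ":", "-") } { $cts:text })
```

If that is roughly right, it would run in place of the standard
enrichment step rather than alongside it, which is what prompts the
question above.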

thanks
andrew

2010/1/21 Danny Sokolsky <[email protected]>:
> Hi Andrew,
>
> You can use cts:entity-highlight to customize how you do your markup.  It is 
> very similar to cts:highlight, and gives you lots of control as to what to 
> output.  If you look at the code for entity:enrich, you will notice it uses 
> cts:entity-highlight.  Using cts:entity-highlight, you could add whatever 
> logic you want to your code (for example, the logic you mentioned about the 
> dates) to decide whether and what to highlight.
>
> For example, if you take the simple example in the apidoc for 
> cts:entity-highlight:
>
> http://developer.marklogic.com/pubs/4.1/apidocs/SearchBuiltins.html#cts:entity-highlight
>
> and modify it a little, you can add some logic similar to what you had in 
> mind (this example tests the value of a social security number rather than 
> the value of a date):
>
> xquery version "1.0-ml";
> let $myxml := <node>George Washington never visited Norway.
>              If he had a Social Security number,
>              it might be 000-00-0001.</node>
> return
> cts:entity-highlight($myxml,
>   element { fn:replace($cts:entity-type, ":", "-") } {
>      if ($cts:entity-type = "IDENTIFIER:PERSONAL_ID_NUM" )
>      then if (fn:starts-with($cts:text, "000" ) )
>           then "not a valid social security number"
>           else $cts:text
>      else $cts:text })
>
> returns:
> <node><PERSON>George Washington</PERSON> never visited <GPE>Norway</GPE>.
>              If he had a Social Security number,
>              it might be <IDENTIFIER-PERSONAL_ID_NUM>not a valid social security number</IDENTIFIER-PERSONAL_ID_NUM>.</node>
>
> Does that help?
>
> -Danny
>
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Andrew Welch
> Sent: Thursday, January 21, 2010 7:11 AM
> To: General Mark Logic Developer Discussion
> Subject: [MarkLogic Dev General] Configuring entity enrichment
>
> Hi,
>
> Is there a way to configure what the entity enrichment marks up?  Two
> things mainly:
>
> 1. Can I tell it not to mark up certain "types" that I don't need
> enriched, such as URLs?  (looking for the right word there)
>
> 2. Given a type, such as <e:date>, can I configure what will match as
> a date?  Currently things castable as an xs:date get marked up (such
> as 2010-01-21) which is what I need, but also words like "Thursday"...
>  (I'd like to put an xs:date based index on <e:date>)
>
> What is considered the best approach here?  Should I add a post-commit
> on-update trigger that post processes the entity markup to be what I
> need, or should I configure the process somehow to not mark it up in
> the first place?
>
>
> thanks
> --
> Andrew Welch
> http://andrewjwelch.com
> Kernow: http://kernowforsaxon.sf.net/
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general
>



-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/