RE: [MarkLogic Dev General] Configuring entity enrichment

Danny Sokolsky Mon, 25 Jan 2010 09:34:40 -0800

Hi Andrew,

The fundamental issue is that you have declared that e:date is an xs:date (by 
creating an xs:date range index on it), but your data does not support that.  So 
when you update a document that contains an invalid date, the update fails, as it 
should: you have declared something about the element's type that is not true, 
so the server throws an exception.  I can think of a few approaches to deal with 
this:


* You can make the range index on e:date an xs:string index.  You could either 
leave it that way or transform the values to dates later (fixing the bad dates).  
This is probably the most straightforward approach.  If you look at the 
entity.xsd schema, you will notice that date is an xs:string, not an xs:date.  
This is because entity enrichment looks for text that represents dates, not 
actual dates.
* You can put a try/catch in the step that adds the enrichment, and if it 
throws that exception, move the document into another state that has an 
action to deal with it.  You can then handle it either manually or 
programmatically.
* You can look at all the e:date elements after you enrich them, but before you 
insert them into the database, and clean up any bad dates.  This approach might 
be the hardest to implement.  One idea is for your code to look for bad e:date 
elements, replace their contents with some dummy data, and put the real data 
somewhere else (possibly in an attribute or an adjacent element) to clean up 
later.  The in-mem-update code might help you with this approach.
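The third approach can be sketched roughly as follows.  This is a minimal illustration in Python with ElementTree rather than XQuery, so it is not MarkLogic code and does not use the in-mem-update module; it only shows the logic.  It assumes the enrichment namespace URI is http://marklogic.com/entity (check it against entity.xsd), approximates "castable as xs:date" with a regex plus a calendar check (negative and five-digit years are ignored), and uses a hypothetical placeholder date and a hypothetical "original" attribute to hold the real text.

```python
import datetime
import re
import xml.etree.ElementTree as ET

# Assumption: namespace URI of the entity-enrichment markup (verify in entity.xsd).
E_NS = "http://marklogic.com/entity"
E_DATE = f"{{{E_NS}}}date"

# Approximation of the xs:date lexical form: YYYY-MM-DD plus optional timezone.
# Negative years and years with more than four digits are not handled.
_XS_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})(Z|[+-]\d{2}:\d{2})?")

def castable_as_xs_date(text):
    """Rough stand-in for XQuery's `castable as xs:date` on element text."""
    m = _XS_DATE.fullmatch((text or "").strip())
    if not m:
        return False
    try:
        # Rejects impossible calendar dates such as 2010-02-30.
        datetime.date(int(m[1]), int(m[2]), int(m[3]))
        return True
    except ValueError:
        return False

def quarantine_bad_dates(root, placeholder="1900-01-01", attr="original"):
    """Give every invalid e:date a dummy value and keep the real text in an
    attribute for later cleanup. `placeholder` and `attr` are hypothetical
    choices. Assumes e:date holds plain text only. Returns the count fixed."""
    fixed = 0
    for el in root.iter(E_DATE):
        if not castable_as_xs_date(el.text):
            el.set(attr, el.text or "")
            el.text = placeholder
            fixed += 1
    return fixed
```

In XQuery the same idea would typically be a recursive copy of the enriched tree that rewrites bad e:date elements on the way, since constructed (in-memory) nodes cannot be updated in place.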

What you do might depend on how widespread this problem is.  If it is only 
happening on a few documents, then a relatively manual approach might work.  If 
it is widespread, then you will probably want a general solution.

Maybe that will give you a few ideas.

-Danny

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Andrew Welch
Sent: Monday, January 25, 2010 3:18 AM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Configuring entity enrichment

Ok, here's the current issue:

- I'm using the entity enrichment pipeline as part of the CPF
- I would like to use a range index on e:date
- Currently words like "Monday" are being marked up with e:date
- When attempting to load any doc that contains a non-xs:date for an
e:date, I get the exception that the contents are not castable as
xs:date
- Along with fixing up <e:date>, I would also like to remove unused
elements added by the enrichment, such as e:url, e:number, e:title

What is the best approach here?

Starting with the e:date issue, I've tried using my own custom
pipeline that runs after the entity enrichment step, but that fails
with the above exception.  I also tried using a post-commit on-update
trigger, but of course that fails with the same issue, as the document
has already been saved by that point.

The most recent attempt is to remove the built-in entity enrichment
pipeline from the CPF chain, and then call entity:enrich() in the
trigger, which gives me the chance to remove any e:date elements that
are not castable as xs:date before the document is saved.  However,
xdmp:node-replace can't be used on an in-memory tree; it fails with
the exception "cannot update constructed nodes".
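The trigger-side cleanup described above (dropping e:date markup whose content is not castable as xs:date, plus the unused e:url, e:number and e:title elements) can be sketched in the same rough way.  Again this is Python with ElementTree standing in for XQuery, with the same assumed namespace URI and the same approximate date check.  It unwraps the elements rather than deleting them, so words like "Monday" stay in the text and only the markup goes; that reading of "remove" is an assumption.

```python
import datetime
import re
import xml.etree.ElementTree as ET

# Assumption: namespace URI of the entity-enrichment markup (verify in entity.xsd).
E_NS = "http://marklogic.com/entity"
E_DATE = f"{{{E_NS}}}date"
UNUSED = {f"{{{E_NS}}}{name}" for name in ("url", "number", "title")}

# Approximation of "castable as xs:date": YYYY-MM-DD plus optional timezone.
_XS_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})(Z|[+-]\d{2}:\d{2})?")

def castable_as_xs_date(text):
    m = _XS_DATE.fullmatch((text or "").strip())
    if not m:
        return False
    try:
        datetime.date(int(m[1]), int(m[2]), int(m[3]))
        return True
    except ValueError:
        return False

def _append_text(parent, idx, s):
    """Append s to the text that sits just before child position idx."""
    if not s:
        return
    if idx == 0:
        parent.text = (parent.text or "") + s
    else:
        prev = parent[idx - 1]
        prev.tail = (prev.tail or "") + s

def unwrap(parent, child):
    """Replace child with its own text, children and tail (markup removed)."""
    idx = list(parent).index(child)
    kids = list(child)
    parent.remove(child)
    _append_text(parent, idx, child.text)
    for offset, kid in enumerate(kids):
        parent.insert(idx + offset, kid)
    if kids:
        kids[-1].tail = (kids[-1].tail or "") + (child.tail or "")
    else:
        _append_text(parent, idx, child.tail)

def tidy_enrichment(root):
    """Unwrap unused entity elements and e:date elements that would not
    cast to xs:date. Returns the number of elements unwrapped."""
    targets = [
        (parent, child)
        for parent in root.iter()
        for child in parent
        if child.tag in UNUSED
        or (child.tag == E_DATE and not castable_as_xs_date(child.text))
    ]
    # root.iter() visits ancestors before descendants, so walking the list
    # backwards handles nested matches and keeps sibling indices valid.
    for parent, child in reversed(targets):
        unwrap(parent, child)
    return len(targets)
```

Unwrapping keeps the document's text intact for search while ensuring every remaining e:date would satisfy an xs:date range index.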

Googling around I've come across this post:

http://xqzone.marklogic.com/pipermail/general/2008-September/001811.html

which links to this module:

http://xqzone.marklogic.com/svn/commons/trunk/memupdate/in-mem-update.xqy

I could use that, but is that the standard / best / most appropriate way
to approach this problem?

thanks
andrew


2010/1/22 Danny Sokolsky <[email protected]>:
> You should *not* modify enrich.xqy directly, otherwise your changes will be 
> lost upon upgrade.  Create your own function that does something similar and 
> put it in your own location (for example, under your App Server root).
>
> You can start your new module by copying the code from enrich.xqy, though.
>
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general