Re: [MarkLogic Dev General] Configuring entity enrichment

Andrew Welch Mon, 25 Jan 2010 09:59:20 -0800

Hi Danny,

Thanks for that, it's very helpful.  The first option seems like the
best approach (and makes a lot of sense now).


Instead of making the range index on <e:date> I will make it on a
custom element, say <my:date>.  Then after the cpf has finished, as an
on-modify trigger, I can go through the <e:date> elements and where
they are castable as xs:date convert them to <my:date>, and where they
aren't just remove the markup.
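
A minimal sketch of that conversion, not a tested implementation.  It
rebuilds the tree with a recursive typeswitch rather than calling
xdmp:node-replace, so it also works on in-memory (constructed) nodes.
Assumptions: the e prefix is bound to the namespace the enrichment
actually emits (check entity.xsd), and the my namespace URI and the
function name local:fix-dates are made up for illustration.

```xquery
xquery version "1.0-ml";

(: Assumption: e is the namespace used by the enrichment markup. :)
declare namespace e  = "http://marklogic.com/entity";
(: Hypothetical namespace for the custom, range-indexed element. :)
declare namespace my = "http://example.com/my";

(: Copy a node tree, rewriting each e:date along the way. :)
declare function local:fix-dates($node as node()) as node()*
{
  typeswitch ($node)
    case document-node() return
      document {
        for $child in $node/node() return local:fix-dates($child)
      }
    case element(e:date) return
      if (fn:string($node) castable as xs:date)
      (: A real date: keep it, under the custom element name. :)
      then element my:date { fn:string($node) }
      (: Not a date (e.g. "Monday"): drop the markup, keep the content. :)
      else for $child in $node/node() return local:fix-dates($child)
    case element() return
      element { fn:node-name($node) } {
        $node/@*,
        for $child in $node/node() return local:fix-dates($child)
      }
    (: Text, comments and processing instructions pass through unchanged. :)
    default return $node
};

(: Usage: the first e:date is castable as xs:date, the second is not. :)
local:fix-dates(
  <p>Due <e:date>2010-01-25</e:date>, or <e:date>Monday</e:date>.</p>
)
```

In the usage example the first e:date comes back as a my:date element
and the second is reduced to its plain text.  Writing the result back
to the database from the trigger is left out of the sketch.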

Thanks for your help,

cheers
andrew


2010/1/25 Danny Sokolsky <[email protected]>:
> Hi Andrew,
>
> The fundamental issue is that you have declared that e:date is an xs:date, 
> but your data does not support that, so when you update a document with an 
> invalid date, that update fails (as it should, because you have declared 
> something about its type that is not true, hence it throws an exception).  I 
> can think of a few approaches to deal with this:
>
> * You can make the index on e:date a string.  You could either leave it that 
> way or transform it later to a date (fixing the bad dates).  This is probably 
> the most straightforward approach.  If you look at the schema for entity.xsd, 
> you will notice that date is an xs:string, not a date.  This is because the 
> entity enrichment is looking for text that represents dates, not actual dates.
> * You can put a try/catch in the step that adds the enrichment, and if it 
> throws that exception put the document into another state that can have an 
> action to deal with it.  You can either deal with it manually or 
> programmatically.
> * You can look at all the e:date elements after you enrich them, but before 
> you insert them into the database, and clean up any bad dates.  This approach 
> might be the hardest to implement.  One idea is your code can look for bad 
> e:date elements, fix them with some dummy data, and put the real data 
> somewhere else (possibly in an attribute or an adjacent element) and clean it 
> up later.  The in-mem-update code might help you with this approach.
>
> What you do might depend on how widespread this problem is.  If it is only 
> happening on a few documents, then a relatively manual approach might work.  
> If it is widespread, then you will probably want a general solution.
>
> Maybe that will give you a few ideas.
>
> -Danny
>
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Andrew Welch
> Sent: Monday, January 25, 2010 3:18 AM
> To: General Mark Logic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Configuring entity enrichment
>
> Ok, here's the current issue:
>
> - I'm using the entity enrichment pipeline as part of the CPF
> - I would like to use a range index on e:date
> - Currently words like "Monday" are being marked up with e:date
> - When attempting to load any doc that contains a non-xs:date for an
> e:date, I get the exception that the contents are not castable as
> xs:date
> - Along with fixing up <e:date>, I would also like to remove unused
> elements added by the enrichment, such as e:url, e:number, e:title
>
> What is the best approach here?
>
> Starting with the e:date issue, I've tried using my own custom
> pipeline to operate after the entity enrichment step, but that fails
> with the above exception.  I also tried using a post-commit on-update
> trigger, but of course that also fails with the same issue, as the
> document has been saved by that point.
>
> The most recent attempt is to remove the inbuilt entity enrichment
> pipeline from the cpf chain, and then call entity:enrich() in the
> trigger, which gives me the chance to remove any e:date's that are not
> castable as xs:date before the document is saved.  However, using
> xdmp:node-replace on an in-memory tree isn't possible, failing with
> the exception "cannot update constructed nodes".
>
> Googling around I've come across this post:
>
> http://xqzone.marklogic.com/pipermail/general/2008-September/001811.html
>
> which links to this module:
>
> http://xqzone.marklogic.com/svn/commons/trunk/memupdate/in-mem-update.xqy
>
> I could use that, but is that the standard / best / most appropriate way
> to approach this problem?
>
> thanks
> andrew
>
>
> 2010/1/22 Danny Sokolsky <[email protected]>:
>> You should *not* modify enrich.xqy directly, otherwise your changes will be 
>> lost upon upgrade.  Create your own function that does something similar and 
>> put it in your own location (for example, under your App Server root).
>>
>> You can start your new module by copying the code from enrich.xqy, though.
>>
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general
>



-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
