Hi Andrew,

The fundamental issue is that you have declared e:date to be an xs:date, but your data does not support that. When you update a document with an invalid date, the update fails, as it should: you have declared something about its type that is not true, so it throws an exception. I can think of a few approaches to deal with this:
* You can make the index on e:date a string. You could either leave it that way or transform it later to a date (fixing the bad dates). This is probably the most straightforward approach. If you look at the schema for entity.xsd, you will notice that date is an xs:string, not a date. This is because the entity enrichment is looking for text that represents dates, not actual dates.
* You can put a try/catch in the step that adds the enrichment, and if it throws that exception, put the document into another state that can have an action to deal with it. You can either deal with it manually or programmatically.
* You can look at all the e:date elements after you enrich them, but before you insert them into the database, and clean up any bad dates. This approach might be the hardest to implement. One idea is that your code can look for bad e:date elements, fix them with some dummy data, put the real data somewhere else (possibly in an attribute or an adjacent element), and clean it up later. The in-mem-update code might help you with this approach.

What you do might depend on how widespread this problem is. If it is only happening on a few documents, then a relatively manual approach might work. If it is widespread, then you will probably want a general solution.

Maybe that will give you a few ideas.
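For the third approach, here is a minimal sketch of cleaning the enriched tree in memory before it is saved. It is a simpler variation of what I described: rather than substituting dummy data, it unwraps any e:date that is not castable as xs:date and leaves just the text behind. The function name local:clean-dates and the sample paragraph are made up for illustration, and I am assuming the e: prefix is bound to http://marklogic.com/entity, so check that against your enriched output. Because it rebuilds the tree with a recursive typeswitch, it does not need xdmp:node-replace and so avoids the "cannot update constructed nodes" error.

```xquery
xquery version "1.0-ml";

(: Assumption: the enrichment markup uses this namespace. :)
declare namespace e = "http://marklogic.com/entity";

(: Hypothetical helper: copies a node tree, replacing every e:date whose
   text is not castable as xs:date with that element's own children. :)
declare function local:clean-dates($node as node()) as node()*
{
  typeswitch ($node)
    case document-node() return
      document { for $c in $node/node() return local:clean-dates($c) }
    case element(e:date) return
      if (fn:string($node) castable as xs:date)
      then $node           (: valid date: keep the markup :)
      else $node/node()    (: e.g. "Monday": keep only the text :)
    case element() return
      element { fn:node-name($node) } {
        $node/@*,
        for $c in $node/node() return local:clean-dates($c)
      }
    default return $node
};

(: Made-up sample input standing in for the output of the enrichment step. :)
let $enriched :=
  <p xmlns:e="http://marklogic.com/entity">See you <e:date>Monday</e:date>, that is <e:date>2010-01-25</e:date>.</p>
return local:clean-dates($enriched)
```

With that sample, "Monday" should come back as plain text while the e:date around 2010-01-25 is kept. The other elements you do not want (e:url, e:number, e:title) could probably be unwrapped the same way by adding further cases to the typeswitch.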
-Danny

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Andrew Welch
Sent: Monday, January 25, 2010 3:18 AM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Configuring entity enrichment

Ok, here's the current issue:

- I'm using the entity enrichment pipeline as part of the CPF
- I would like to use a range index on e:date
- Currently words like "Monday" are being marked up with e:date
- When attempting to load any doc that contains a non-xs:date for an e:date, I get the exception that the contents are not castable as xs:date
- Along with fixing up <e:date>, I would also like to remove unused elements added by the enrichment, such as e:url, e:number, e:title

What is the best approach here?

Starting with the e:date issue, I've tried using my own custom pipeline to operate after the entity enrichment step, but that fails with the above exception. I also tried using a post-commit on-update trigger, but of course that also fails with the same issue, as the document has been saved by that point.

The most recent attempt is to remove the inbuilt entity enrichment pipeline from the CPF chain, and then call entity:enrich() in the trigger, which gives me the chance to remove any e:date elements that are not castable as xs:date before the document is saved. However, using xdmp:node-replace on an in-memory tree isn't possible, failing with the exception "cannot update constructed nodes".

Googling around I've come across this post:

http://xqzone.marklogic.com/pipermail/general/2008-September/001811.html

which links to this module:

http://xqzone.marklogic.com/svn/commons/trunk/memupdate/in-mem-update.xqy

I could use that, but is that the standard / best / most appropriate way to approach this problem?

thanks
andrew

2010/1/22 Danny Sokolsky <[email protected]>:
> You should *not* modify enrich.xqy directly, otherwise your changes will be
> lost upon upgrade. Create your own function that does something similar and
> put it in your own location (for example, under your App Server root).
>
> You can start your new module by copying the code from enrich.xqy, though.

_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
