Hi Danny,

Thanks for that, it's very helpful. The first option seems like the best approach (and makes a lot of sense now).

Instead of putting the range index on <e:date>, I will put it on a custom element, say <my:date>. Then, after the CPF has finished, an on-modify trigger can go through the <e:date> elements: where the content is castable as xs:date, convert the element to <my:date>, and where it isn't, just remove the markup.

Thanks for your help,

cheers
andrew

2010/1/25 Danny Sokolsky <[email protected]>:
> Hi Andrew,
>
> The fundamental issue is that you have declared that e:date is an xs:date,
> but your data does not support that, so when you update a document with an
> invalid date, that update fails (as it should, because you have declared
> something about its type that is not true, hence it throws an exception). I
> can think of a few approaches to deal with this:
>
> * You can make the index on e:date a string. You could either leave it that
> way or transform it later to a date (fixing the bad dates). This is probably
> the most straightforward approach. If you look at the schema for entity.xsd,
> you will notice that date is an xs:string, not a date. This is because the
> entity enrichment is looking for text that represents dates, not actual dates.
> * You can put a try/catch in the step that adds the enrichment, and if it
> throws that exception put the document into another state that can have an
> action to deal with it. You can either deal with it manually or
> programmatically.
> * You can look at all the e:date elements after you enrich them, but before
> you insert them into the database, and clean up any bad dates. This approach
> might be the hardest to implement. One idea is your code can look for bad
> e:date elements, fix them with some dummy data, and put the real data
> somewhere else (possibly in an attribute or an adjacent element) and clean it
> up later. The in-mem-update code might help you with this approach.
>
> What you do might depend on how widespread this problem is.
> If it is only happening on a few documents, then a relatively manual approach
> might work. If it is widespread, then you will probably want a general
> solution.
>
> Maybe that will give you a few ideas.
>
> -Danny
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Andrew Welch
> Sent: Monday, January 25, 2010 3:18 AM
> To: General Mark Logic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Configuring entity enrichment
>
> Ok, here's the current issue:
>
> - I'm using the entity enrichment pipeline as part of the CPF
> - I would like to use a range index on e:date
> - Currently words like "Monday" are being marked up with e:date
> - When attempting to load any doc that contains a non-xs:date for an
> e:date, I get the exception that the contents are not castable as
> xs:date
> - Along with fixing up <e:date>, I would also like to remove unused
> elements added by the enrichment, such as e:url, e:number, e:title
>
> What is the best approach here?
>
> Starting with the e:date issue, I've tried using my own custom
> pipeline to operate after the entity enrichment step, but that fails
> with the above exception. I also tried using a post-commit on-update
> trigger, but of course that also fails with the same issue, as the
> document has been saved by that point.
>
> The most recent attempt is to remove the inbuilt entity enrichment
> pipeline from the CPF chain, and then call entity:enrich() in the
> trigger, which gives me the chance to remove any e:date elements that
> are not castable as xs:date before the document is saved. However, using
> xdmp:node-replace on an in-memory tree isn't possible, failing with
> the exception "cannot update constructed nodes".
>
> Googling around I've come across this post:
>
> http://xqzone.marklogic.com/pipermail/general/2008-September/001811.html
>
> which links to this module:
>
> http://xqzone.marklogic.com/svn/commons/trunk/memupdate/in-mem-update.xqy
>
> I could use that, but is that the standard / best / most appropriate way
> to approach this problem?
>
> thanks
> andrew
>
>
> 2010/1/22 Danny Sokolsky <[email protected]>:
>> You should *not* modify enrich.xqy directly, otherwise your changes will be
>> lost upon upgrade. Create your own function that does something similar and
>> put it in your own location (for example, under your App Server root).
>>
>> You can start your new module by copying the code from enrich.xqy, though.
>>
> _______________________________________________
> General mailing list
> [email protected]
> http://xqzone.com/mailman/listinfo/general
>

--
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
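
For reference, the clean-up described at the top of this message (turn each castable <e:date> into <my:date>, and strip the markup from the rest) could be sketched as a recursive copy. This is only a rough sketch, not tested against the enrichment output: the function name local:fix-dates and the "my" namespace URI are made up for illustration, and the "e" namespace URI is assumed to be the one the entity enrichment uses. Because it builds a new tree rather than updating in place, it also works on in-memory nodes, where xdmp:node-replace fails with "cannot update constructed nodes".

```xquery
xquery version "1.0-ml";

(: Assumption: "e" is bound to the entity enrichment namespace.
   "my" is a hypothetical namespace for the custom range-indexed element. :)
declare namespace e  = "http://marklogic.com/entity";
declare namespace my = "http://example.com/my";

(: Recursively copy a node, rewriting e:date elements on the way:
   castable content becomes my:date, anything else is unwrapped. :)
declare function local:fix-dates($node as node()) as node()*
{
  typeswitch ($node)
    case document-node() return
      document { for $c in $node/node() return local:fix-dates($c) }
    case element(e:date) return
      if (fn:string($node) castable as xs:date)
      then element my:date { fn:string($node) }
      else for $c in $node/node() return local:fix-dates($c)
    case element() return
      element { fn:node-name($node) } {
        $node/@*,
        for $c in $node/node() return local:fix-dates($c)
      }
    default return $node
};

(: Example call on an in-memory fragment. :)
local:fix-dates(
  <p>Due <e:date>2010-01-25</e:date>, not <e:date>Monday</e:date>.</p>
)
```

In the example call, the first e:date is castable and becomes my:date, while "Monday" is left as plain text. The same typeswitch could grow extra cases to unwrap e:url, e:number and e:title if those are not wanted either.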
