Hi Winfred,

Just an idea, but if you can store the hash values of the documents in ML, then maybe you could use cts:element-value-co-occurrences between a PK-element range index and a hash-value-element range index (http://docs.marklogic.com/cts:element-value-co-occurrences). Use the "map" option, and you get all the PKs and their hash values in a single map (PK as key, hash as value).

You can serialize the map:map if you want to store the state of the db (just put it in an XML element, then insert it into ML somewhere). You can get the XML back as a map when you need it.

It might then be possible to create a current map, get the old map, and use the MarkLogic map operators to find documents that were added, deleted, or changed.

- Chris

On Sun, Mar 2, 2014 at 2:47 PM, Winfred Zwaard <[email protected]> wrote:

> Hi,
>
> I am rather new to MarkLogic, and running into some performance problems.
>
> Here is what I try to accomplish:
> - I have a set of ML xml documents, each containing a record from my
>   source database. Each document is identified by the Primary Key from
>   my source
> - Periodically I create a dump of my source database
> - Then I try to identify the records that have changed compared to the
>   previous time I made my database dump.
> - My intention is to do this by taking the PK from my new dump, and create
>   a hash 64 for the full record. And then try to compare this to the
>   previous time I created my database dump.
>
> For a couple hundred records this performs quite OK, but I get
> performance problems when running it against thousands or more records.
>
> Tried adding a range index, but still no better performing results. Can
> you help me out? I have included the script to create a dummy base set of
> XML documents, as well as a script to create a new dummy database dump
> (with every 100th record having a change). And a script to check which
> records have changed. This latter script functionally works, but it is
> very slow.
>
> Do you have better ideas? Would it for instance help to create a separate
> set of documents that only contains the primary keys and hash totals to
> check?
>
> Thanks for your help
> Winfred Zwaard
>
> DIKW consultancy
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
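
P.S. To make the map comparison from the reply at the top a bit more concrete, here is a rough, language-neutral sketch in Python. It is not MarkLogic code: plain dicts stand in for the two map:map values (PK as key, hash as value), the function name diff_snapshots is made up for illustration, and the PKs and hash strings are invented sample data. The same three set-style comparisons are what you would try to express with the map operators.

```python
def diff_snapshots(old, new):
    """Compare two {primary_key: hash} snapshots.

    Returns three dicts: records only in `new` (added), records only in
    `old` (deleted), and records present in both whose hash differs
    (changed, mapped to an (old_hash, new_hash) pair).
    """
    added = {pk: h for pk, h in new.items() if pk not in old}
    deleted = {pk: h for pk, h in old.items() if pk not in new}
    changed = {
        pk: (old[pk], new[pk])
        for pk in old.keys() & new.keys()  # PKs present in both snapshots
        if old[pk] != new[pk]
    }
    return added, deleted, changed


# Invented sample data: the stored snapshot and the one built from the new dump.
old = {"1001": "a3f1", "1002": "77c0", "1003": "e910"}
new = {"1002": "77c0", "1003": "0b4d", "1004": "5d22"}

added, deleted, changed = diff_snapshots(old, new)
print("added:  ", added)
print("deleted:", deleted)
print("changed:", changed)
```

The point of doing it this way is that each snapshot is built once from the indexes and the comparison is a single pass over the keys, instead of one lookup per record.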
