Hi,

I am rather new to MarkLogic, and I am running into some performance problems.

Here is what I am trying to accomplish:
- I have a set of MarkLogic XML documents, each containing one record from my
source database. Each document is identified by the primary key (PK) from
the source database.
- Periodically I create a dump of my source database.
- Then I try to identify the records that have changed since the previous
dump.
- My intention is to do this by taking each PK from the new dump, computing
a 64-bit hash (xdmp:hash64) over the full record, and comparing that hash
to the one stored for the same PK from the previous dump.

For a couple of hundred records this performs quite OK, but I run into
performance problems when running it against thousands of records or more.

I tried adding a range index, but that did not improve performance. Can you
help me out? I have included a script that creates a dummy base set of XML
documents, a script that creates a new dummy database dump (with every
100th record changed), and a script that checks which records have changed.
This last script is functionally correct, but it is very slow.

Do you have better ideas? Would it help, for instance, to create a separate
set of documents that contains only the primary keys and hashes to check
against?
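
To make that last idea concrete, here is a rough and untested sketch of what I have in mind. It copies only the PK and hash of each stored record into its own small document. The 'HASH_' URI prefix, the 'HASH_VALUES' collection and the HASH_ENTRY element are names I made up for illustration; they do not exist in my current setup.

```xquery
xquery version "1.0-ml";

(: Sketch only: for every stored record, write a small document that
   holds just the primary key and the hash.
   'HASH_', 'HASH_VALUES' and HASH_ENTRY are hypothetical names. :)
for $t in collection('STORED_VALUES')/XML_TABLE/DATA_TUPLE
return
  xdmp:document-insert(
    fn:concat('HASH_', $t/__PK__, '.xml'),
    <HASH_ENTRY>
      <__PK__>{fn:string($t/__PK__)}</__PK__>
      <__HASH__>{fn:string($t/__HASH__)}</__HASH__>
    </HASH_ENTRY>,
    (),
    'HASH_VALUES'
  )
```

The check would then read these small documents instead of the full records. I do not know whether that would actually be faster, which is why I am asking.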

Thanks for your help
Winfred Zwaard

DIKW consultancy

Script to create the dummy base set of XML documents:

xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";

let $basedoc:=<basedoc>
<name>ACCOUNT NAME</name>
<date_entered>2010-09-19 07:00:00.0</date_entered>
<date_modified>2013-12-16 16:15:48.0</date_modified>
<modified_user_id>9f90c30c-a5d3-c9e1-147b-51da634834a9</modified_user_id>
<created_by>1</created_by>
<description/>
<deleted>0</deleted>
<assigned_user_id>91272d8a-3cd5-da2d-e523-509a8afaa030</assigned_user_id>
<account_type>PROSPECT</account_type>
<industry>GOVERNMENT</industry>
<annual_revenue/>
<phone_fax/>
<billing_address_street/>
<billing_address_city/>
<billing_address_state/>
<billing_address_postalcode/>
<billing_address_country/>
<rating/>
<phone_office/>
<phone_alternate/>
<website>website</website>
<ownership/>
<employees/>
<ticker_symbol/>
<shipping_address_street>Shipping address street</shipping_address_street>
<shipping_address_city>shipping_address_city</shipping_address_city>
<shipping_address_state/>
<shipping_address_postalcode>shipping_address_postalcode</shipping_address_postalcode>
<shipping_address_country>shipping_address_country</shipping_address_country>
<parent_id>4fb75ce6-6b80-e613-2642-506bf3534b39</parent_id>
<sic_code/>
<campaign_id/>
</basedoc>
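(: Insert 100,000 documents, one per record, each storing the PK and
   the hash of the record alongside the record itself. :)
for $i in (1 to 100000)
return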
xdmp:document-insert(fn:concat('ACCOUNTS', 
'232c87ea-b4b2-998d-7c9f-506bf324e557', $i, '.xml'), 
<XML_TABLE>
<DATA_TUPLE>
<__PK__>232c87ea-b4b2-998d-7c9f-506bf324e557{$i}</__PK__>
<__HASH__>{xdmp:hash64(<hash>
<id>232c87ea-b4b2-998d-7c9f-506bf324e557{$i}</id>
{$basedoc/*}</hash>
)}
</__HASH__>
<id>232c87ea-b4b2-998d-7c9f-506bf324e557{$i}</id>
{$basedoc/*}
</DATA_TUPLE>
</XML_TABLE>
, ()
, 'STORED_VALUES'
)

Script to create the new dummy database dump (every 100th record changed):

xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";

let $basedoc:=<basedoc>
<name>ACCOUNT NAME</name>
<date_entered>2010-09-19 07:00:00.0</date_entered>
<date_modified>2013-12-16 16:15:48.0</date_modified>
<modified_user_id>9f90c30c-a5d3-c9e1-147b-51da634834a9</modified_user_id>
<created_by>1</created_by>
<description/>
<deleted>0</deleted>
<assigned_user_id>91272d8a-3cd5-da2d-e523-509a8afaa030</assigned_user_id>
<account_type>PROSPECT</account_type>
<industry>GOVERNMENT</industry>
<annual_revenue/>
<phone_fax/>
<billing_address_street/>
<billing_address_city/>
<billing_address_state/>
<billing_address_postalcode/>
<billing_address_country/>
<rating/>
<phone_office/>
<phone_alternate/>
<website>website</website>
<ownership/>
<employees/>
<ticker_symbol/>
<shipping_address_street>Shipping address street</shipping_address_street>
<shipping_address_city>shipping_address_city</shipping_address_city>
<shipping_address_state/>
<shipping_address_postalcode>shipping_address_postalcode</shipping_address_postalcode>
<shipping_address_country>shipping_address_country</shipping_address_country>
<parent_id>4fb75ce6-6b80-e613-2642-506bf3534b39</parent_id>
<sic_code/>
<campaign_id/>
</basedoc>

let $changedoc:=<basedoc>
<name>ACCOUNT NAME</name>
<date_entered>2010-09-19 07:00:00.0</date_entered>
<date_modified>2013-12-16 16:15:48.0</date_modified>
<modified_user_id>9f90c30c-a5d3-c9e1-147b-51da634834a9</modified_user_id>
<created_by>1</created_by>
<description/>
<deleted>0</deleted>
<assigned_user_id>91272d8a-3cd5-da2d-e523-509a8afaa030</assigned_user_id>
<account_type>CLIENT</account_type>
<industry>GOVERNMENT</industry>
<annual_revenue/>
<phone_fax/>
<billing_address_street/>
<billing_address_city/>
<billing_address_state/>
<billing_address_postalcode/>
<billing_address_country/>
<rating/>
<phone_office/>
<phone_alternate/>
<website>website</website>
<ownership/>
<employees/>
<ticker_symbol/>
<shipping_address_street>Shipping address street</shipping_address_street>
<shipping_address_city>shipping_address_city</shipping_address_city>
<shipping_address_state/>
<shipping_address_postalcode>shipping_address_postalcode</shipping_address_postalcode>
<shipping_address_country>shipping_address_country</shipping_address_country>
<parent_id>4fb75ce6-6b80-e613-2642-506bf3534b39</parent_id>
<sic_code/>
<campaign_id/>
</basedoc>

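(: Insert the whole dump as one document containing 100,000
   TUPLE_DETAILS elements; every 100th record uses $changedoc. :)
return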
(
xdmp:document-insert('ACCOUNTS.xml', 
<XML_TABLE>
<DATA_TUPLE>
{for $i in (1 to 100000)
return
<TUPLE_DETAILS>
<__PK__>232c87ea-b4b2-998d-7c9f-506bf324e557{$i}</__PK__>
<__HASH__>{xdmp:hash64(<hash>
<id>232c87ea-b4b2-998d-7c9f-506bf324e557{$i}</id>
{if ($i mod 100 eq 0) then ($changedoc/*) else ($basedoc/*)}</hash>
)}
</__HASH__>
<id>232c87ea-b4b2-998d-7c9f-506bf324e557{$i}</id>
{if ($i mod 100 eq 0) then ($changedoc/*) else ($basedoc/*)}
</TUPLE_DETAILS>
}
</DATA_TUPLE>
</XML_TABLE>
, ()
, 'CHANGED_VALUES'
)
)
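
Script to check which records have changed (works, but very slow):

xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";

(: For each record in the new dump, look up the stored document by its
   URI (built from the PK) and return the PKs whose hash differs. :)
(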
for $changed_values in 
collection('CHANGED_VALUES')/XML_TABLE/DATA_TUPLE/TUPLE_DETAILS
  , $stored_values in fn:document(fn:concat('ACCOUNTS', $changed_values/__PK__, 
'.xml'))/XML_TABLE/DATA_TUPLE
where $stored_values/__PK__ eq $changed_values/__PK__
and $stored_values/__HASH__ ne $changed_values/__HASH__
return ($stored_values/__PK__)
, xdmp:query-meters()
)