Patrick Force wrote:
We have a directory that includes policy data, each in a separate document, with the root node name InsurancePolicy. I'm trying to figure out the best way to optimize querying /InsurancePolicy. The documents in this directory are currently at about 20,000 and are quickly growing. I'm currently trying to run queries for data cleaning and viewing data in general such as:

for $unattachedpolicy in /InsurancePolicy
where empty(/[EMAIL PROTECTED] = $unattachedpolicy/@productId and @version = $unattachedpolicy/@productVersion]/@id) return <unattachedpolicy id={$unattachedpolicy/@id} productId={$unattachedpolicy/@productId} productVersion={$unattachedpolicy/@productVersion}/>

Hi Patrick,

First let me confirm I understand what you're trying to do. You're looking for /InsurancePolicy documents where either there isn't a corresponding /InsuranceProduct or where if there is a corresponding /InsuranceProduct then it's lacking an id attribute. The definition of "corresponding" is based on comparing two attribute values.

I suspect the "or lacking an id attribute" really isn't necessary although the code is written that way right now. So basically you're looking for policies that lack a corresponding product.

Second, can I ask how long is "very long"? With this schema, because the id and version together act as kind of a pseudo primary key, I don't think you'll be able to do full resolution out of indexes. The best algorithm will be O(n) where n is the number of policies.

for $policy in /InsurancePolicy
let $id := $policy/@productId
let $ver := $policy/@productVersion
where xdmp:estimate(/[EMAIL PROTECTED] = [EMAIL PROTECTED] = $version]) = 0
return
<unattachedpolicy>{$policy/@*}</unattachedpolicy>

This will result in a disk bound query, taking perhaps 20 seconds for 20k fragment loads of all /InsurancePolicy root fragments. This is probably faster than what you're seeing no because no /InsuranceProduct fragments need be loaded.

Hope this helps.  Maybe someone else has a better idea?

-jh-
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general

Reply via email to