I can't tell if you want a list of matching documents per value, or for all 
values. I could imagine a "report" like the following:

S0016-5085(68)70198-0
-abc.xml
-def.xml
-ghi.xml
S0016-5085(68)70199-2
-123.xml
-456.xml
-789.xml
S0016-5085(68)70200-6
-a12.xml

Etc.

Or do you just want a list of URIs for documents that match any of the terms?

I would use cts:uris to get the URIs instead of search:search. You can filter 
cts:uris with cts:element-value-query, and you can pass in many values. 100K 
may be too many, depending on your hardware.
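
As a rough sketch of the cts:uris approach, something like the following 
might work. It assumes the URI lexicon is enabled on the database (cts:uris 
requires it). The namespace URI bound to ce, the external variable, and the 
batch size are my assumptions, not anything taken from your setup; batching is 
just one way to avoid passing all 100K values in a single query.

```xquery
xquery version "1.0-ml";

(: Assumption: namespace URI for the ce prefix; use the one your documents declare. :)
declare namespace ce = "http://www.elsevier.com/xml/common/dtd";

(: Assumption: the uploaded PIIs arrive as an external variable. :)
declare variable $uploadedPIIs as xs:string* external;

(: Hypothetical batch size; tune it for your hardware. :)
declare variable $batch-size as xs:integer := 10000;

(: Walk the PII list in batches and resolve each batch from the URI lexicon. :)
for $start in 1 to fn:count($uploadedPIIs)
where ($start - 1) mod $batch-size eq 0
return
  cts:uris((), (),
    cts:element-value-query(
      xs:QName("ce:pii"),
      fn:subsequence($uploadedPIIs, $start, $batch-size),
      "exact"))
```

This returns only URIs from the lexicon and never loads the documents 
themselves, which is usually where the time goes with search:search.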

Another approach would be to use reverse-query. You can make each of your 
documents an or query of multiple element-value-query queries. Then you can 
pass your document with 100K PI values in using reverse-query and it will 
return matching queries, which in this case map to your documents and their 
URIs. This may be overkill for what you're trying to do.
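
A very rough sketch of the reverse-query idea, for what it's worth. It assumes 
you have already stored one serialized query per document in a collection 
(the collection name "pii-queries" is hypothetical, as are the namespace URI 
and the external variable), and it wraps the uploaded PIIs in a throwaway node 
to match against those stored queries.

```xquery
xquery version "1.0-ml";

(: Assumption: namespace URI for the ce prefix; use the one your documents declare. :)
declare namespace ce = "http://www.elsevier.com/xml/common/dtd";

(: Assumption: the uploaded PIIs arrive as an external variable. :)
declare variable $uploadedPIIs as xs:string* external;

(: Build one in-memory node holding every uploaded PII as a ce:pii element. :)
let $probe :=
  <piis>{
    for $p in $uploadedPIIs
    return <ce:pii>{$p}</ce:pii>
  }</piis>

(: Find stored queries that would match the probe node.
   "pii-queries" is a hypothetical collection of serialized cts queries. :)
for $stored-query in
  cts:search(fn:collection("pii-queries"), cts:reverse-query($probe))
return xdmp:node-uri($stored-query)
```

The URIs that come back are those of the stored query documents, so you would 
need some convention of your own to map each one back to its source document.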

Kelly


Message: 2
Date: Wed, 20 Jul 2011 13:28:05 +0530
From: Vijayasekar Padmanaban <[email protected]>
Subject: Re: [MarkLogic Dev General] Search using 100k terms
To: General MarkLogic Developer Discussion
        <[email protected]>
Message-ID:
        <66586dccf3922145b01ca0b975f3a5670a32da1...@chnshlmbx03.ad.infosys.com>
        
Content-Type: text/plain; charset="us-ascii"

Hi Jason,

Sorry for the confusion.

Below is a snippet of the XML we have in the database. (The database holds 
10 million XML documents.)

<ja:item-info>
<ja:jid>YMSG</ja:jid>
<ja:aid>0103883</ja:aid>
<ce:pii>S0011-3840(01)70009-3</ce:pii>
<ce:doi>10.1016/S0011-3840(01)70009-3</ce:doi>
<ce:copyright type="other" year="2001"/>
</ja:item-info>

The file we upload contains the PIIs (which I referred to as terms in my 
earlier email), as shown below. There could be 100k PIIs in the file.
S0016-5085(68)70198-0
S0016-5085(68)70199-2
S0016-5085(68)70200-6
S0016-5085(68)70201-8
S0016-5085(68)70202-X
S0016-5085(68)70203-1
S0016-5085(68)70204-3
.....
.....

I need to identify the documents that match the PIIs in the file.

We currently use the search:search() API in our application, so I tried 
passing the PIIs through the additional-query option of the Search API, as 
shown below ($uploadedPIIs is an xs:string* holding the uploaded PIIs):
cts:element-value-query(xs:QName("ce:pii"), $uploadedPIIs)

But with this additional query the search takes a long time to return results.

Is there a better way to do this? Please suggest.

Regards,
Vijay
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
