I would also make sure that the options you pass to
cts:element-value-query include 'punctuation-insensitive' and
'case-insensitive'. I'm assuming that punctuation and case will not matter
in your situation.
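
Roughly what I have in mind, as a minimal sketch. It assumes the ce prefix
is bound to the namespace URI shown (check it against your own documents),
and $uploaded-piis is a hypothetical name holding two PIIs from this thread
as stand-in data:

```xquery
xquery version "1.0-ml";

(: Assumption: replace with the namespace URI your documents actually use. :)
declare namespace ce = "http://www.elsevier.com/xml/common/dtd";

(: Hypothetical stand-in for the uploaded list of PIIs. :)
let $uploaded-piis := ("S0016-5085(68)70198-0", "S0016-5085(68)70199-2")
return
  cts:search(
    fn:doc(),
    cts:element-value-query(
      xs:QName("ce:pii"),
      $uploaded-piis,
      (: the third argument is the sequence of query options :)
      ("case-insensitive", "punctuation-insensitive")))
```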

On 7/20/11 9:54 AM, "McBeath, Darin W (ELS-STL)" <[email protected]>
wrote:

>A couple of thoughts ...
>
>Consider using cts:uris (assuming you have the URI lexicon enabled for
>your database).  This is a lower-level API than search:search and could
>get you better performance.  My guess is that search:search is likely
>using cts:search under the covers.  I don't know for sure, as I typically
>use the lower-level APIs (such as cts:search, cts:uris, etc.).  Those more
>familiar with search:search can elaborate on whether cts:search is being
>used by search:search.
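>
>As a rough sketch of the cts:uris approach (this assumes the URI lexicon
>is enabled on the database; the namespace URI and the $piis variable are
>placeholders, not taken from your application):
>
>```xquery
>xquery version "1.0-ml";
>
>(: Assumption: replace with the namespace URI your documents actually use. :)
>declare namespace ce = "http://www.elsevier.com/xml/common/dtd";
>
>(: Hypothetical stand-in for the uploaded list of PIIs. :)
>let $piis := ("S0016-5085(68)70198-0", "S0016-5085(68)70199-2")
>return
>  (: $start and $options are left empty; the third argument
>     restricts the URIs to documents matching the query. :)
>  cts:uris((), (), cts:element-value-query(xs:QName("ce:pii"), $piis))
>```
>
>This returns only the URIs of the matching documents, resolved from the
>indexes, rather than the documents themselves.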
>
>Continue to use cts:element-value-query, but I would consider breaking
>the list of 100,000 terms into chunks of 10,000 or something a bit more
>reasonable.  For these smaller chunks of work, I would consider spawning
>them on the task server so that they could potentially be done in
>parallel.  Of course, try 100,000 first and see if you can meet your
>performance criteria (< 10s).
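>
>The chunking part could look something like the sketch below.  It runs
>the chunks one after another in a single request; handing each chunk to
>the task server instead is left out here.  The chunk size, namespace URI
>and $piis are illustrative assumptions, and the URI lexicon is assumed to
>be enabled:
>
>```xquery
>xquery version "1.0-ml";
>
>(: Assumption: replace with the namespace URI your documents actually use. :)
>declare namespace ce = "http://www.elsevier.com/xml/common/dtd";
>
>(: Hypothetical stand-in for the uploaded list of PIIs. :)
>let $piis := ("S0016-5085(68)70198-0", "S0016-5085(68)70199-2",
>              "S0016-5085(68)70200-6")
>let $chunk-size := 10000
>let $chunk-count := xs:integer(fn:ceiling(fn:count($piis) div $chunk-size))
>for $i in 1 to $chunk-count
>(: Take the i-th slice of at most $chunk-size PIIs. :)
>let $chunk := fn:subsequence($piis, ($i - 1) * $chunk-size + 1, $chunk-size)
>return
>  cts:uris((), (), cts:element-value-query(xs:QName("ce:pii"), $chunk))
>```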
>
>One last thought is that you might want to investigate creating a range
>index on ce:pii and using cts:element-range-query.  I'm not sure whether
>this will be faster than cts:element-value-query, but I seem to recall
>that range indexes are supposed to be kept in memory.  This is a fairly
>heavyweight solution, as there could be implications for your DB sizing
>and your XML: the ce:pii element would need to be unique within your XML
>document (which is likely not the case), and I wouldn't recommend fields
>in this situation as a workaround.
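>
>For comparison, a sketch of the range-query variant.  This assumes a
>string range index has already been configured on ce:pii with a collation
>matching the query's default collation, and that the URI lexicon is
>enabled; without the range index the query raises an error.  The
>namespace URI and $piis are placeholders:
>
>```xquery
>xquery version "1.0-ml";
>
>(: Assumption: replace with the namespace URI your documents actually use. :)
>declare namespace ce = "http://www.elsevier.com/xml/common/dtd";
>
>(: Hypothetical stand-in for the uploaded list of PIIs. :)
>let $piis := ("S0016-5085(68)70198-0", "S0016-5085(68)70199-2")
>return
>  (: With "=" and several values, a document matches if its
>     ce:pii equals any one of them. :)
>  cts:uris((), (),
>    cts:element-range-query(xs:QName("ce:pii"), "=", $piis))
>```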
>
>Darin.
>
>
>
>From: Vijayasekar Padmanaban
><[email protected]<mailto:[email protected]>>
>Reply-To: General MarkLogic Developer Discussion
><[email protected]<mailto:[email protected]>>
>Date: Wed, 20 Jul 2011 13:28:05 +0530
>To: General MarkLogic Developer Discussion
><[email protected]<mailto:[email protected]>>
>Subject: Re: [MarkLogic Dev General] Search using 100k terms
>
>Hi Jason,
>
>Sorry for the confusion.
>
>Please find below a snippet of the XML we have in the DB. (The DB has 10
>million XML documents.)
>
><ja:item-info>
><ja:jid>YMSG</ja:jid>
><ja:aid>0103883</ja:aid>
><ce:pii>S0011-3840(01)70009-3</ce:pii>
><ce:doi>10.1016/S0011-3840(01)70009-3</ce:doi>
><ce:copyright type="other" year="2001"/>
></ja:item-info>
>
>The file we upload will contain the PIIs (which I referred to as terms in
>my earlier email) as shown below. (There could be 100k PIIs in the file.)
>S0016-5085(68)70198-0
>S0016-5085(68)70199-2
>S0016-5085(68)70200-6
>S0016-5085(68)70201-8
>S0016-5085(68)70202-X
>S0016-5085(68)70203-1
>S0016-5085(68)70204-3
>...
>
>I need to identify the documents that match the PIIs in the file.
>
>Currently we are using the search:search() API in our application. Hence I
>tried using the additional-query option of the search API as shown below:
>cts:element-value-query(xs:QName("ce:pii"), $uploadedPIIs as xs:string*)
>
>But this additional query is taking a lot of time to yield results.
>
>So is there a better way to do this? Please suggest.
>
>Regards,
>Vijay
>
>From: [email protected] [mailto:[email protected]] On Behalf
>Of Jason Hunter
>Sent: Wednesday, July 20, 2011 12:32 PM
>To: General MarkLogic Developer Discussion
>Subject: Re: [MarkLogic Dev General] Search using 100k terms
>
>You say "the term" but you also say you have 300,000 terms.  So I'm
>confused.
>
>You want to find documents that have all 300,000 terms?
>
>Or for each term you want to find documents having just that term?  And
>you want to do that basic query 300,000 times across all terms in less
>than 10 seconds?
>
>-jh-
>
>On Jul 19, 2011, at 11:13 PM, Vijayasekar Padmanaban wrote:
>
>
>Hi Jason,
>
>Thanks for your response.
>
>My DB has 10 million documents in it. I need to identify the
>documents which have the term.
>I would expect the search to return results in less than 10 seconds.
>
>Regards,
>Vijay
>
>From: [email protected] [mailto:[email protected]] On Behalf
>Of Jason Hunter
>Sent: Wednesday, July 20, 2011 11:33 AM
>To: General MarkLogic Developer Discussion
>Subject: Re: [MarkLogic Dev General] Search using 100k terms
>
>I'm a little unclear on what you're trying to do.
>
>You want to take a list of 300,000 terms and identify which documents
>have each term?  Or do you only need to identify which terms are present
>in one or more documents and which terms aren't present anywhere?
>Something else?
>
>How long are you willing to wait for the answer?
>
>-jh-
>
>On Jul 19, 2011, at 10:45 PM, Vijayasekar Padmanaban wrote:
>
>
>
>Hi All,
>
>We have a use case to perform a search based on the contents of an
>uploaded file. The file would have a maximum of 100,000 terms in it. We
>need to validate the contents of the file against our repository and
>produce results. Our repository contains 10 million documents. Each term
>in the file needs to be validated against an element in the enhanced XML.
>
>Below are the two approaches I have tried:
>1. Using search constraints
>   a. Each search term is concatenated with the constraint, and the
>      results are joined using the 'OR' delimiter as shown below:
>      e.g., "const:<term1> OR const:<term2> OR const:<term3> OR
>      const:<term4> OR ..."
>      This ended in a stack overflow error when the number of search
>      terms exceeded 1000.
>2. Using an element value query
>   a. All the search terms are passed as text to
>      cts:element-value-query as shown below:
>      cts:element-value-query(<Qualifier-Name>, text as xs:string*)
>      This worked well when the DB contained fewer documents, say
>      300,000. But when used with a DB that has 10 million documents, it
>      failed with "Time limit exceeded".
>
>Could you suggest the best possible approach to resolve this issue?
>
>Thanks,
>Vijay
>
>
>_______________________________________________
>General mailing list
>[email protected]<mailto:[email protected]>
>http://developer.marklogic.com/mailman/listinfo/general
>

