You say "the term" but you also say you have 100,000 terms.  So I'm confused.

You want to find documents that have all 100,000 terms?

Or for each term you want to find documents having just that term?  And you 
want to do that basic query 100,000 times across all terms in less than 10 
seconds?

-jh-

On Jul 19, 2011, at 11:13 PM, Vijayasekar Padmanaban wrote:

> Hi Jason,
>  
> Thanks for your response.
>  
> My DB has 10 million documents in it. I need to identify the documents 
> that contain the term.
> I expect the search to return results in less than 10 seconds.
>  
> Regards,
> Vijay
>  
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Jason Hunter
> Sent: Wednesday, July 20, 2011 11:33 AM
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Search using 100k terms
>  
> I'm a little unclear on what you're trying to do.
>  
> You want to take a list of 100,000 terms and identify which documents have 
> each term?  Or do you only need to identify which terms are present in one or 
> more documents and which terms aren't present anywhere?  Something else?
>  
> How long are you willing to wait for the answer?
>  
> -jh-
>  
> On Jul 19, 2011, at 10:45 PM, Vijayasekar Padmanaban wrote:
> 
> 
> Hi All,
>  
> We have a use case to perform a search based on the contents of an uploaded 
> file. The file would have a maximum of 100,000 terms in it. We need to validate 
> the contents of the file against our repository contents and produce results. 
> Our repository contains 10 million documents. Each term in the file needs to be 
> validated against an element in the enhanced XML.
>  
> Below are the two approaches I tried:
> 1. Using search constraints
>    a. Each search term is concatenated with the constraint, and the results 
>       are joined with the ‘OR’ operator, as shown below:
>       “const:<term1> OR const:<term2> OR const:<term3> OR const:<term4> OR …..”
>       This ended in a stack overflow error when the number of search terms 
>       exceeded 1000.
> 2. Using an element value query
>    a. All the search terms are passed as text to cts:element-value-query, 
>       as shown below:
>       cts:element-value-query(<Qualifier-Name>, text as xs:string*)
>       This worked well when the DB contained fewer documents, say 300,000. 
>       But with a DB that has 10 million documents, it failed with 
>       “Time limit exceeded”.
>  
> Could you suggest the best possible approach to resolve this issue?
>  
> Thanks,
> Vijay
>  
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
>  
