Thanks all for your suggestions. However, this is a very old application, and we can't change the way we store the data. So, assuming the system can't be changed, please suggest the best possible solution for this problem.
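
For reference, the requirement can be expressed without functx:distinct-deep as a single pass that remembers, for each Number, the Document with the latest Date. The following is only a rough sketch, not a tested solution. It assumes the Year, Month and Day attributes on Date are always present and zero-padded as in the sample. The local:date-key helper and the variable names are made up for illustration. It still has to read the whole of /misc/DocList.xml, so it may well hit memory or time limits on the full data set.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: builds a sortable key such as "20150615" from the
   Year, Month and Day attributes (assumed present and zero-padded).
   Returns "" when no Document is passed. :)
declare function local:date-key($doc as element(Document)?) as xs:string
{
  fn:concat($doc/Date/@Year, $doc/Date/@Month, $doc/Date/@Day)
};

(: Number value -> latest Document seen so far for that Number :)
let $latest := map:map()
let $_ :=
  for $doc in fn:doc("/misc/DocList.xml")/DocumentList/Document
  let $number := fn:string($doc/Number)
  let $current := map:get($latest, $number)
  (: keep $doc only if it is the first or the newest entry for this Number :)
  where fn:empty($current)
     or local:date-key($doc) gt local:date-key($current)
  return map:put($latest, $number, $doc)
(: one Document per distinct Number, newest first :)
for $number in map:keys($latest)
let $doc := map:get($latest, $number)
order by local:date-key($doc) descending
return $doc
```

The map lookup replaces the pairwise deep comparison, so each Document entry is visited once. The storage layout is not touched.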

Thanks
Pragya
________________________________________
From: [email protected] <[email protected]> on behalf of Ghislain Fourny <[email protected]>
Sent: Monday, August 24, 2015 4:43 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] distinct values on huge data

Hi,

I second Florent on this. A big collection of small trees is the most commonly optimized use case in most NoSQL document stores. A single, big document won't typically scale up and cannot be automatically distributed across machines.

Kind regards,
Ghislain

On Mon, Aug 24, 2015 at 12:28 PM, Florent Georges <[email protected]> wrote:
> By the way, I've just noticed in your example that you put all the
> information in one single document. CTS and indexes are of no help here. A
> good practice is to put each "record" (I like to use the word "entity", here
> it is each of your "Document" elements) in its own document, instead of
> putting them all in one single document, within an artificial *-List element
> (here DocumentList).
>
> Regards,
>
> --
> Florent Georges
> http://fgeorges.org/
> http://h2oconsulting.be/
>
>
> On 24 August 2015 at 11:21, Florent Georges wrote:
>>
>> Hi,
>>
>> It looks to me that what you really want to have is a list of "active"
>> documents (each document with the same number being considered the same,
>> with only one active at any time). So you can easily constrain any search
>> to the active documents only.
>>
>> If this is the case, I would simply maintain them all in the same
>> collection (the collection for active documents). Every time you ingest a
>> new document, you have to check whether it must be added to the active
>> collection (and if it is the case, whether there was already an active
>> document with the same number, in which case it has to be put out of the
>> collection).
>>
>> Hope that helps, regards,
>>
>> --
>> Florent Georges
>> http://fgeorges.org/
>> http://h2oconsulting.be/
>>
>>
>> On 24 August 2015 at 11:04, Kapoor, Pragya wrote:
>>>
>>> Hi Geert,
>>>
>>> I have a docList which has metadata info for each document. So, I need to
>>> first find the distinct Number nodes, which should be ordered by the Date
>>> element (descending), as in docList there could be more than one entry for
>>> a single Number, and then return the Document node satisfying the above
>>> criteria.
>>>
>>> For example:
>>>
>>> Number = 0000004
>>>
>>> For this, let's assume there are 3 document entries which have Number =
>>> 0000340
>>>
>>> So I need to pick only the document node with the latest date.
>>>
>>> docList:
>>>
>>> <DocumentList>
>>>   <Document>
>>>     <DocumentType>VM</DocumentType>
>>>     <ID>/docs/0000002-0000000-0000340-2011-06-08_18-51-29-589.xml</ID>
>>>     <Number>0000340</Number>
>>>     <Date Year="2011" Month="06" Day="08">2011 Jun 08</Date>
>>>     <Hidden/>
>>>   </Document>
>>>   <Document>
>>>     <DocumentType>MA</DocumentType>
>>>     <ID>/docs/0000002-0000000-0000340-2011-06-08_18-51-29-256.xml</ID>
>>>     <Number>0000340</Number>
>>>     <Date Year="2011" Month="07" Day="10">2011 July 10</Date>
>>>     <Hidden/>
>>>   </Document>
>>>   <Document>
>>>     <DocumentType>AM</DocumentType>
>>>     <ID>/docs/0000002-0000000-0000340-2011-06-08_18-51-29-592.xml</ID>
>>>     <Number>0000340</Number>
>>>     <Date Year="2015" Month="06" Day="15">2015 Jun 15</Date>
>>>     <Hidden/>
>>>   </Document>
>>> </DocumentList>
>>>
>>> Thanks
>>> Pragya
>>>
>>> ________________________________
>>> From: [email protected] <[email protected]> on behalf of Geert Josten <[email protected]>
>>> Sent: Monday, August 24, 2015 2:14 PM
>>> To: MarkLogic Developer Discussion
>>> Subject: Re: [MarkLogic Dev General] distinct values on huge data
>>>
>>> Hi Pragya,
>>>
>>> Could you tell first in a bit more detail what question you are trying to
>>> answer?
>>>
>>> Cheers,
>>> Geert
>>>
>>> From: <[email protected]> on behalf of "Kapoor, Pragya" <[email protected]>
>>> Reply-To: MarkLogic Developer Discussion <[email protected]>
>>> Date: Monday, August 24, 2015 at 9:07 AM
>>> To: MarkLogic Developer Discussion <[email protected]>
>>> Subject: [MarkLogic Dev General] distinct values on huge data
>>>
>>> Hi,
>>>
>>> I want to run the code below on 50 lakh (5 million) entries in DocList.xml:
>>>
>>> let $docList :=
>>>   functx:distinct-deep(
>>>     cts:search(fn:doc("/misc/DocList.xml")/DocumentList/Document/Number,
>>>       cts:and-query(()))
>>>   )
>>> for $each in $docList
>>> order by $each/../Date descending
>>> return $each/..
>>>
>>> This code is giving an error on huge data sets. I have already created a
>>> range index on the Date element.
>>>
>>> Please suggest.
>>>
>>> Thanks
>>> Pragya
>>>
>>> "This e-mail and any attachments transmitted with it are for the sole use
>>> of the intended recipient(s) and may contain confidential , proprietary or
>>> privileged information. If you are not the intended recipient, please
>>> contact the sender by reply e-mail and destroy all copies of the original
>>> message. Any unauthorized review, use, disclosure, dissemination,
>>> forwarding, printing or copying of this e-mail or any action taken in
>>> reliance on this e-mail is strictly prohibited and may be unlawful."
>>>
>>> _______________________________________________
>>> General mailing list
>>> [email protected]
>>> Manage your subscription at:
>>> http://developer.marklogic.com/mailman/listinfo/general
