Thanks all for your suggestions.

However, this is a very old application, and we can't change the way we store the
data.
So, assuming that the system can't be changed, please suggest the best possible
solution for this problem.

Thanks
Pragya
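
One possible direction that leaves the stored data untouched is a single pass
over the Document entries that keeps, for each Number, only the entry with the
latest date. The sketch below is not from the thread. It assumes the Year,
Month and Day attributes are always present and zero-padded (as in the sample
data), and the helper local:date-key is a hypothetical name. It avoids
functx:distinct-deep, which compares nodes pairwise, but whether a document
with this many entries can be processed in one request still has to be tested.

```xquery
xquery version "1.0-ml";

(: Build a sortable "YYYY-MM-DD" key from the Date attributes.
   Assumption: the attributes are always present and zero-padded. :)
declare function local:date-key($doc as element(Document)) as xs:string {
  fn:concat($doc/Date/@Year, "-", $doc/Date/@Month, "-", $doc/Date/@Day)
};

(: Number value -> latest Document entry seen so far :)
let $latest := map:map()
let $_ :=
  for $doc in fn:doc("/misc/DocList.xml")/DocumentList/Document
  let $number := fn:string($doc/Number)
  let $current := map:get($latest, $number)
  let $is-newer :=
    if (fn:empty($current)) then fn:true()
    else local:date-key($doc) gt local:date-key($current)
  where $is-newer
  return map:put($latest, $number, $doc)
(: One entry per Number, most recent first :)
for $number in map:keys($latest)
let $doc := map:get($latest, $number)
order by local:date-key($doc) descending
return $doc
```

With the three sample entries for Number 0000340, this would return only the
entry dated 2015-06-15.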

________________________________________
From: [email protected] 
<[email protected]> on behalf of Ghislain Fourny 
<[email protected]>
Sent: Monday, August 24, 2015 4:43 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] distinct values on huge data

Hi,

I second Florent on this. A big collection of small trees is the most
commonly optimized use case in most NoSQL document stores. A single,
big document won't typically scale up and cannot be automatically
distributed across machines.
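
As a rough illustration of the "one record per document" layout, the sketch
below writes each Document entry to its own document. The "/doclist-entries/"
URI prefix and the "doclist-entries" collection are hypothetical names chosen
for this example, and for millions of entries the work would presumably have
to be split into batches rather than run as a single transaction.

```xquery
xquery version "1.0-ml";

(: One-off migration sketch: one database document per Document entry.
   The URI prefix and collection name are assumptions for this example. :)
for $doc at $pos in fn:doc("/misc/DocList.xml")/DocumentList/Document
return
  xdmp:document-insert(
    fn:concat("/doclist-entries/", $pos, ".xml"),
    $doc,
    xdmp:default-permissions(),
    "doclist-entries"
  )
```

Once each entry is its own document, searches and range indexes can work per
entry instead of against a single large tree.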

Kind regards,
Ghislain


On Mon, Aug 24, 2015 at 12:28 PM, Florent Georges <[email protected]> wrote:
>   By the way, I've just noticed in your example that you put all the
> information in one single document.  CTS and indexes are of no help here.  A
> good practice is to put each "record" (I like to use the word "entity", here
> it is each of your "Document" element) in its own document, instead of
> putting them all in one single document, within an artificial *-List element
> (here DocumentList).
>
>   Regards,
>
> --
> Florent Georges
> http://fgeorges.org/
> http://h2oconsulting.be/
>
>
> On 24 August 2015 at 11:21, Florent Georges wrote:
>>
>>   Hi,
>>
>>   It looks to me that what you really want to have, is a list of "active"
>> documents (each document with the same number being considered the same,
>> with only one active at any time).  So you can easily constrain any search
>> only on the active documents.
>>
>>   If this is the case, I would simply maintain them all in the same
>> collection (the collection for active documents).  Every time you ingest a
>> new document, you have to check whether it must be added to the active
>> collection (and if it is the case, whether there was already an active
>> document with the same number, in which case it has to be put out of the
>> collection).
>>
>>   Hope that helps, regards,
>>
>> --
>> Florent Georges
>> http://fgeorges.org/
>> http://h2oconsulting.be/
>>
>>
>> On 24 August 2015 at 11:04, Kapoor, Pragya wrote:
>>>
>>> Hi Geert,
>>>
>>>
>>> I have a docList which holds metadata for each document. I need to
>>> first find the distinct Number nodes, ordered by the Date element
>>> (descending), because the docList can contain more than one entry for
>>> a single Number, and then return the Document node satisfying the above
>>> criteria.
>>>
>>>
>>> For example:
>>>
>>> Number = 0000340
>>>
>>> For this, let's assume there are 3 document entries which have Number =
>>> 0000340.
>>>
>>> So I need to pick only the document node with the latest date.
>>>
>>>
>>> docList :
>>>
>>> <DocumentList>
>>>   <Document>
>>>     <DocumentType>VM</DocumentType>
>>>     <ID>/docs/0000002-0000000-0000340-2011-06-08_18-51-29-589.xml</ID>
>>>     <Number>0000340</Number>
>>>     <Date Year="2011" Month="06" Day="08">2011 Jun 08</Date>
>>>     <Hidden/>
>>>   </Document>
>>>   <Document>
>>>     <DocumentType>MA</DocumentType>
>>>     <ID>/docs/0000002-0000000-0000340-2011-06-08_18-51-29-256.xml</ID>
>>>     <Number>0000340</Number>
>>>     <Date Year="2011" Month="07" Day="10">2011 July 10</Date>
>>>     <Hidden/>
>>>   </Document>
>>>   <Document>
>>>     <DocumentType>AM</DocumentType>
>>>     <ID>/docs/0000002-0000000-0000340-2011-06-08_18-51-29-592.xml</ID>
>>>     <Number>0000340</Number>
>>>     <Date Year="2015" Month="06" Day="15">2015 Jun 15</Date>
>>>     <Hidden/>
>>>   </Document>
>>> </DocumentList>
>>>
>>>
>>>
>>> Thanks
>>>
>>> Pragya
>>>
>>>
>>>
>>> ________________________________
>>> From: [email protected]
>>> <[email protected]> on behalf of Geert Josten
>>> <[email protected]>
>>> Sent: Monday, August 24, 2015 2:14 PM
>>> To: MarkLogic Developer Discussion
>>> Subject: Re: [MarkLogic Dev General] distinct values on huge data
>>>
>>> Hi Pragya,
>>>
>>> Could you tell first in a bit more detail what question you are trying to
>>> answer?
>>>
>>> Cheers,
>>> Geert
>>>
>>> From: <[email protected]> on behalf of "Kapoor,
>>> Pragya" <[email protected]>
>>> Reply-To: MarkLogic Developer Discussion
>>> <[email protected]>
>>> Date: Monday, August 24, 2015 at 9:07 AM
>>> To: MarkLogic Developer Discussion <[email protected]>
>>> Subject: [MarkLogic Dev General] distinct values on huge data
>>>
>>> Hi,
>>>
>>>
>>> I want to run the code below on 50 lakh (5 million) entries in DocList.xml:
>>>
>>>
>>>   let $docList :=
>>>     functx:distinct-deep(
>>>       cts:search(fn:doc("/misc/DocList.xml")/DocumentList/Document/Number,
>>>                  cts:and-query(()))
>>>     )
>>>   for $each in $docList
>>>   order by $each/../Date descending
>>>   return $each/..
>>>
>>>
>>> This code gives an error on huge data sets. I have already created a
>>> range index on the Date element.
>>>
>>>
>>> Please suggest.
>>>
>>>
>>> Thanks
>>>
>>> Pragya
>>>
>>> "This e-mail and any attachments transmitted with it are for the sole use
>>> of the intended recipient(s) and may contain confidential , proprietary or
>>> privileged information. If you are not the intended recipient, please
>>> contact the sender by reply e-mail and destroy all copies of the original
>>> message. Any unauthorized review, use, disclosure, dissemination,
>>> forwarding, printing or copying of this e-mail or any action taken in
>>> reliance on this e-mail is strictly prohibited and may be unlawful."
>>>
>>> _______________________________________________
>>> General mailing list
>>> [email protected]
>>> Manage your subscription at:
>>> http://developer.marklogic.com/mailman/listinfo/general
>>>
>>
>>
>>
>
>
>
>
>
