So, here's how we solved the problem of grouping documents in real time:

1. Use a reference-table document to store the URIs of the member documents.
There's one such document per group.
2. Updating the members of the group in the reference-table document becomes
trivial with xdmp:node-insert-child and xdmp:node-delete.
3. Listing the member documents is done with cts:document-query. We can add
more query conditions by wrapping it in cts:and-query.
4. Deleting the group altogether becomes one operation: delete the document
that contains the reference table.
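
A minimal sketch of the four steps above, for illustration only. The element names (group, member), the example URI, and the local: function names are assumptions, not taken from the actual implementation; the sketch also assumes the reference-table document already exists.

```xquery
xquery version "1.0-ml";

(: Assumed layout (hypothetical): one reference-table document per group,
   shaped like <group><member>/docs/a.xml</member> ... </group> :)

(: Step 2a: add a member URI to the group's reference table. :)
declare function local:add-member($group-uri as xs:string, $doc-uri as xs:string)
as empty-sequence()
{
  xdmp:node-insert-child(fn:doc($group-uri)/group, <member>{$doc-uri}</member>)
};

(: Step 2b: remove a member URI; the member document itself is untouched. :)
declare function local:remove-member($group-uri as xs:string, $doc-uri as xs:string)
as empty-sequence()
{
  for $m in fn:doc($group-uri)/group/member[. eq $doc-uri]
  return xdmp:node-delete($m)
};

(: Step 3: list member documents, optionally narrowed by an extra query. :)
declare function local:list-members($group-uri as xs:string, $extra as cts:query?)
as document-node()*
{
  let $uris := fn:doc($group-uri)/group/member/fn:string(.)
  return cts:search(fn:doc(), cts:and-query((cts:document-query($uris), $extra)))
};

(: Step 4: delete the whole group by deleting its reference table. :)
declare function local:delete-group($group-uri as xs:string)
as empty-sequence()
{
  xdmp:document-delete($group-uri)
};

(: Example: members of a hypothetical group that also contain "invoice". :)
local:list-members("/groups/example.xml", cts:word-query("invoice"))
```

Only the small reference-table document is rewritten when membership changes, so the member documents themselves are never updated.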

Let me know what you guys think.

-Ankur


On 9/11/12 9:21 AM, "Harsh Setia" <setia.ha...@gmail.com> wrote:

>Hi, 
>
>I was trying to do the same thing a few weeks back. I wanted to delete a
>huge number of documents that satisfied some criteria. What I tried
>doing was:
>a. update all the documents that satisfy the conditions to add a new
>collection (say /tobedeleted/)
>b. delete the collection
>
>I tried doing step a with CORB but I found that it was not very
>performant. Then instead of doing 2 steps, I started using
>xdmp:document-delete (using CORB again) as soon as I got all the eligible
>documents.
>
>So, the message is: updating the collections on a document causes the
>updated doc to be written to disk and creates a new revision. So, if you are
>just updating collections because you want to group/delete/relate the
>documents, then there are other, better ways to do that.
>
>Sent from my iPad
>
>On Sep 11, 2012, at 14:43, Ankur Patwa <ankur.pa...@icainformatics.com>
>wrote:
>
>> Thanks Damon and Harry!
>> I'll try Corb and multiple transactions.
>> 
>> Best,
>> Ankur
>> 
>> On 9/10/12 10:34 PM, "Damon Feldman" <damon.feld...@marklogic.com>
>>wrote:
>> 
>>> Ankur,
>>> 
>>> Modifying collections on a document is just like modifying the XML (in
>>> fact a collection is a lot like an invisible XML element in the
>>> document): it causes a rewrite of the document itself to disk. So the real
>>> question is how to efficiently update a large number of existing
>>> documents. There are many ways to do this. One is to use CoRB, which is
>>> multi-threaded. Another is to update the documents in batches of about
>>> 100 per transaction (one per transaction or tens of thousands per
>>> transaction will definitely be less efficient).
>>> 
>>> Note that despite the name, it is usually better to think of a
>>> collection as a "tag" on a document rather than a container that
>>> documents are inside of. Collections do not exist separately from the
>>> documents, and it is document updates that cause collections to come in
>>> and out of existence.
>>> 
>>> Yours,
>>> Damon
>>> 
>>> -----Original Message-----
>>> From: general-boun...@developer.marklogic.com
>>> [mailto:general-boun...@developer.marklogic.com] On Behalf Of Ankur Patwa
>>> Sent: Monday, September 10, 2012 11:20 PM
>>> To: general@developer.marklogic.com
>>> Subject: [MarkLogic Dev General] Collections and documents
>>> 
>>> All,
>>> I want to "tag" documents with collections.
>>> 
>>> My first problem is that when I try to add a collection to a large
>>> number of documents, it takes about 9 minutes. Is there a faster way to
>>> add documents to collections? Please note that the documents are already
>>> in the database.
>>> 
>>> Now let's say I have multiple collections on different documents. Is
>>> there a smarter and faster way to "detach a document from a collection"
>>> i.e. remove a specific collection but not delete the documents?
>>> 
>>> Thanks in advance!
>>> Sincerely,
>>> Ankur 
>>> 
>>> NOTICE OF CONFIDENTIALITY: This electronic message, including
>>> attachments, is for the sole use of the named recipient and may contain
>>> confidential or privileged information protected by State of Tennessee
>>> and Federal regulations.  Any unauthorized review, use, disclosure,
>>> copying or distribution is strictly prohibited.  If you are not the
>>> intended recipient or have received this communication in error please
>>> contact the sender or email i...@icainformatics.com and destroy all
>>> copies of the original message. Thank you.
>>> _______________________________________________
>>> General mailing list
>>> General@developer.marklogic.com
>>> http://developer.marklogic.com/mailman/listinfo/general


