Hi Indy,

You usually don’t need a manual merge. MarkLogic will clean up automatically, 
provided merging is not disabled. Deleted fragments might stick around for one 
hour, just to make sure merging does not slow down the deletion itself.


 on behalf of Indrajeet Verma 
Reply-To: MarkLogic Developer Discussion 
Date: Monday, September 19, 2016 at 5:36 PM
To: MarkLogic Developer Discussion 
Subject: Re: [MarkLogic Dev General] How does "xdmp:collection-delete" work?

Hi Geert,

In addition, do you think he should run a manual merge after deleting the 20M 
documents, to free up space and improve performance?


On Mon, Sep 19, 2016 at 8:58 PM, Geert Josten 
<geert.jos...@marklogic.com> wrote:
Hi Qambar,

I think it makes sense to discuss this in more detail here first, and then see 
if we can summarize the conclusions on SO.

In general, there are several ways to get rid of a large group of files. It 
generally comes down to either:

  1.  xdmp:collection-delete or xdmp:directory-delete, or
  2.  a batch delete approach.

This roughly matches the two answers on SO.

The ‘benefit’ of approach 1 is that it happens in one transaction, which could 
be important to you. But you are right that a collection-delete can take time. 
I would not necessarily say it will flood servers, but deleting 20 million docs 
could take up to several minutes. How long exactly depends a lot on factors 
like how many forests you have, how fast your disks are, how many MarkLogic 
instances you have in your cluster, how the docs are spread across those, etc. 
Deleting 20 million docs could just as well take 10 sec, provided the 
configuration and the circumstances are right. The right circumstances also 
include things like not having triggers, not having auditing enabled, etc.

The second approach is roughly the opposite. The deletion won’t happen in one 
transaction (unless you care to handle transactions yourself), but you have 
more control over the load, and it can take as long as needed. There are 
several tools that can help with spawning deletion tasks. Corb/Corb2 is one, 
Taskbot is another.
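
To make the batch idea a bit more concrete, here is a minimal sketch in plain 
Python. It is not MarkLogic, Corb, or Taskbot code: the names batches, 
delete_in_batches and delete_batch are hypothetical, and delete_batch stands in 
for whatever actually removes one batch of documents (for example one request 
or one transaction per batch). It only illustrates splitting the work into 
bounded chunks, with an optional pause to throttle load.

```python
import time
from typing import Callable, Iterable, Iterator, List


def batches(uris: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` URIs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for uri in uris:
        batch.append(uri)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def delete_in_batches(
    uris: Iterable[str],
    delete_batch: Callable[[List[str]], None],
    size: int = 1000,
    pause_seconds: float = 0.0,
) -> int:
    """Delete URIs one batch at a time and return how many were handed off.

    `delete_batch` is a hypothetical callback; each call is an independent
    unit of work, so the deletion as a whole is NOT one transaction.
    """
    deleted = 0
    for batch in batches(uris, size):
        delete_batch(batch)
        deleted += len(batch)
        if pause_seconds > 0:
            # Optional throttle between batches to limit load.
            time.sleep(pause_seconds)
    return deleted
```

The batch size and the pause are the two knobs for managing load. If a batch 
fails halfway, earlier batches stay deleted, which is exactly the 
non-transactional trade-off described above.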

Which answer fits your case best depends firstly on whether it is important to 
do the collection-delete in one transaction, and secondly on how many documents 
an average deletion involves and how often you need to perform it. It might be 
good to run a test on a similar environment, so you can estimate whether the 
delete runs in an acceptable timeframe.

We could go into more detail about xdmp:collection-delete, but I don’t think 
that will be of much help to you.

Instead, I’d prefer to return to your description on SO, where you talk about 
‘expired’ collection items. Have you considered giving documents an expiry 
date, and running a scheduled task that periodically removes expired documents? 
If the task runs every hour, for instance, and deletes a reasonably sized batch 
of files on average, that could help spread the load of keeping your system 
clean.
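
As a rough illustration of the expiry idea, the sketch below simulates one 
scheduled run in plain Python. Everything in it is an assumption made for 
illustration: the documents are modelled as an in-memory dict from URI to 
expiry date, and expired_uris and sweep are hypothetical names, not MarkLogic 
functions. In a real setup the selection would be a query on the expiry date 
and the removal a document delete.

```python
from datetime import datetime
from typing import Dict, List


def expired_uris(
    expiry_by_uri: Dict[str, datetime], now: datetime, limit: int
) -> List[str]:
    """Return up to `limit` URIs that expired at or before `now`, oldest first."""
    due = [(expires, uri) for uri, expires in expiry_by_uri.items() if expires <= now]
    due.sort()
    return [uri for _, uri in due[:limit]]


def sweep(expiry_by_uri: Dict[str, datetime], now: datetime, limit: int = 1000) -> int:
    """One scheduled run: remove a bounded batch of expired documents.

    Returns the number of documents removed. Anything over `limit` is left
    for the next run, which is what spreads the load over time.
    """
    victims = expired_uris(expiry_by_uri, now, limit)
    for uri in victims:
        # Stand-in for the real delete call.
        del expiry_by_uri[uri]
    return len(victims)
```

Calling sweep once per scheduled run keeps each run bounded. If more documents 
expire per hour than the limit allows, the backlog grows, so the limit and the 
schedule interval need to be chosen together.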


 on behalf of Qambar Raza <qambar.r...@bbc.co.uk>
Reply-To: MarkLogic Developer Discussion 
Date: Monday, September 19, 2016 at 1:00 PM
To: "general@developer.marklogic.com" 
Subject: [MarkLogic Dev General] How does "xdmp:collection-delete" work?


Can anyone answer my question on Stack Overflow? I couldn't find any 
documentation about how https://docs.marklogic.com/xdmp:collection-delete works.

For more details, see :



General mailing list
Manage your subscription at:
