I think it makes sense to discuss this in more detail here first, and then see
if we can summarize the conclusions on SO.
There are several ways to get rid of a large group of documents. It generally
comes down to either:
1. xdmp:collection-delete or xdmp:directory-delete,
2. or a batch delete approach.
This roughly matches the two answers on SO.
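As a minimal sketch of approach 1, the whole delete can be a single call. The collection name "expired" is only a placeholder for illustration, not something taken from your setup:

```xquery
xquery version "1.0-ml";

(: Approach 1: delete every document in a collection within one transaction.
   "expired" is a hypothetical collection name.
   xdmp:directory-delete("/some/dir/") is the equivalent call for a directory. :)
xdmp:collection-delete("expired")
```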
The ‘benefit’ of approach 1 is that it happens in one transaction, which could
be important to you. But you are right that a collection-delete can take time.
I would not necessarily say it will flood servers, but deleting 20 million docs
could take minutes. How long exactly depends a lot on factors like how many
forests you have, how fast your disks are, how many MarkLogic instances you
have in your cluster, how the docs are spread across those, etc. Deleting 20
million docs could just as well take 10 seconds, provided the configuration and
the circumstances are right. The right circumstances also include things like
not having triggers, not having auditing enabled, etc.
The second approach has roughly the opposite trade-off. You won’t have the
deletion happening in one transaction (unless you care to handle transactions
yourself), but you have more control over the load, and the deletion can take
as long as needed. There are several tools that can help with spawning deletion
tasks. Corb/Corb2 is one, Taskbot is another.
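To give an idea of what a hand-rolled batch delete could look like without any of those tools, here is a rough sketch. It assumes the URI lexicon is enabled (cts:uris needs it). The collection name "expired" and the batch size of 1000 are placeholders, not recommendations:

```xquery
xquery version "1.0-ml";

(: Approach 2: delete one limited batch per run, each run in its own transaction.
   Assumptions: URI lexicon enabled; "expired" is a hypothetical collection
   name; 1000 is an arbitrary batch size. :)
declare variable $batch-size := 1000;

let $uris := cts:uris(
  (),
  fn:concat("limit=", $batch-size),
  cts:collection-query("expired")
)
return (
  (: delete each document in this batch :)
  for $uri in $uris
  return xdmp:document-delete($uri),
  (: report how many documents were in this batch :)
  fn:count($uris)
)
```

You would run this repeatedly until it reports 0. How quickly you repeat it is what gives you control over the load.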
Which answer fits your case best depends firstly on whether or not it is
important to do the delete in one transaction, and secondly on the average
number of documents per deletion and how often you need to perform it. It might
be good to run a test on a similar environment, so you can estimate whether the
delete runs within an acceptable timeframe.
We could go into more detail about xdmp:collection-delete, but I don’t think
that will be of much help to you.
Instead I’d prefer to return to your description on SO, where you talk about
‘expired’ collection items. Have you considered giving documents an expiry
date, and running a scheduled task that periodically removes expired documents?
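A sketch of what such a periodic cleanup module might do, purely as an illustration. It assumes each document has an element named expires (a made-up name) holding an xs:dateTime, that a dateTime element range index exists on it, and that the URI lexicon is enabled:

```xquery
xquery version "1.0-ml";

(: Select up to 1000 documents whose expiry date has passed, and delete them.
   Assumptions: <expires> is a hypothetical element with a dateTime element
   range index on it; URI lexicon enabled; 1000 is an arbitrary limit. :)
let $expired :=
  cts:element-range-query(xs:QName("expires"), "<=", fn:current-dateTime())
for $uri in cts:uris((), "limit=1000", $expired)
return xdmp:document-delete($uri)
```

A module like this could then be registered as a scheduled task, so each run only deals with what expired since the previous one.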
If the schedule runs, for instance, every hour, and deletes a reasonably sized
batch of files on average, that could help spread load for keeping your
on behalf of Qambar Raza <qambar.r...@bbc.co.uk>
Reply-To: MarkLogic Developer Discussion
Date: Monday, September 19, 2016 at 1:00 PM
Subject: [MarkLogic Dev General] How does "xdmp:collection-delete" work?
Can anyone answer my question on Stack Overflow? I couldn't find any
documentation about how https://docs.marklogic.com/xdmp:collection-delete works.
For more details, see: