Hi Geert,

In addition, do you think he should run a manual merge after deleting the 20M
documents, to free up the space and improve performance?


On Mon, Sep 19, 2016 at 8:58 PM, Geert Josten <geert.jos...@marklogic.com>
wrote:

> Hi Qambar,
>
> I think it makes sense to discuss this in more detail here first, and then
> see if we can summarize the conclusions on SO.
>
> In general there are several ways to get rid of a large group of documents.
> It generally comes down to either:
>    1. xdmp:collection-delete or xdmp:directory-delete
>    2. a batch delete approach.
> This roughly matches the two answers on SO.
>
> The ‘benefit’ of approach 1 is that it happens in one transaction, which
> could be important to you. But you are right that a collection-delete can
> take time. I would not necessarily say it will flood servers, but deleting
> 20 million docs could take up to minutes. How long exactly depends a lot on
> factors like how many forests you have, how fast your disks are, how many
> MarkLogic instances you have in your cluster, how the docs are spread across
> those, etc. Deleting 20 million docs could just as well take 10 seconds,
> provided the configuration and the circumstances are right. The right
> circumstances also include things like not having triggers, not having
> auditing enabled, etc.
>
> The second approach has roughly the opposite trade-off. You won’t have the
> deletion happening in one transaction (unless you care to handle
> transactions yourself), but you have more control to manage load, and can
> take as long as needed. There are several tools that can help with spawning
> deletion tasks. Corb/Corb2 is one, Taskbot is another.
>
> Which answer fits your case best depends firstly on whether or not it is
> important to do the collection-delete in one transaction. Secondly, it
> depends on how many documents you delete on average, and how often you need
> to do it. It might be good to run a test on a similar environment, so you
> can estimate whether the delete runs in an acceptable timeframe.
>
> We could go into more detail about xdmp:collection-delete, but I don’t
> think that will be of much help to you.
> Instead, I’d prefer to return to your description on SO, where you are
> talking about ‘expired’ collection items. Have you considered giving
> documents an expiry date, and running a scheduled task that periodically
> removes expired documents? If the schedule runs for instance every hour, and
> deletes a reasonably sized batch of documents on average, that could help
> spread the load of keeping your system clean.
>
> Cheers,
> Geert
> From: <general-boun...@developer.marklogic.com> on behalf of Qambar Raza <
> qambar.r...@bbc.co.uk>
> Reply-To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Date: Monday, September 19, 2016 at 1:00 PM
> To: "general@developer.marklogic.com" <general@developer.marklogic.com>
> Subject: [MarkLogic Dev General] How does "xdmp:collection-delete" work?
> Hello,
> Can anyone answer my question on Stack Overflow? I couldn't find any
> documentation about how https://docs.marklogic.com/xdmp:collection-delete
> works.
> For more details, see:
> http://stackoverflow.com/questions/39571215/how-does-marklogics-xdmpcollection-delete-work
> Thanks,
> Qambar.
> _______________________________________________
> General mailing list
> General@developer.marklogic.com
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
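
For what it's worth, the batch delete approach from Geert's reply (option 2)
could be sketched roughly as below. This is only an illustration in Python,
not MarkLogic code: `delete_batch` is a hypothetical callback standing in for
whatever actually removes the documents (for example a Corb task, or an XQuery
module calling xdmp:document-delete), and the batch size and pause are
made-up knobs for managing load.

```python
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(uris: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` URIs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(uris)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def delete_in_batches(
    uris: Iterable[str],
    delete_batch: Callable[[List[str]], None],  # hypothetical: deletes one batch
    size: int = 1000,
    pause_seconds: float = 0.0,
) -> int:
    """Delete URIs batch by batch and return how many were handed off.

    Each call to `delete_batch` is assumed to be its own transaction, so the
    overall delete is not atomic, but the load can be throttled.
    """
    deleted = 0
    for chunk in batches(uris, size):
        delete_batch(chunk)
        deleted += len(chunk)
        if pause_seconds:
            time.sleep(pause_seconds)  # give the cluster room between batches
    return deleted
```

With 2500 URIs and a batch size of 1000, the callback would be invoked three
times, with 1000, 1000 and 500 URIs. The same loop could be driven from a
scheduled job if you go for the expiry-date idea instead.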