Hi Geert,

In addition, do you think he should run a manual merge after deleting the 20M documents, to free up the space and improve performance?

Regards,
Indy

On Mon, Sep 19, 2016 at 8:58 PM, Geert Josten <geert.jos...@marklogic.com> wrote:

> Hi Qambar,
>
> I think it makes sense to discuss this in more detail here first, and then
> see if we can summarize conclusions on SO.
>
> In general there are several ways to get rid of a large group of files. It
> generally comes down to either:
>
> 1. xdmp:collection-delete and xdmp:directory-delete
> 2. or a batch delete approach.
>
> This roughly matches the two answers on SO.
>
> The ‘benefit’ of approach 1 is that it happens in one transaction, which
> could be important to you. But you are right that a collection-delete can
> take time. I would not necessarily say it will flood servers, but deleting
> 20 million docs could take minutes. How long exactly depends a lot on
> factors like how many forests you have, how fast your disks are, how many
> MarkLogic instances you have in your cluster, how the docs are spread
> across those, etc. Deleting 20 million docs could just as well take 10 sec,
> provided the right configuration and circumstances are in place. The right
> circumstances also include things like not having triggers, not having
> auditing enabled, etc.
>
> The second approach is roughly the opposite. You won’t have the deletion
> happening in one transaction (unless you care to handle transactions
> yourself), but you have more control to manage load, and can take as long
> as needed. There are several tools that can help with spawning deletion
> tasks. Corb/Corb2 is one, Taskbot is another.
>
> Which answer fits your case best depends firstly on whether or not it is
> important to do the collection-delete in one transaction, and secondly on
> the volume of the average deletion and how often you need to perform it.
> It might be good to run a test on a similar environment that allows
> estimating whether you can run the delete in an acceptable timeframe.
>
> We could go into more detail about xdmp:collection-delete, but I don’t
> think that will be of much help to you.
>
> Instead I’d prefer returning to your description on SO, where you are
> talking about ‘expired’ collection items. Have you considered giving
> documents an expiry date, and running a scheduled task that periodically
> removes expired documents? If the schedule runs for instance every hour,
> and deletes a reasonably sized batch of files on average, that could help
> spread the load of keeping your system clean.
>
> Cheers,
> Geert
>
> From: <general-boun...@developer.marklogic.com> on behalf of Qambar Raza <qambar.r...@bbc.co.uk>
> Reply-To: MarkLogic Developer Discussion <general@developer.marklogic.com>
> Date: Monday, September 19, 2016 at 1:00 PM
> To: "general@developer.marklogic.com" <general@developer.marklogic.com>
> Subject: [MarkLogic Dev General] How does "xdmp:collection-delete" work?
>
> Hello,
>
> Can anyone answer my question on Stack Overflow? I couldn't find any
> documentation about how https://docs.marklogic.com/xdmp:collection-delete
> works.
>
> For more details, see:
> http://stackoverflow.com/questions/39571215/how-does-marklogics-xdmpcollection-delete-work
>
> Thanks,
>
> Qambar.
>
> _______________________________________________
> General mailing list
> General@developer.marklogic.com
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
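
For what it's worth, the batch delete approach described above could be sketched roughly as follows. This is only a language-neutral illustration in Python, not MarkLogic code: delete_in_batches, batches, fake_delete and the in-memory store are all hypothetical names made up for the example, and in a real setup the delete_batch callback would be whatever Corb, Taskbot or a scheduled task runs per batch.

```python
from typing import Callable, Iterable, Iterator, List


def batches(uris: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` URIs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for uri in uris:
        batch.append(uri)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Final, possibly smaller, batch.
        yield batch


def delete_in_batches(
    uris: Iterable[str],
    delete_batch: Callable[[List[str]], None],
    size: int = 1000,
) -> int:
    """Hand the URIs to `delete_batch` one batch at a time.

    Each call stands in for one separate transaction or spawned task,
    so the whole deletion is NOT atomic. Returns the number of URIs
    handed off.
    """
    total = 0
    for batch in batches(uris, size):
        delete_batch(batch)  # hypothetical per-batch delete
        total += len(batch)
    return total


# In-memory stand-in for the database, for illustration only.
store = {f"/expired/doc-{i}.xml" for i in range(2500)}
calls: List[int] = []


def fake_delete(batch: List[str]) -> None:
    """Record the batch size and remove the URIs from the fake store."""
    calls.append(len(batch))
    store.difference_update(batch)


# sorted() copies the URIs into a list, so the store can shrink safely
# while the batches are being processed.
deleted = delete_in_batches(sorted(store), fake_delete, size=1000)
```

The batch size is the knob that trades total run time against load per transaction, which is the extra control over load mentioned for the second approach.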