Hi, I deleted (or tried to delete) a few hundred TB of data from our S3 storage over the last couple of days. Unfortunately, all delete operations appear to hang.
I tried deleting the buckets with radosgw-admin bucket rm --bucket=bucketname --purge-objects --bypass-gc, but it seems to hang forever without any visible progress. Even when I drop --bypass-gc, it just hangs. I've since managed to empty my buckets by doing an s3 sync against an empty directory, but that of course didn't actually free any space yet. So yesterday I started radosgw-admin gc process --include-all. That process is still running without much visible progress (we are about 10M objects down, which is nothing significant at this scale).
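For reference, this is roughly how I've been watching the GC backlog drain (assuming jq is installed; the exact JSON shape of the gc list output may differ between Ceph releases):

```shell
# Dump the pending garbage-collection queue (all entries, not just expired ones)
radosgw-admin gc list --include-all > gc.json

# Number of GC entries in the queue
jq 'length' gc.json

# Total objects queued for deletion (each entry carries an "objs" list)
jq '[.[].objs | length] | add' gc.json
```

Re-running this periodically at least shows whether the queue is shrinking at all.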
The processes are not completely dead. They hang in futex calls with 0% CPU utilization. Every now and then, there's a small spike of ca. 0.6% CPU with a bunch of other calls in strace, but most of the time seems to be spent in interruptible sleep.
Is there any way I can see what's actually happening, or why it is taking so long? Could this be related to our aging hard disks? Deleting old snapshots in CephFS starts freeing space almost instantly, and those 10M objects are gone in about half an hour, so slow disks seem like an unlikely cause? We urgently need to free some space to better balance out the data.
I found this ML thread, but there's no solution in it. https://www.mail-archive.com/[email protected]/msg11571.html Upping --max-concurrent-ios didn't help.
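For completeness, these are the GC throttling options I've been looking at besides --max-concurrent-ios (option names as of recent Ceph releases; defaults and availability may differ on your version):

```shell
# Inspect the RGW garbage-collection tuning knobs currently in effect
ceph config get client.rgw rgw_gc_max_objs            # number of GC shards
ceph config get client.rgw rgw_gc_obj_min_wait        # min age before an object is GC-eligible
ceph config get client.rgw rgw_gc_processor_max_time  # max runtime of one GC cycle
ceph config get client.rgw rgw_gc_processor_period    # interval between GC cycles
ceph config get client.rgw rgw_gc_max_concurrent_io   # concurrent IOs per GC pass
```

If anyone knows which of these actually governs the drain rate in practice, I'd appreciate a pointer.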
Janek
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
