On 11/29/18 6:58 PM, Bryan Stillwell wrote:
> Wido,
>
> I've been looking into this large omap objects problem on a couple of our
> clusters today and came across your script during my research.
>
> The script has been running for a few hours now and it has already found
> over 100,000 'orphaned' objects!
>
> It appears that ever since upgrading to Luminous (12.2.5 initially, followed
> by 12.2.8) this cluster has been resharding the large bucket indexes at least
> once a day and not cleaning up the previous bucket indexes:
>
> for instance in $(radosgw-admin metadata list bucket.instance | jq -r '.[]' | grep go-test-dashboard); do
>   mtime=$(radosgw-admin metadata get bucket.instance:${instance} | grep mtime)
>   num_shards=$(radosgw-admin metadata get bucket.instance:${instance} | grep num_shards)
>   echo "${instance}: ${mtime} ${num_shards}"
> done | column -t | sort -k3
> go-test-dashboard:default.188839135.327804: "mtime": "2018-06-01 22:35:28.693095Z", "num_shards": 0,
> go-test-dashboard:default.617828918.2898: "mtime": "2018-06-02 22:35:40.438738Z", "num_shards": 46,
> go-test-dashboard:default.617828918.4: "mtime": "2018-06-02 22:38:21.537259Z", "num_shards": 46,
> go-test-dashboard:default.617663016.10499: "mtime": "2018-06-03 23:00:04.185285Z", "num_shards": 46,
> [...snip...]
> go-test-dashboard:default.891941432.342061: "mtime": "2018-11-28 01:41:46.777968Z", "num_shards": 7,
> go-test-dashboard:default.928133068.2899: "mtime": "2018-11-28 20:01:49.390237Z", "num_shards": 46,
> go-test-dashboard:default.928133068.5115: "mtime": "2018-11-29 01:54:17.788355Z", "num_shards": 7,
> go-test-dashboard:default.928133068.8054: "mtime": "2018-11-29 20:21:53.733824Z", "num_shards": 7,
> go-test-dashboard:default.891941432.359004: "mtime": "2018-11-29 20:22:09.201965Z", "num_shards": 46,
>
> The num_shards value is typically around 46, but across all 288 instances
> of that bucket index it has varied between 3 and 62 shards.
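>
> For anyone who wants to reproduce that count, something along these
> lines (same bucket name assumed) tallies the instances recorded for a
> single bucket:
>
> radosgw-admin metadata list bucket.instance | jq -r '.[]' | \
>     grep -c '^go-test-dashboard:'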
>
> Have you figured anything more out about this since you posted this
> originally two weeks ago?
>
> Thanks,
> Bryan
>
> From: ceph-users <[email protected]> on behalf of Wido den
> Hollander <[email protected]>
> Date: Thursday, November 15, 2018 at 5:43 AM
> To: Ceph Users <[email protected]>
> Subject: [ceph-users] Removing orphaned radosgw bucket indexes from pool
>
> Hi,
>
> Recently we've seen multiple messages on the mailing lists about people
> seeing HEALTH_WARN due to large OMAP objects on their cluster. This is
> because, starting with 12.2.6, OSDs warn about such objects.
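>
> (As a quick check of what is being flagged: 'ceph health detail' names
> the pools with large omap objects, and counting the omap keys on a
> suspect index object confirms it. Pool and object names below are only
> examples:)
>
> ceph health detail | grep -i 'large omap'
> rados -p .rgw.buckets.index listomapkeys .dir.<marker> | wc -l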
>
> I've got multiple people asking me the same questions and I've done some
> digging around.
>
> Somebody on the ML wrote this script:
>
> for bucket in $(radosgw-admin metadata list bucket | jq -r '.[]' | sort); do
>   actual_id=$(radosgw-admin bucket stats --bucket=${bucket} | jq -r '.id')
>   for instance in $(radosgw-admin metadata list bucket.instance | jq -r '.[]' | grep ${bucket}: | cut -d ':' -f 2); do
>     if [ "$actual_id" != "$instance" ]; then
>       radosgw-admin bi purge --bucket=${bucket} --bucket-id=${instance}
>       radosgw-admin metadata rm bucket.instance:${bucket}:${instance}
>     fi
>   done
> done
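>
> (If you reuse it, a sensible dry run is to prefix the two destructive
> commands with 'echo' first, so the script only prints what it would
> purge:)
>
> echo radosgw-admin bi purge --bucket=${bucket} --bucket-id=${instance}
> echo radosgw-admin metadata rm bucket.instance:${bucket}:${instance}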
>
> That partially works, but it does not deal with 'orphaned' objects in
> the index pool.
>
> So I wrote my own script [0]:
>
> #!/bin/bash
> INDEX_POOL=$1
>
> if [ -z "$INDEX_POOL" ]; then
>   echo "Usage: $0 <index pool>"
>   exit 1
> fi
>
> INDEXES=$(mktemp)
> METADATA=$(mktemp)
>
> trap "rm -f ${INDEXES} ${METADATA}" EXIT
>
> radosgw-admin metadata list bucket.instance | jq -r '.[]' > ${METADATA}
> rados -p ${INDEX_POOL} ls > ${INDEXES}
>
> for OBJECT in $(cat ${INDEXES}); do
>   MARKER=$(echo ${OBJECT} | cut -d '.' -f 3,4,5)
>   if ! grep -q ${MARKER} ${METADATA}; then
>     echo ${OBJECT}
>   fi
> done
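>
> Saved as, say, find-orphans.sh (the filename is arbitrary), it takes the
> index pool as its only argument; the pool name will differ per cluster:
>
> $ bash find-orphans.sh .rgw.buckets.index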
>
> It does not remove anything; it only prints the objects it considers
> orphaned. For example, it returns these objects:
>
> .dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10406917.5752
> .dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6162
> .dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6186
>
> The output of:
>
> $ radosgw-admin metadata list bucket.instance | jq -r '.[]'
>
> does not contain:
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10406917.5752
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6162
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6186
>
> So these objects do not appear to be tied to any bucket; they look like
> leftovers that were never cleaned up.
>
> For example, I see these objects tied to a bucket:
>
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6160
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6188
> - eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6167
>
> Note the difference: the suffixes 6160, 6188 and 6167 are referenced,
> but 6162 and 6186 are not.
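>
> A minimal way to cross-check a single candidate before touching it
> (index pool name assumed, marker taken from the output above):
>
> MARKER=eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6162
> # should print nothing if the marker is truly orphaned
> radosgw-admin metadata list bucket.instance | jq -r '.[]' | grep "${MARKER}"
> # how many omap keys the orphaned index object still holds
> rados -p .rgw.buckets.index listomapkeys ".dir.${MARKER}" | wc -l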
>
> Before I remove these objects I want to verify with other users that they
> see the same thing and that my reasoning is correct.
>
> Wido
>
> [0]: https://gist.github.com/wido/6650e66b09770ef02df89636891bef04
This is a known issue, and there are multiple commits on the upstream
luminous branch designed to address it in a variety of ways: making
resharding more robust, having resharding clean up old shards
automatically, and adding administrative command-line support to
manually clean up old shards.

These will all be included in the next luminous release.
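
If memory serves, the command-line support is expected to land as a
'reshard stale-instances' subcommand, roughly:

# list bucket index instances left behind by resharding
radosgw-admin reshard stale-instances list

# remove them, after reviewing the list
radosgw-admin reshard stale-instances rm
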
Eric