Hi Igor,
Thank you for responding. In that case, this looks like a breaking change. I
know of two applications that are now displaying the pool usage and capacity
incorrectly. It looks like they both assume that the USED field is already
divided by the number of replicas. One of those applications is actually the
Ceph Dashboard. The other is OpenNebula:
https://docs.opennebula.org/5.6/deployment/open_cloud_storage_setup/ceph_ds.html.
See the screenshot from Ceph Dashboard - https://imgur.com/vFFxsti. It is
stating that we have used 88% of the available space, because it wrongly
assumes that the pool capacity is 47.7TB + 6.7TB = 54.4TB, while it should be
more like (47.7TB/3) + 6.7TB = 22.6TB. It's absolutely the same story with our
OpenNebula instance - https://imgur.com/MOLbo4g. I'm not sure exactly which
update broke this, but it was definitely working correctly before.
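For reference, the arithmetic above can be sketched in shell. This is only an
illustration: the function name pool_capacity_tb is made up for this example,
the replica count of 3 is the one assumed in the calculation above, and awk is
used only because plain shell arithmetic has no floating point.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Ceph or OpenNebula): estimate pool capacity
# in TB from the reported USED value, MAX AVAIL, and the replica count.
pool_capacity_tb() {
    local used_tb=$1 max_avail_tb=$2 replicas=$3
    awk -v u="$used_tb" -v a="$max_avail_tb" -v r="$replicas" \
        'BEGIN { printf "%.1f\n", (u / r) + a }'
}

# What the dashboards effectively compute now (USED already includes replicas).
pool_capacity_tb 47.7 6.7 1
# What the capacity should be for a pool with 3 replicas.
pool_capacity_tb 47.7 6.7 3
```

With the figures from the screenshot this prints 54.4 and 22.6.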
I looked at OpenNebula's code for ceph datastore monitoring and found that it's
parsing the XML output of ceph df --format xml, so it looks like this changed
too.
From file: /var/lib/one/remotes/tm/ceph/monitor:
# ------------ Compute datastore usage -------------
MONITOR_SCRIPT=$(cat <<EOF
$CEPH df --format xml
EOF
)
MONITOR_DATA=$(ssh_monitor_and_log $HOST "$MONITOR_SCRIPT" 2>&1)
MONITOR_STATUS=$?
if [ "$MONITOR_STATUS" = "0" ]; then
    XPATH="${DRIVER_PATH}/../../datastore/xpath.rb --stdin"
    echo -e "$(rbd_df_monitor ${MONITOR_DATA} ${POOL_NAME})"
else
    echo "$MONITOR_DATA"
    exit $MONITOR_STATUS
fi
From file: /var/lib/one/remotes/datastore/ceph/ceph_utils.sh:
#--------------------------------------------------------------------------------
# Parse the output of rbd df in xml format and generates a monitor string for a
# Ceph pool. You **MUST** define XPATH util before using this function
#   @param $1 the xml output of the command
#   @param $2 the pool name
#--------------------------------------------------------------------------------
rbd_df_monitor() {
    local monitor_data i j xpath_elements pool_name bytes_used free
    monitor_data=$1
    pool_name=$2
    while IFS= read -r -d '' element; do
        xpath_elements[i++]="$element"
    done < <(echo $monitor_data | $XPATH \
                "/stats/pools/pool[name = \"${pool_name}\"]/stats/bytes_used" \
                "/stats/pools/pool[name = \"${pool_name}\"]/stats/max_avail")
    bytes_used="${xpath_elements[j++]:-0}"
    free="${xpath_elements[j++]:-0}"
    cat << EOF | tr -d '[:blank:][:space:]'
        USED_MB=$(($bytes_used / 1024**2))\n
        TOTAL_MB=$((($bytes_used + $free) / 1024**2))\n
        FREE_MB=$(($free / 1024**2))\n
EOF
}
I believe Ceph Dashboard is doing the same, because the results are the same.
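One possible direction for rbd_df_monitor, sketched under assumptions: if the
Nautilus XML output exposes a "stored" element next to "bytes_used" (suggested
by the STORED column in ceph df, but the element name is not confirmed here),
the monitor string could be built from that value instead. The function name
monitor_string is hypothetical, and the XPath extraction is left out so the
snippet runs on its own.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the monitor string: takes the pool's logical
# (pre-replication) byte count and max_avail, both in bytes.
# Assumption: the caller read the first value from a "stored" element
# rather than "bytes_used"; that element name is not confirmed here.
monitor_string() {
    local stored_bytes=${1:-0} free_bytes=${2:-0}
    local mb=1048576   # 1024 * 1024

    printf 'USED_MB=%d\n'  $(( stored_bytes / mb ))
    printf 'TOTAL_MB=%d\n' $(( (stored_bytes + free_bytes) / mb ))
    printf 'FREE_MB=%d\n'  $(( free_bytes / mb ))
}

# Example with 3 MiB stored and 1 MiB available.
monitor_string 3145728 1048576
```

Unlike the original, this prints real newlines instead of literal \n sequences,
so the caller would not need echo -e.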
Best Regards,
On 7 Oct 2019, at 19:03, Igor Fedotov wrote:
Hi Yordan,
This is the Mimic documentation, and these snippets are no longer valid for
Nautilus. They are still present in the Nautilus pages, though.
I am going to create a corresponding ticket to fix that.
Relevant Nautilus changes for 'ceph df [detail]' command can be found in
Nautilus release notes: https://docs.ceph.com/docs/nautilus/releases/nautilus/
In short: the USED field now accounts for all overhead data, including replicas.
It is the STORED field that now represents the pure data a user has put into a
pool.
Thanks,
Igor
On 10/2/2019 8:33 AM, Yordan Yordanov (Innologica) wrote:
The documentation states:
https://docs.ceph.com/docs/mimic/rados/operations/monitoring/
The POOLS section of the output provides a list of pools and the notional usage
of each pool. The output from this section DOES NOT reflect replicas, clones or
snapshots. For example, if you store an object with 1MB of data, the notional
usage will be 1MB, but the actual usage may be 2MB or more depending on the
number of replicas, clones and snapshots.
However, in our case we are clearly seeing the USED field multiply the total
object size by the number of replicas.
[root@blackmirror ~]# ceph df
RAW STORAGE:
    CLASS     SIZE       AVAIL      USED       RAW USED     %RAW USED
    hdd       80 TiB     34 TiB     46 TiB     46 TiB       58.10
    TOTAL     80 TiB     34 TiB     46 TiB     46 TiB       58.10
POOLS:
    POOL      ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    one       2      15 TiB      4.05M       46 TiB      68.32     7.2 TiB
    bench     5      250 MiB     67          250 MiB     0         22 TiB
[root@blackmirror ~]# rbd du -p one
NAME        PROVISIONED     USED
...
<TOTAL>     20 TiB          15 TiB
This is causing several apps (including the Ceph Dashboard) to display
inaccurate percentages, because they calculate the total pool capacity as USED +
MAX AVAIL, which in this case yields 53.2TB, which is way off. 7.2TB is about 13%
of that, so we receive alarms, and this has been bugging us for quite some time
now.
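The two readings of the "one" pool row above can be compared with a few lines of
shell. This is a sketch only: free_percent is a hypothetical helper, and the
inputs are the rounded TiB figures from the ceph df output, so the results are
approximate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of a pool that is still free, given the
# amount counted as consumed and MAX AVAIL (same unit for both).
free_percent() {
    local consumed=$1 max_avail=$2
    awk -v c="$consumed" -v a="$max_avail" \
        'BEGIN { printf "%.1f\n", 100 * a / (c + a) }'
}

# Pool "one" using USED = 46 TiB (includes replicas), MAX AVAIL = 7.2 TiB.
free_percent 46 7.2
# Same pool using STORED = 15 TiB (logical data only).
free_percent 15 7.2
```

The first call prints 13.5, which matches the figure that triggers the alarms;
the second prints 32.4.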
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]