conanceph opened a new issue, #59047: URL: https://github.com/apache/doris/issues/59047
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Version

Doris version: 3.1.0

### What's Wrong?

Background:

1. Doris is deployed on Kubernetes with 3 BE instances, and Ceph is configured as the actual storage backend for the PVC volumes.
2. The environment details and the anomaly are as follows:

```shell
[root@dev-int-pf-01 ~]# kubectl get pod | grep doriscluster-helm-be
doriscluster-helm-be-0   1/1   Running   0   49m
doriscluster-helm-be-1   1/1   Running   0   50m
doriscluster-helm-be-2   1/1   Running   0   50m
[root@dev-int-pf-01 ~]#
[root@dev-int-hkpf-01 ~]# kubectl get pvc | grep be-storage
be-storage-doriscluster-helm-be-0   Bound   pvc-12028aef-ba54-491a-a875-1a6203d40fd1   100Gi   RWO   rook-ceph-block   <unset>   11d
be-storage-doriscluster-helm-be-1   Bound   pvc-0f8bd60c-a173-4a8b-893d-288885c250f5   100Gi   RWO   rook-ceph-block   <unset>   11d
be-storage-doriscluster-helm-be-2   Bound   pvc-daeedfa7-3b6d-4343-bf73-972dc71dba01   100Gi   RWO   rook-ceph-block   <unset>   11d
[root@dev-int-hkpf-01 ~]#
[root@dev-int-hkpf-01 ~]# kubectl get pv | grep be-storage
pvc-0f8bd60c-a173-4a8b-893d-288885c250f5   100Gi   RWO   Delete   Bound   default/be-storage-doriscluster-helm-be-1   rook-ceph-block   <unset>   11d
pvc-12028aef-ba54-491a-a875-1a6203d40fd1   100Gi   RWO   Delete   Bound   default/be-storage-doriscluster-helm-be-0   rook-ceph-block   <unset>   11d
pvc-daeedfa7-3b6d-4343-bf73-972dc71dba01   100Gi   RWO   Delete   Bound   default/be-storage-doriscluster-helm-be-2   rook-ceph-block   <unset>   11d

# Taking the PV of be-storage-doriscluster-helm-be-0 as an example:
[root@dev-int-hkpf-01 ~]# kubectl describe pv pvc-12028aef-ba54-491a-a875-1a6203d40fd1 | grep imageName
    imageName=csi-vol-1a89d830-ea89-4ec9-bb23-96569997bab0
[root@dev-int-hkpf-01 ~]#

# The Ceph RBD image backing this PV already has 85G in use:
[root@rook-ceph-tools-64d98bcb68-tmxc7 /]# rbd du replicapool/csi-vol-1a89d830-ea89-4ec9-bb23-96569997bab0
warning: fast-diff map is not enabled for csi-vol-1a89d830-ea89-4ec9-bb23-96569997bab0. operation may be slow.
NAME                                          PROVISIONED  USED
csi-vol-1a89d830-ea89-4ec9-bb23-96569997bab0      100 GiB  85 GiB
[root@rook-ceph-tools-64d98bcb68-tmxc7 /]#

# But inside the be-0 pod, only 8G is in use:
root@doriscluster-helm-be-0:/opt/apache-doris# du -h --max-depth=1 /opt/apache-doris/be
24K     /opt/apache-doris/be/bin
24K     /opt/apache-doris/be/conf
21M     /opt/apache-doris/be/dict
5.2G    /opt/apache-doris/be/lib
620K    /opt/apache-doris/be/licenses
1.4G    /opt/apache-doris/be/log
0       /opt/apache-doris/be/plugins
8.0G    /opt/apache-doris/be/storage
60K     /opt/apache-doris/be/tools
2.6M    /opt/apache-doris/be/www
15G     /opt/apache-doris/be
root@doriscluster-helm-be-0:/opt/apache-doris#
root@doriscluster-helm-be-0:/opt/apache-doris# df -Th /opt/apache-doris/be/storage
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/rbd0      ext4   98G  8.0G   90G   9% /opt/apache-doris/be/storage
root@doriscluster-helm-be-0:/opt/apache-doris#
```

3. What I have tried: following these issues, I changed the parameter below:
   - https://github.com/apache/doris/issues/30016
   - https://github.com/apache/doris/issues/31501

```shell
# Changed the following parameter in doris-be-configmap. It controls the run interval of the
# garbage collection (GC) check thread, i.e. how often the Doris background GC check thread
# wakes up to scan for and clean up data files that are no longer needed (for example,
# leftover files produced by compaction, deletes, or failed loads).
path_gc_check_interval_second = 120
```

   However, this did not help. I also restarted doris-be, and the Ceph storage capacity was still not released.

4. I have checked that Ceph itself has no snapshots of these RBD images, and for other services the capacity is released normally after their data is deleted.

### What You Expected?

The actual data size of the Doris BE or FE should match the disk space it actually occupies.

### How to Reproduce?

_No response_

### Anything Else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
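The comparison in step 2 (85 GiB reported by `rbd du` for the image versus 8.0G reported by `df` inside the be-0 pod) can be repeated for each BE volume by scripting the parsing. Below is a minimal POSIX shell sketch, assuming the command output has first been saved to files. The helper names `to_gib`, `rbd_used_gib`, `df_used_gib` and `storage_gap_gib` are hypothetical, the toolbox namespace `rook-ceph` and deployment `rook-ceph-tools` are assumptions (only the pod name appears above), and it assumes binary units (`rbd du` prints GiB, `df -h` uses powers of 1024) and single-line `df` rows.

```shell
#!/bin/sh
# Minimal sketch: how much space an RBD image has allocated in Ceph beyond
# what the ext4 filesystem inside the BE pod reports as used.
# Inputs are saved command output, captured for example with
# (namespace "rook-ceph" and deployment "rook-ceph-tools" are assumptions):
#   kubectl -n rook-ceph exec deploy/rook-ceph-tools -- \
#     rbd du replicapool/<imageName> > rbd_du.txt 2>&1
#   kubectl exec doriscluster-helm-be-0 -- \
#     df -Th /opt/apache-doris/be/storage > df.txt

# to_gib VALUE UNIT: print VALUE converted to GiB; binary units only.
to_gib() {
  awk -v v="$1" -v u="$2" 'BEGIN {
    f["K"] = 1 / 1048576; f["KiB"] = 1 / 1048576
    f["M"] = 1 / 1024;    f["MiB"] = 1 / 1024
    f["G"] = 1;           f["GiB"] = 1
    f["T"] = 1024;        f["TiB"] = 1024
    if (!(u in f)) exit 1    # unknown or missing unit: fail, print nothing
    printf "%.1f\n", v * f[u]
  }'
}

# rbd_used_gib FILE: the image row of `rbd du` output looks like
#   csi-vol-<id>  100 GiB  85 GiB   (USED = fields 4 and 5)
# The "warning:" and "NAME" lines do not start with csi-vol- and are skipped.
rbd_used_gib() {
  awk '$1 ~ /^csi-vol-/ { print $4, $5; exit }' "$1" | {
    read -r value unit
    to_gib "$value" "$unit"
  }
}

# df_used_gib FILE: the data row (line 2) of `df -Th` output looks like
#   /dev/rbd0  ext4  98G  8.0G  90G  9%  <mount>   (Used = field 4, e.g. "8.0G")
# The last character of the field is split off as the unit.
df_used_gib() {
  awk 'NR == 2 { n = length($4); print substr($4, 1, n - 1), substr($4, n) }' "$1" | {
    read -r value unit
    to_gib "$value" "$unit"
  }
}

# storage_gap_gib RBD_DU_FILE DF_FILE: GiB allocated in Ceph but not used by ext4.
storage_gap_gib() {
  rbd_gib=$(rbd_used_gib "$1") || return 1
  fs_gib=$(df_used_gib "$2") || return 1
  awk -v a="$rbd_gib" -v b="$fs_gib" 'BEGIN { printf "%.1f\n", a - b }'
}
```

Fed the be-0 output shown above, `storage_gap_gib rbd_du.txt df.txt` prints `77.0`, i.e. about 77 GiB allocated in the RBD image that the filesystem does not account for. Parsing saved files rather than calling `kubectl` directly keeps the parsing testable offline and lets the same helpers run against the be-1 and be-2 volumes.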
