conanceph opened a new issue, #59047:
URL: https://github.com/apache/doris/issues/59047

   ### Search before asking
   
   - [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Version
   
   Doris version: 3.1.0
   
   ### What's Wrong?
   
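   Background:
   1. Doris is deployed on k8s with 3 BE instances, and Ceph is configured as the actual storage device behind the BE PVC volumes.
   2. The environment details and the abnormal behavior are as follows: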
   ``` shell
   [root@dev-int-pf-01 ~]# kubectl get pod | grep doriscluster-helm-be
   doriscluster-helm-be-0                                            1/1     Running     0                49m
   doriscluster-helm-be-1                                            1/1     Running     0                50m
   doriscluster-helm-be-2                                            1/1     Running     0                50m
   [root@dev-int-pf-01 ~]# 
   [root@dev-int-hkpf-01 ~]# kubectl get pvc | grep be-storage
   be-storage-doriscluster-helm-be-0   Bound    pvc-12028aef-ba54-491a-a875-1a6203d40fd1   100Gi      RWO            rook-ceph-block   <unset>                 11d
   be-storage-doriscluster-helm-be-1   Bound    pvc-0f8bd60c-a173-4a8b-893d-288885c250f5   100Gi      RWO            rook-ceph-block   <unset>                 11d
   be-storage-doriscluster-helm-be-2   Bound    pvc-daeedfa7-3b6d-4343-bf73-972dc71dba01   100Gi      RWO            rook-ceph-block   <unset>                 11d
   [root@dev-int-hkpf-01 ~]# 
   [root@dev-int-hkpf-01 ~]# kubectl get pv | grep be-storage
   pvc-0f8bd60c-a173-4a8b-893d-288885c250f5   100Gi      RWO            Delete           Bound    default/be-storage-doriscluster-helm-be-1   rook-ceph-block   <unset>                          11d
   pvc-12028aef-ba54-491a-a875-1a6203d40fd1   100Gi      RWO            Delete           Bound    default/be-storage-doriscluster-helm-be-0   rook-ceph-block   <unset>                          11d
   pvc-daeedfa7-3b6d-4343-bf73-972dc71dba01   100Gi      RWO            Delete           Bound    default/be-storage-doriscluster-helm-be-2   rook-ceph-block   <unset>                          11d
   # Taking the PV bound to be-storage-doriscluster-helm-be-0 as an example:
   [root@dev-int-hkpf-01 ~]# kubectl describe pv pvc-12028aef-ba54-491a-a875-1a6203d40fd1 | grep imageName
                              imageName=csi-vol-1a89d830-ea89-4ec9-bb23-96569997bab0
   [root@dev-int-hkpf-01 ~]# 
   # As shown below, the Ceph RBD image backing this PV already has 85G in use
   [root@rook-ceph-tools-64d98bcb68-tmxc7 /]# rbd du replicapool/csi-vol-1a89d830-ea89-4ec9-bb23-96569997bab0
   warning: fast-diff map is not enabled for csi-vol-1a89d830-ea89-4ec9-bb23-96569997bab0. operation may be slow.
   NAME                                          PROVISIONED  USED  
   csi-vol-1a89d830-ea89-4ec9-bb23-96569997bab0      100 GiB  85 GiB
   [root@rook-ceph-tools-64d98bcb68-tmxc7 /]# 
   # But inside the doriscluster-helm-be-0 pod, only 8G is shown as used
   root@doriscluster-helm-be-0:/opt/apache-doris# du -h --max-depth=1 /opt/apache-doris/be
   24K     /opt/apache-doris/be/bin
   24K     /opt/apache-doris/be/conf
   21M     /opt/apache-doris/be/dict
   5.2G    /opt/apache-doris/be/lib
   620K    /opt/apache-doris/be/licenses
   1.4G    /opt/apache-doris/be/log
   0       /opt/apache-doris/be/plugins
   8.0G    /opt/apache-doris/be/storage
   60K     /opt/apache-doris/be/tools
   2.6M    /opt/apache-doris/be/www
   15G     /opt/apache-doris/be
   root@doriscluster-helm-be-0:/opt/apache-doris# 
   root@doriscluster-helm-be-0:/opt/apache-doris# df -Th /opt/apache-doris/be/storage
   Filesystem     Type  Size  Used Avail Use% Mounted on
   /dev/rbd0      ext4   98G  8.0G   90G   9% /opt/apache-doris/be/storage
   root@doriscluster-helm-be-0:/opt/apache-doris# 
   ```
   3. Fixes attempted so far:
   Following the issues below, I changed this parameter:
   https://github.com/apache/doris/issues/30016
   https://github.com/apache/doris/issues/31501 
   ``` shell
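   # Changed the following parameter in doris-be-configmap. It controls the interval at which
   # the garbage collection (GC) check thread runs, i.e. how often Doris's background GC check
   # thread wakes up to scan for and clean up garbage data files that are no longer needed
   # (for example, files left over from compaction, deletion, or failed loads).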
   path_gc_check_interval_second = 120
   ```
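   This did not help. I also restarted doris-be, and the Ceph storage capacity was still not released.
   4. I checked that Ceph itself has not taken any snapshots of these RBD images, and for other services the capacity is released normally after data is deleted.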
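
To make the discrepancy in item 2 concrete, here is a minimal Python sketch that converts the two reported figures (85 GiB from `rbd du`, 8.0G from `df -Th`) to bytes and computes the difference. The helper names `parse_size` and `unreclaimed_bytes` are hypothetical and only illustrate the arithmetic; they are not part of Doris or Ceph.

```python
import re

# Binary unit multipliers: `df -h` prints sizes like "8.0G", `rbd du` like "85 GiB".
_UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_size(text: str) -> int:
    """Convert a human-readable size such as '8.0G' or '85 GiB' to bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([KMGT])(?:iB|B)?\s*", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit])


def unreclaimed_bytes(rbd_used: str, fs_used: str) -> int:
    """Bytes allocated in the RBD image but not counted as used by the filesystem."""
    return parse_size(rbd_used) - parse_size(fs_used)


if __name__ == "__main__":
    # Values copied from the `rbd du` and `df -Th` output in item 2.
    gap = unreclaimed_bytes("85 GiB", "8.0G")
    print(f"{gap / 2**30:.1f} GiB allocated in Ceph but unused by ext4")
```

With the values from item 2, this reports a gap of 77.0 GiB on a 100 GiB volume.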
   
   ### What You Expected?
   
   The actual data size of Doris BE or FE should match the disk space actually consumed.
   
   ### How to Reproduce?
   
   _No response_
   
   ### Anything Else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]