[ https://issues.apache.org/jira/browse/HAWQ-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16076231#comment-16076231 ]
Lin Wen commented on HAWQ-1498:
-------------------------------

Hi Harald,

Thank you for reporting this! Could you provide more information? For example, a listing of all postgres processes on the segment while the disk space is not freed, or concrete steps to reproduce the problem.

I am wondering whether the query executed successfully. Once a query has finished, the idle QEs on the segment should exit after a period of time (which can be controlled by a GUC property). If the QEs on the segment have exited, is the disk space still not freed?

> Segments keep open file descriptors for deleted files
> -----------------------------------------------------
>
> Key: HAWQ-1498
> URL: https://issues.apache.org/jira/browse/HAWQ-1498
> Project: Apache HAWQ
> Issue Type: Bug
> Reporter: Harald Bögeholz
> Assignee: Radar Lei
> Fix For: 2.2.0.0-incubating
>
>
> I have been running some large computations in HAWQ using psql on the master.
> These computations created temporary tables and dropped them again.
> Nevertheless free disk space in HDFS decreased by much more than it should.
> While the psql session on the master was still open I investigated on one of
> the slave machines.
> HDFS is stored on /mds:
> {noformat}
> [root@mds-hdp-04 ~]# ls -l /mds
> total 36
> drwxr-xr-x. 3 root root 4096 Jun 14 04:23 falcon
> drwxr-xr-x. 3 root root 4096 Jun 14 04:42 hdfs
> drwx------. 2 root root 16384 Jun 8 02:48 lost+found
> drwxr-xr-x. 5 storm hadoop 4096 Jun 14 04:45 storm
> drwxr-xr-x. 4 root root 4096 Jun 14 04:43 yarn
> drwxr-xr-x. 2 zookeeper hadoop 4096 Jun 14 04:39 zookeeper
> [root@mds-hdp-04 ~]# df /mds
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/vdc 515928320 314560220 175137316 65% /mds
> [root@mds-hdp-04 ~]# du -s /mds
> 89918952 /mds
> {noformat}
> Note that there is a more than 200 GB difference between the disk space used
> according to df and the sum of all files on that file system according to du.
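A gap like the one between the df and du figures above is what one would expect when files are unlinked while some process still holds them open: du walks directory entries and no longer sees the file, while df counts blocks that stay allocated until the last descriptor is closed. The following is a minimal sketch of that general POSIX behavior using a throwaway temporary file; it is not taken from HAWQ, and the variable names are illustrative only.

```shell
#!/bin/sh
# Minimal demonstration: an unlinked file stays readable (and allocated)
# for as long as a process keeps a descriptor open on it.
tmp=$(mktemp)
printf 'hello\n' > "$tmp"

exec 3< "$tmp"      # open a read-only descriptor on the file
rm -f "$tmp"        # unlink it: the name is gone, so du no longer counts it

if [ -e "$tmp" ]; then name_gone=no; else name_gone=yes; fi
content=$(cat <&3)  # the data is still reachable through descriptor 3

exec 3<&-           # closing the last descriptor lets the filesystem free the blocks
echo "name_gone=$name_gone content=$content"
```

Run as is, the script prints `name_gone=yes content=hello`: the path has disappeared, yet the contents remain available until descriptor 3 is closed.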
> I have found the culprit to be several postgres processes running as gpadmin
> and holding open file descriptors to deleted files. Here are the first few:
> {noformat}
> [root@mds-hdp-04 ~]# lsof +L1 | grep /mds/hdfs | head -10
> postgres 665334 gpadmin 18r REG 253,32 134217728 0 9438234
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922482
> (deleted)
> postgres 665334 gpadmin 34r REG 253,32 24488 0 9438114
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398
> (deleted)
> postgres 665334 gpadmin 35r REG 253,32 199 0 9438115
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398_187044.meta
> (deleted)
> postgres 665334 gpadmin 37r REG 253,32 134217728 0 9438208
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922446
> (deleted)
> postgres 665334 gpadmin 38r REG 253,32 1048583 0 9438209
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922446_187092.meta
> (deleted)
> postgres 665334 gpadmin 39r REG 253,32 1048583 0 9438235
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922482_187128.meta
> (deleted)
> postgres 665334 gpadmin 40r REG 253,32 134217728 0 9438262
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922555
> (deleted)
> postgres 665334 gpadmin 41r REG 253,32 1048583 0 9438263
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922555_187201.meta
> (deleted)
> postgres 665334 gpadmin 42r REG 253,32 134217728 0 9438285
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir194/blk_1073922602
> (deleted)
> postgres 665334 gpadmin 43r REG 253,32 1048583 0 9438286
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir194/blk_1073922602_187248.meta
> (deleted)
> {noformat}
> As soon as I close the psql session on the master the disk space is freed on the
> slaves:
> {noformat}
> [root@mds-hdp-04 ~]# df /mds
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/vdc 515928320 89992720 399704816 19% /mds
> [root@mds-hdp-04 ~]# du -s /mds
> 89918952 /mds
> [root@mds-hdp-04 ~]# lsof +L1 | grep /mds/hdfs | head -10
> {noformat}
> I believe this to be a bug. At least for me it looks like a very undesirable
> behavior.
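To answer the request for per-process detail, the `lsof +L1` listing quoted above could be condensed into a total of held-but-deleted bytes per PID. The sketch below is one possible way to do that with awk. It assumes the column layout visible in the listing (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME) with each record on a single line, and `sum_deleted` is a hypothetical helper name, not part of lsof or HAWQ.

```shell
#!/bin/sh
# Sum the sizes of deleted-but-open regular files per PID from `lsof +L1` output.
# Assumed columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
# sum_deleted is a hypothetical helper; $1 is the path prefix to filter on.
sum_deleted() {
    awk -v prefix="$1" '
        $5 == "REG" && index($10, prefix) == 1 {
            key = $2 SUBSEP $9              # count each (pid, inode) pair once
            if (!(key in seen)) { seen[key] = 1; bytes[$2] += $7 }
        }
        END { for (pid in bytes) printf "%s %.0f\n", pid, bytes[pid] }
    '
}

# Self-contained check using the first three records from the listing above,
# joined onto one line each as lsof prints them.
sample='postgres 665334 gpadmin 18r REG 253,32 134217728 0 9438234 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922482 (deleted)
postgres 665334 gpadmin 34r REG 253,32 24488 0 9438114 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398 (deleted)
postgres 665334 gpadmin 35r REG 253,32 199 0 9438115 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398_187044.meta (deleted)'

result=$(printf '%s\n' "$sample" | sum_deleted /mds/hdfs)
echo "$result"   # prints: 665334 134242415
```

On a live segment host the same helper would be fed real output, for example `lsof +L1 | sum_deleted /mds/hdfs`, run with enough privileges to see the gpadmin processes. Counting each (PID, inode) pair only once avoids double-counting a block that one process holds open on several descriptors.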