[ https://issues.apache.org/jira/browse/HAWQ-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115801#comment-16115801 ]
Yi Jin commented on HAWQ-1498:
------------------------------

I will fix this issue soon, in version 2.3 incubating.

> Segments keep open file descriptors for deleted files
> -----------------------------------------------------
>
>                 Key: HAWQ-1498
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1498
>             Project: Apache HAWQ
>          Issue Type: Bug
>            Reporter: Harald Bögeholz
>            Assignee: Lin Wen
>             Fix For: 2.3.0.0-incubating
>
>
> I have been running some large computations in HAWQ using psql on the master.
> These computations created temporary tables and dropped them again.
> Nevertheless, free disk space in HDFS decreased by much more than it should.
> While the psql session on the master was still open, I investigated on one of
> the slave machines.
> HDFS is stored on /mds:
> {noformat}
> [root@mds-hdp-04 ~]# ls -l /mds
> total 36
> drwxr-xr-x. 3 root      root    4096 Jun 14 04:23 falcon
> drwxr-xr-x. 3 root      root    4096 Jun 14 04:42 hdfs
> drwx------. 2 root      root   16384 Jun  8 02:48 lost+found
> drwxr-xr-x. 5 storm     hadoop  4096 Jun 14 04:45 storm
> drwxr-xr-x. 4 root      root    4096 Jun 14 04:43 yarn
> drwxr-xr-x. 2 zookeeper hadoop  4096 Jun 14 04:39 zookeeper
> [root@mds-hdp-04 ~]# df /mds
> Filesystem     1K-blocks      Used Available Use% Mounted on
> /dev/vdc       515928320 314560220 175137316  65% /mds
> [root@mds-hdp-04 ~]# du -s /mds
> 89918952	/mds
> {noformat}
> Note that there is a difference of more than 200 GB between the disk space used
> according to df and the sum of all files on that file system according to du.
> I have found the culprit to be several postgres processes running as gpadmin
> and holding open file descriptors to deleted files.
> Here are the first few:
> {noformat}
> [root@mds-hdp-04 ~]# lsof +L1 | grep /mds/hdfs | head -10
> postgres 665334 gpadmin 18r REG 253,32 134217728 0 9438234 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922482 (deleted)
> postgres 665334 gpadmin 34r REG 253,32     24488 0 9438114 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398 (deleted)
> postgres 665334 gpadmin 35r REG 253,32       199 0 9438115 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398_187044.meta (deleted)
> postgres 665334 gpadmin 37r REG 253,32 134217728 0 9438208 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922446 (deleted)
> postgres 665334 gpadmin 38r REG 253,32   1048583 0 9438209 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922446_187092.meta (deleted)
> postgres 665334 gpadmin 39r REG 253,32   1048583 0 9438235 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922482_187128.meta (deleted)
> postgres 665334 gpadmin 40r REG 253,32 134217728 0 9438262 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922555 (deleted)
> postgres 665334 gpadmin 41r REG 253,32   1048583 0 9438263 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922555_187201.meta (deleted)
> postgres 665334 gpadmin 42r REG 253,32 134217728 0 9438285 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir194/blk_1073922602 (deleted)
> postgres 665334 gpadmin 43r REG 253,32   1048583 0 9438286 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir194/blk_1073922602_187248.meta (deleted)
> {noformat}
> As soon as I close the psql session on the master, the disk space is freed on
> the slaves:
> {noformat}
> [root@mds-hdp-04 ~]# df /mds
> Filesystem     1K-blocks     Used Available Use% Mounted on
> /dev/vdc       515928320 89992720 399704816  19% /mds
> [root@mds-hdp-04 ~]# du -s /mds
> 89918952	/mds
> [root@mds-hdp-04 ~]# lsof +L1 | grep /mds/hdfs | head -10
> {noformat}
> I believe this to be a bug. At least to me, it looks like very undesirable
> behavior.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
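
For anyone who wants to quantify how much space such deleted-but-still-open block files are pinning, the lsof listing quoted above can be totalled per process. The following is only a minimal sketch, not part of HAWQ or of the reporter's procedure: the function name summarize_deleted, the SAMPLE constant and the path_prefix default are hypothetical, and the script assumes the default lsof column layout seen in the report (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME), with one entry per line.

```python
from collections import defaultdict

# Two entries copied from the lsof output in the report, one per line.
_BLOCK_DIR = ("/mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069"
              "/current/finalized/subdir2/subdir193/")
SAMPLE = (
    "COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME\n"
    "postgres 665334 gpadmin 18r REG 253,32 134217728 0 9438234 "
    + _BLOCK_DIR + "blk_1073922482 (deleted)\n"
    "postgres 665334 gpadmin 34r REG 253,32 24488 0 9438114 "
    + _BLOCK_DIR + "blk_1073922398 (deleted)\n"
)


def summarize_deleted(lsof_output, path_prefix="/mds/hdfs"):
    """Sum the SIZE/OFF column per (command, pid) for files under path_prefix.

    Assumes the text comes from `lsof +L1`, so every listed file already has
    a link count below 1 (that is, it has been unlinked).
    """
    totals = defaultdict(int)
    for line in lsof_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that does not have all ten columns.
        if len(fields) < 10 or not fields[1].isdigit():
            continue
        command, pid, size, name = fields[0], int(fields[1]), fields[6], fields[9]
        # SIZE/OFF can hold an offset such as "0t0" instead of a size; skip those.
        if not size.isdigit() or not name.startswith(path_prefix):
            continue
        totals[(command, pid)] += int(size)
    return dict(totals)


if __name__ == "__main__":
    for (command, pid), size in sorted(summarize_deleted(SAMPLE).items()):
        print(f"{command} (pid {pid}): {size / 2**20:.1f} MiB held by deleted files")
```

On a live system the same function could be fed the captured text of `lsof +L1`. The per-process totals should roughly account for the gap between df and du, since df counts blocks that are still allocated to unlinked files while du only sees files that still have a directory entry.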