Harald Bögeholz created HAWQ-1498:
-------------------------------------

             Summary: Segments keep open file descriptors for deleted files
                 Key: HAWQ-1498
                 URL: https://issues.apache.org/jira/browse/HAWQ-1498
             Project: Apache HAWQ
          Issue Type: Bug
            Reporter: Harald Bögeholz
            Assignee: Radar Lei
             Fix For: 2.2.0.0-incubating
I have been running some large computations in HAWQ using psql on the master. These computations created temporary tables and dropped them again. Nevertheless, free disk space in HDFS decreased by much more than it should have.

While the psql session on the master was still open, I investigated on one of the slave machines. HDFS is stored on /mds:

{noformat}
[root@mds-hdp-04 ~]# ls -l /mds
total 36
drwxr-xr-x. 3 root      root    4096 Jun 14 04:23 falcon
drwxr-xr-x. 3 root      root    4096 Jun 14 04:42 hdfs
drwx------. 2 root      root   16384 Jun  8 02:48 lost+found
drwxr-xr-x. 5 storm     hadoop  4096 Jun 14 04:45 storm
drwxr-xr-x. 4 root      root    4096 Jun 14 04:43 yarn
drwxr-xr-x. 2 zookeeper hadoop  4096 Jun 14 04:39 zookeeper
[root@mds-hdp-04 ~]# df /mds
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/vdc       515928320 314560220 175137316  65% /mds
[root@mds-hdp-04 ~]# du -s /mds
89918952        /mds
{noformat}

Note that the disk space used according to df exceeds the total size of all files on that file system according to du by more than 200 GB (314560220 - 89918952 = 224641268 1K-blocks, about 214 GiB). I have found the culprit to be several postgres processes, running as gpadmin, that hold open file descriptors to deleted files.
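Background on why df and du can disagree like this: on POSIX file systems, unlinking a file only removes its directory entry, while the inode and its data blocks stay allocated until the last open file descriptor referring to it is closed. du sums the files it can reach through directories, so it no longer counts such a file; df reports allocated blocks, so it still does. The following is a minimal local sketch of that mechanism, assuming a POSIX system (Linux or macOS); it is not HAWQ or HDFS code, and the function name `demo_unlinked_but_open` is hypothetical.

```python
import os
import tempfile


def demo_unlinked_but_open(size: int = 65536) -> tuple[int, int]:
    """Write a file, unlink it while it is still open, and return
    (link count seen through the open descriptor, bytes still readable)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * size)     # allocate some data blocks
        os.unlink(path)               # drop the directory entry: du no longer sees the file
        st = os.fstat(fd)             # the inode is still reachable through the descriptor
        os.lseek(fd, 0, os.SEEK_SET)  # rewind to the start of the (now nameless) file
        data = os.read(fd, size)      # the data is still allocated, so df still counts it
        return st.st_nlink, len(data)
    finally:
        os.close(fd)                  # last reference gone: the blocks can now be reclaimed


if __name__ == "__main__":
    nlink, readable = demo_unlinked_but_open()
    print(f"link count: {nlink}, bytes still readable: {readable}")
```

A link count of 0 on a still-open file is exactly the state that `lsof +L1` selects. In this report, the postgres processes play the role of the open descriptor on deleted HDFS block files.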
Here are the first few (`lsof +L1` lists open files whose link count is less than 1, i.e. files that have been unlinked but are still held open):

{noformat}
[root@mds-hdp-04 ~]# lsof +L1 | grep /mds/hdfs | head -10
postgres 665334 gpadmin 18r REG 253,32 134217728 0 9438234 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922482 (deleted)
postgres 665334 gpadmin 34r REG 253,32     24488 0 9438114 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398 (deleted)
postgres 665334 gpadmin 35r REG 253,32       199 0 9438115 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398_187044.meta (deleted)
postgres 665334 gpadmin 37r REG 253,32 134217728 0 9438208 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922446 (deleted)
postgres 665334 gpadmin 38r REG 253,32   1048583 0 9438209 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922446_187092.meta (deleted)
postgres 665334 gpadmin 39r REG 253,32   1048583 0 9438235 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922482_187128.meta (deleted)
postgres 665334 gpadmin 40r REG 253,32 134217728 0 9438262 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922555 (deleted)
postgres 665334 gpadmin 41r REG 253,32   1048583 0 9438263 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922555_187201.meta (deleted)
postgres 665334 gpadmin 42r REG 253,32 134217728 0 9438285 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir194/blk_1073922602 (deleted)
postgres 665334 gpadmin 43r REG 253,32   1048583 0 9438286 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir194/blk_1073922602_187248.meta (deleted)
{noformat}

As soon as I close the psql session on the master, the disk space is freed on the slaves:

{noformat}
[root@mds-hdp-04 ~]# df /mds
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/vdc       515928320 89992720 399704816  19% /mds
[root@mds-hdp-04 ~]# du -s /mds
89918952        /mds
[root@mds-hdp-04 ~]# lsof +L1 | grep /mds/hdfs | head -10
{noformat}

Used space drops by 224567500 1K-blocks (about 214 GiB), df and du now agree to within about 72 MiB, and lsof no longer reports any deleted files under /mds/hdfs.

I believe this to be a bug. At the very least, it is highly undesirable behavior.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)