Harald Bögeholz created HAWQ-1498:
-------------------------------------
Summary: Segments keep open file descriptors for deleted files
Key: HAWQ-1498
URL: https://issues.apache.org/jira/browse/HAWQ-1498
Project: Apache HAWQ
Issue Type: Bug
Reporter: Harald Bögeholz
Assignee: Radar Lei
Fix For: 2.2.0.0-incubating
I have been running some large computations in HAWQ using psql on the master.
These computations created temporary tables and dropped them again.
Nevertheless, free disk space in HDFS decreased by much more than it should have.
While the psql session on the master was still open, I investigated on one of
the slave machines.
HDFS is stored on /mds:
{noformat}
[root@mds-hdp-04 ~]# ls -l /mds
total 36
drwxr-xr-x. 3 root root 4096 Jun 14 04:23 falcon
drwxr-xr-x. 3 root root 4096 Jun 14 04:42 hdfs
drwx------. 2 root root 16384 Jun 8 02:48 lost+found
drwxr-xr-x. 5 storm hadoop 4096 Jun 14 04:45 storm
drwxr-xr-x. 4 root root 4096 Jun 14 04:43 yarn
drwxr-xr-x. 2 zookeeper hadoop 4096 Jun 14 04:39 zookeeper
[root@mds-hdp-04 ~]# df /mds
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vdc 515928320 314560220 175137316 65% /mds
[root@mds-hdp-04 ~]# du -s /mds
89918952 /mds
{noformat}
Note the difference of more than 200 GB between the disk space used according
to df and the total size of all files on that file system according to du.
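To make the size of that gap concrete, here is a minimal sketch of the arithmetic. It assumes that both df and du report 1 KiB blocks (as the "1K-blocks" header suggests); the function and constant names are made up for illustration.

```python
# Figures copied from the df and du output above, in 1 KiB blocks.
DF_USED_KIB = 314_560_220   # "Used" column of df /mds
DU_TOTAL_KIB = 89_918_952   # total reported by du -s /mds


def gap_gib(df_used_kib: int, du_total_kib: int) -> float:
    """Space that df counts as used but du cannot find, in GiB."""
    return (df_used_kib - du_total_kib) / (1024 * 1024)


if __name__ == "__main__":
    # Roughly 214 GiB is allocated on disk but no longer reachable by name.
    print(f"unaccounted space: {gap_gib(DF_USED_KIB, DU_TOTAL_KIB):.1f} GiB")
```

A gap of this kind is typical for files that have been unlinked while some process still holds them open: df still counts their blocks, whereas du only sees files that have a directory entry.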
I have found the culprit to be several postgres processes running as gpadmin
and holding open file descriptors to deleted files. Here are the first few:
{noformat}
[root@mds-hdp-04 ~]# lsof +L1 | grep /mds/hdfs | head -10
postgres 665334 gpadmin 18r REG 253,32 134217728 0 9438234 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922482 (deleted)
postgres 665334 gpadmin 34r REG 253,32 24488 0 9438114 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398 (deleted)
postgres 665334 gpadmin 35r REG 253,32 199 0 9438115 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398_187044.meta (deleted)
postgres 665334 gpadmin 37r REG 253,32 134217728 0 9438208 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922446 (deleted)
postgres 665334 gpadmin 38r REG 253,32 1048583 0 9438209 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922446_187092.meta (deleted)
postgres 665334 gpadmin 39r REG 253,32 1048583 0 9438235 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922482_187128.meta (deleted)
postgres 665334 gpadmin 40r REG 253,32 134217728 0 9438262 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922555 (deleted)
postgres 665334 gpadmin 41r REG 253,32 1048583 0 9438263 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922555_187201.meta (deleted)
postgres 665334 gpadmin 42r REG 253,32 134217728 0 9438285 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir194/blk_1073922602 (deleted)
postgres 665334 gpadmin 43r REG 253,32 1048583 0 9438286 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir194/blk_1073922602_187248.meta (deleted)
{noformat}
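To estimate how much space such descriptors pin down, the lsof listing can be totalled. The following is only a rough sketch: it assumes the usual `lsof +L1` column order (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME), one entry per line, and paths without spaces. The function name `deleted_bytes` is hypothetical, and the header line in the sample is added for illustration; the data lines are the first three entries from the listing above.

```python
BLOCK_DIR = ("/mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069"
             "/current/finalized/subdir2/subdir193/")

# Header line is an assumption (standard lsof header); data lines are from the report.
SAMPLE = [
    "COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME",
    "postgres 665334 gpadmin 18r REG 253,32 134217728 0 9438234 "
    + BLOCK_DIR + "blk_1073922482 (deleted)",
    "postgres 665334 gpadmin 34r REG 253,32 24488 0 9438114 "
    + BLOCK_DIR + "blk_1073922398 (deleted)",
    "postgres 665334 gpadmin 35r REG 253,32 199 0 9438115 "
    + BLOCK_DIR + "blk_1073922398_187044.meta (deleted)",
]


def deleted_bytes(lsof_lines, path_prefix):
    """Sum the sizes of unlinked files under path_prefix that are still open.

    Each file is counted once per (device, inode) pair, even if several
    descriptors or processes hold it open.
    """
    sizes = {}
    for line in lsof_lines:
        fields = line.split()
        # Skip the header and anything without a plain numeric size.
        if len(fields) < 10 or not fields[6].isdigit():
            continue
        device, size, nlink, node, name = fields[5:10]
        if nlink == "0" and name.startswith(path_prefix):
            sizes[(device, node)] = int(size)
    return sum(sizes.values())


if __name__ == "__main__":
    print(deleted_bytes(SAMPLE, "/mds/hdfs"), "bytes held by deleted files")
```

On a live system the input could come from the full, unwrapped output of `lsof +L1` rather than from a hard-coded sample.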
As soon as I close the psql session on the master, the disk space is freed on
the slaves:
{noformat}
[root@mds-hdp-04 ~]# df /mds
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vdc 515928320 89992720 399704816 19% /mds
[root@mds-hdp-04 ~]# du -s /mds
89918952 /mds
[root@mds-hdp-04 ~]# lsof +L1 | grep /mds/hdfs | head -10
{noformat}
I believe this to be a bug. At the very least, it looks like very undesirable
behavior to me.