[
https://issues.apache.org/jira/browse/HAWQ-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16078138#comment-16078138
]
Harald Bögeholz commented on HAWQ-1498:
---------------------------------------
I don't know much about the inner workings of HAWQ, but I have one more
thought: even if the QEs don't exit, why do they hold on to a file
descriptor to a deleted file? If they closed that descriptor, I wouldn't mind
them hanging around for some time ...
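
To illustrate the point about closing the descriptor, here is a minimal Python
sketch (not HAWQ code) of the general POSIX behavior. It assumes a Linux or
other POSIX system and a local temporary directory; the function name
unlinked_but_open_demo is made up for this example. An unlinked file stays
readable, and keeps its disk blocks, for as long as some process holds a
descriptor to it.

```python
import os
import tempfile


def unlinked_but_open_demo(size=4096):
    """Unlink a file while a descriptor is still open and report its state.

    Returns (link_count, size_in_bytes, path_still_exists) as seen
    after the unlink but before the descriptor is closed.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * size)
        os.unlink(path)        # removes the directory entry, not the data
        st = os.fstat(fd)      # the inode is still reachable via the descriptor
        return st.st_nlink, st.st_size, os.path.exists(path)
    finally:
        os.close(fd)           # only after this can the kernel free the blocks


if __name__ == "__main__":
    nlink, size, exists = unlinked_but_open_demo()
    print(f"links={nlink} size={size} path_exists={exists}")
```

A process that closes its descriptors after the file is deleted would
therefore release the space even if the process itself keeps running.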
> Segments keep open file descriptors for deleted files
> -----------------------------------------------------
>
> Key: HAWQ-1498
> URL: https://issues.apache.org/jira/browse/HAWQ-1498
> Project: Apache HAWQ
> Issue Type: Bug
> Reporter: Harald Bögeholz
> Assignee: Radar Lei
> Fix For: 2.2.0.0-incubating
>
>
> I have been running some large computations in HAWQ using psql on the master.
> These computations created temporary tables and dropped them again.
> Nevertheless, free disk space in HDFS decreased by much more than it should have.
> While the psql session on the master was still open, I investigated on one of
> the slave machines.
> HDFS data is stored on /mds:
> {noformat}
> [root@mds-hdp-04 ~]# ls -l /mds
> total 36
> drwxr-xr-x. 3 root root 4096 Jun 14 04:23 falcon
> drwxr-xr-x. 3 root root 4096 Jun 14 04:42 hdfs
> drwx------. 2 root root 16384 Jun 8 02:48 lost+found
> drwxr-xr-x. 5 storm hadoop 4096 Jun 14 04:45 storm
> drwxr-xr-x. 4 root root 4096 Jun 14 04:43 yarn
> drwxr-xr-x. 2 zookeeper hadoop 4096 Jun 14 04:39 zookeeper
> [root@mds-hdp-04 ~]# df /mds
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/vdc 515928320 314560220 175137316 65% /mds
> [root@mds-hdp-04 ~]# du -s /mds
> 89918952 /mds
> {noformat}
> Note that there is a difference of more than 200 GB between the disk space
> used according to df and the sum of all files on that file system according
> to du. I have found the culprit to be several postgres processes running as
> gpadmin and holding open file descriptors to deleted files. Here are the
> first few:
> {noformat}
> [root@mds-hdp-04 ~]# lsof +L1 | grep /mds/hdfs | head -10
> postgres 665334 gpadmin 18r REG 253,32 134217728 0 9438234 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922482 (deleted)
> postgres 665334 gpadmin 34r REG 253,32 24488 0 9438114 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398 (deleted)
> postgres 665334 gpadmin 35r REG 253,32 199 0 9438115 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398_187044.meta (deleted)
> postgres 665334 gpadmin 37r REG 253,32 134217728 0 9438208 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922446 (deleted)
> postgres 665334 gpadmin 38r REG 253,32 1048583 0 9438209 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922446_187092.meta (deleted)
> postgres 665334 gpadmin 39r REG 253,32 1048583 0 9438235 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922482_187128.meta (deleted)
> postgres 665334 gpadmin 40r REG 253,32 134217728 0 9438262 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922555 (deleted)
> postgres 665334 gpadmin 41r REG 253,32 1048583 0 9438263 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922555_187201.meta (deleted)
> postgres 665334 gpadmin 42r REG 253,32 134217728 0 9438285 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir194/blk_1073922602 (deleted)
> postgres 665334 gpadmin 43r REG 253,32 1048583 0 9438286 /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir194/blk_1073922602_187248.meta (deleted)
> {noformat}
> As soon as I close the psql session on the master, the disk space is freed
> on the slaves:
> {noformat}
> [root@mds-hdp-04 ~]# df /mds
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/vdc 515928320 89992720 399704816 19% /mds
> [root@mds-hdp-04 ~]# du -s /mds
> 89918952 /mds
> [root@mds-hdp-04 ~]# lsof +L1 | grep /mds/hdfs | head -10
> {noformat}
> I believe this to be a bug. At least to me, it looks like very undesirable
> behavior.
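
The lsof +L1 check from the description can be approximated by walking /proc
directly. The following Python sketch is illustrative only: it assumes a Linux
/proc file system and enough privileges to read other users' descriptors, and
the function name deleted_open_files and its parameters are invented for this
example. It reports open descriptors whose target has been unlinked, along
with the file size each one is still pinning.

```python
import os


def deleted_open_files(proc_root="/proc", path_prefix="/"):
    """Yield (pid, fd, size_bytes, target) for open, already-unlinked files.

    Only targets starting with path_prefix are reported, similar to piping
    lsof +L1 through grep.
    """
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we may not inspect it
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
                if not target.endswith(" (deleted)"):
                    continue
                if not target.startswith(path_prefix):
                    continue
                st = os.stat(link)  # follows the link to the open file itself
            except OSError:
                continue  # descriptor was closed while we were scanning
            if st.st_nlink == 0:  # no directory entry refers to the file
                yield int(pid), int(fd), st.st_size, target


if __name__ == "__main__":
    total = 0
    for pid, fd, size, target in deleted_open_files(path_prefix="/mds/hdfs"):
        total += size
        print(pid, fd, size, target)
    print("bytes held by deleted files:", total)
```

Summing the reported sizes gives a rough estimate of how much of the gap
between df and du is explained by such descriptors.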