[jira] [Commented] (HAWQ-1498) Segments keep open file descriptors for deleted files

JIRA Fri, 07 Jul 2017 07:18:26 -0700

    [ 
https://issues.apache.org/jira/browse/HAWQ-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16078138#comment-16078138
 ]


Harald Bögeholz commented on HAWQ-1498:
---------------------------------------

I don't know too much about the inner workings of HAWQ, but there is one more 
thought I have: Even if QEs don't exit, why do they hold on to a file 
descriptor to a deleted file? If they closed that descriptor I wouldn't mind 
them hanging around for some time ...

> Segments keep open file descriptors for deleted files
> -----------------------------------------------------
>
>                 Key: HAWQ-1498
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1498
>             Project: Apache HAWQ
>          Issue Type: Bug
>            Reporter: Harald Bögeholz
>            Assignee: Radar Lei
>             Fix For: 2.2.0.0-incubating
>
>
> I have been running some large computations in HAWQ using psql on the master. 
> These computations created temporary tables and dropped them again. 
> Nevertheless free disk space in HDFS decreased by much more than it should. 
> While the psql session on the master was still open I investigated on one of 
> the slave machines.
> HDFS is stored on /mds:
> {noformat}
> [root@mds-hdp-04 ~]# ls -l /mds
> total 36
> drwxr-xr-x. 3 root      root    4096 Jun 14 04:23 falcon
> drwxr-xr-x. 3 root      root    4096 Jun 14 04:42 hdfs
> drwx------. 2 root      root   16384 Jun  8 02:48 lost+found
> drwxr-xr-x. 5 storm     hadoop  4096 Jun 14 04:45 storm
> drwxr-xr-x. 4 root      root    4096 Jun 14 04:43 yarn
> drwxr-xr-x. 2 zookeeper hadoop  4096 Jun 14 04:39 zookeeper
> [root@mds-hdp-04 ~]# df /mds
> Filesystem     1K-blocks      Used Available Use% Mounted on
> /dev/vdc       515928320 314560220 175137316  65% /mds
> [root@mds-hdp-04 ~]# du -s /mds
> 89918952      /mds
> {noformat}
> Note that there is a more than 200 GB difference between the disk space used 
> according to df and the sum of all files on that file system according to du.
> I have found the culprit to be several postgres processes running as gpadmin 
> and holding open file descriptors to deleted files. Here are the first few:
> {noformat}
> [root@mds-hdp-04 ~]# lsof +L1 | grep /mds/hdfs | head -10
> postgres 665334 gpadmin   18r   REG 253,32 134217728     0  9438234 
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922482
>  (deleted)
> postgres 665334 gpadmin   34r   REG 253,32     24488     0  9438114 
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398
>  (deleted)
> postgres 665334 gpadmin   35r   REG 253,32       199     0  9438115 
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922398_187044.meta
>  (deleted)
> postgres 665334 gpadmin   37r   REG 253,32 134217728     0  9438208 
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922446
>  (deleted)
> postgres 665334 gpadmin   38r   REG 253,32   1048583     0  9438209 
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922446_187092.meta
>  (deleted)
> postgres 665334 gpadmin   39r   REG 253,32   1048583     0  9438235 
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922482_187128.meta
>  (deleted)
> postgres 665334 gpadmin   40r   REG 253,32 134217728     0  9438262 
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922555
>  (deleted)
> postgres 665334 gpadmin   41r   REG 253,32   1048583     0  9438263 
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir193/blk_1073922555_187201.meta
>  (deleted)
> postgres 665334 gpadmin   42r   REG 253,32 134217728     0  9438285 
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir194/blk_1073922602
>  (deleted)
> postgres 665334 gpadmin   43r   REG 253,32   1048583     0  9438286 
> /mds/hdfs/data/current/BP-23056860-118.138.237.114-1497415333069/current/finalized/subdir2/subdir194/blk_1073922602_187248.meta
>  (deleted)
> {noformat}
> As soon I close the psql session on the master the disk space is freed on the 
> slaves:
> {noformat}
> [root@mds-hdp-04 ~]# df /mds
> Filesystem     1K-blocks     Used Available Use% Mounted on
> /dev/vdc       515928320 89992720 399704816  19% /mds
> [root@mds-hdp-04 ~]# du -s /mds
> 89918952      /mds
> [root@mds-hdp-04 ~]# lsof +L1 | grep /mds/hdfs | head -10
> {noformat}
> I believe this to be a bug. At least for me it looks like a very undesirable 
> behavior.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HAWQ-1498) Segments keep open file descriptors for deleted files

Reply via email to