Niels Basjes created FLINK-4485:
-----------------------------------
Summary: Finished jobs in yarn session fill /tmp filesystem
Key: FLINK-4485
URL: https://issues.apache.org/jira/browse/FLINK-4485
Project: Flink
Issue Type: Bug
Components: JobManager
Affects Versions: 1.1.0
Reporter: Niels Basjes
Priority: Blocker
On a Yarn cluster I start a yarn-session with a few containers and task slots.
Then I fire a 'large' number of Flink batch jobs in sequence against this yarn
session. It is the exact same job (java code) each time, only with different
parameters. In this scenario it exports HBase tables to files in HDFS, and the
parameters specify which data to take from which tables and the name of the
target directory.
After running several dozen jobs, job submission started to fail and we
investigated.
We found that the cause was that on the Yarn node hosting the jobmanager the
/tmp file system was full (4GB, 100% used).
However, the output of {{du -hcs /tmp}} showed only 200MB in use.
We found that a very large file (we guess it is the jar of the job) was put in
/tmp, used, and deleted, yet the file handle was not closed by the jobmanager.
As soon as we killed the jobmanager the disk space was freed.
See parts of the output we got from {{lsof}} below.
{code}
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
java 15034 nbasjes 550r REG 253,17 66219695 245 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000003 (deleted)
java 15034 nbasjes 551r REG 253,17 66219695 252 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000007 (deleted)
java 15034 nbasjes 552r REG 253,17 66219695 267 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000012 (deleted)
java 15034 nbasjes 553r REG 253,17 66219695 250 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000005 (deleted)
java 15034 nbasjes 554r REG 253,17 66219695 288 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000018 (deleted)
java 15034 nbasjes 555r REG 253,17 66219695 298 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000025 (deleted)
java 15034 nbasjes 557r REG 253,17 66219695 254 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000008 (deleted)
java 15034 nbasjes 558r REG 253,17 66219695 292 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000019 (deleted)
java 15034 nbasjes 559r REG 253,17 66219695 275 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000013 (deleted)
java 15034 nbasjes 560r REG 253,17 66219695 159 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000002 (deleted)
java 15034 nbasjes 562r REG 253,17 66219695 238 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000001 (deleted)
java 15034 nbasjes 568r REG 253,17 66219695 246 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000004 (deleted)
java 15034 nbasjes 569r REG 253,17 66219695 255 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000009 (deleted)
java 15034 nbasjes 571r REG 253,17 66219695 299 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000026 (deleted)
java 15034 nbasjes 572r REG 253,17 66219695 293 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000020 (deleted)
java 15034 nbasjes 574r REG 253,17 66219695 256 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000010 (deleted)
java 15034 nbasjes 575r REG 253,17 66219695 302 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000029 (deleted)
java 15034 nbasjes 576r REG 253,17 66219695 294 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000021 (deleted)
java 15034 nbasjes 577r REG 253,17 66219695 262 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000011 (deleted)
java 15034 nbasjes 578r REG 253,17 66219695 251 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000006 (deleted)
java 15034 nbasjes 580r REG 253,17 66219695 295 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000022 (deleted)
java 15034 nbasjes 581r REG 253,17 66219695 300 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000027 (deleted)
java 15034 nbasjes 582r REG 253,17 66219695 188 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/cache/blob_e318d1698aa6e7dc91e5f4a9f8ba29781aebd8c4 (deleted)
java 15034 nbasjes 585r REG 253,17 66219695 279 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000014 (deleted)
java 15034 nbasjes 586r REG 253,17 66219695 296 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000023 (deleted)
java 15034 nbasjes 588r REG 253,17 66219695 301 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000028 (deleted)
java 15034 nbasjes 589r REG 253,17 66219695 297 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000024 (deleted)
java 15034 nbasjes 598r REG 253,17 66219695 280 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000015 (deleted)
java 15034 nbasjes 601r REG 253,17 66219695 289 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000016 (deleted)
java 15034 nbasjes 604r REG 253,17 66219695 284 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000017 (deleted)
{code}
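To illustrate the kind of pattern that could produce the {{lsof}} output above, here is a minimal Java sketch. It is not taken from the Flink code base: the class name {{DeletedFileHandleSketch}} and both methods are hypothetical, and we have not confirmed that the blob store does exactly this. On Linux, deleting a file that is still open removes only the directory entry, so {{du}} no longer counts it while the blocks stay allocated until the last handle is closed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; this is NOT the Flink blob store code.
public class DeletedFileHandleSketch {

    /**
     * Leaky pattern: the file is deleted while the stream is still open.
     * The caller receives an open handle and must close it, otherwise the
     * disk blocks stay allocated for the lifetime of the JVM.
     */
    static InputStream openThenDelete(Path file) throws IOException {
        InputStream in = Files.newInputStream(file);
        Files.delete(file); // removes the directory entry, not the open handle
        return in;
    }

    /**
     * Safer pattern: try-with-resources closes the stream before the file
     * is deleted, so the space is released as soon as the delete happens.
     */
    static long readAndRelease(Path file) throws IOException {
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        } // stream is closed here, even if read() throws
        Files.delete(file);
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("blob-sketch-", ".tmp");
        Files.write(tmp, new byte[1024]);
        long bytes = readAndRelease(tmp);
        System.out.println(bytes + " bytes read, file exists: " + Files.exists(tmp));
    }
}
```

If something like {{openThenDelete}} is called once per submitted job and the returned stream is never closed, each job would leave one "(deleted)" entry of the jar's size behind, which would match what we see.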
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)