[ 
https://issues.apache.org/jira/browse/FLINK-25373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504686#comment-17504686
 ] 

ren shangtao commented on FLINK-25373:
--------------------------------------

Hi Spongebob, I have the same question as you.

 

I ran a batch job several times on Flink 1.14.3 in standalone mode; the Linux 
system then ran out of memory, and the kernel OOM killer ultimately killed the 
task manager.
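One mitigation worth trying (a sketch, not a confirmed fix for this issue): the log below shows the host has no swap and the kernel killed the JVM at roughly 55 GB RSS, so capping the TaskManager's total process memory in flink-conf.yaml keeps the JVM from growing past what the host can hold. The sizes below are illustrative placeholders, not recommendations:

```yaml
# flink-conf.yaml -- bound total TM process memory (heap + off-heap +
# metaspace + overhead) so the JVM cannot outgrow the host.
# The sizes here are illustrative placeholders.
taskmanager.memory.process.size: 8g

# Metaspace can grow across repeated job submissions (per-job
# classloaders); bounding it surfaces a leak as a JVM OutOfMemoryError
# instead of a kernel OOM kill of the whole process.
taskmanager.memory.jvm-metaspace.size: 512m
```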

 
Mar 11 00:14:18 node10 kernel: dockerd invoked oom-killer: gfp_mask=0x201da, 
order=0, oom_score_adj=-500
Mar 11 00:14:18 node10 kernel: dockerd cpuset=/ mems_allowed=0
Mar 11 00:14:18 node10 kernel: CPU: 28 PID: 4307 Comm: dockerd Kdump: loaded 
Tainted: G           OE  ------------ T 3.10.0-1160.31.1.el7.x86_64 #1
Mar 11 00:14:18 node10 kernel: Hardware name: Dell Inc. PowerEdge T440/021KCD, 
BIOS 2.8.2 08/31/2020
Mar 11 00:14:18 node10 kernel: Call Trace:
Mar 11 00:14:18 node10 kernel: [<ffffffff9dd835a9>] dump_stack+0x19/0x1b
Mar 11 00:14:18 node10 kernel: [<ffffffff9dd7e648>] dump_header+0x90/0x229
Mar 11 00:14:18 node10 kernel: [<ffffffff9d706492>] ? ktime_get_ts64+0x52/0xf0
Mar 11 00:14:18 node10 kernel: [<ffffffff9d75db1f>] ? delayacct_end+0x8f/0xb0
Mar 11 00:14:18 node10 kernel: [<ffffffff9d7c204d>] oom_kill_process+0x2cd/0x490
Mar 11 00:14:18 node10 kernel: [<ffffffff9d7c1a3d>] ? 
oom_unkillable_task+0xcd/0x120
Mar 11 00:14:18 node10 kernel: [<ffffffff9d7c273a>] out_of_memory+0x31a/0x500
Mar 11 00:14:18 node10 kernel: [<ffffffff9d7c9354>] 
__alloc_pages_nodemask+0xad4/0xbe0
Mar 11 00:14:18 node10 kernel: [<ffffffff9d818ea8>] 
alloc_pages_current+0x98/0x110
Mar 11 00:14:18 node10 kernel: [<ffffffff9d7bdb07>] __page_cache_alloc+0x97/0xb0
Mar 11 00:14:18 node10 kernel: [<ffffffff9d7c0aa0>] filemap_fault+0x270/0x420
Mar 11 00:14:18 node10 kernel: [<ffffffffc059691e>] 
__xfs_filemap_fault+0x7e/0x1d0 [xfs]
Mar 11 00:14:18 node10 kernel: [<ffffffffc0596b1c>] xfs_filemap_fault+0x2c/0x30 
[xfs]
Mar 11 00:14:18 node10 kernel: [<ffffffff9d7ee2aa>] 
__do_fault.isra.61+0x8a/0x100
Mar 11 00:14:18 node10 kernel: [<ffffffff9d7ee85c>] 
do_read_fault.isra.63+0x4c/0x1b0
Mar 11 00:14:18 node10 kernel: [<ffffffff9d7f60a0>] handle_mm_fault+0xa20/0xfb0
Mar 11 00:14:18 node10 kernel: [<ffffffff9dd90653>] __do_page_fault+0x213/0x500
Mar 11 00:14:18 node10 kernel: [<ffffffff9dd90975>] do_page_fault+0x35/0x90
Mar 11 00:14:18 node10 kernel: [<ffffffff9dd8c778>] page_fault+0x28/0x30
Mar 11 00:14:18 node10 kernel: Mem-Info:
Mar 11 00:14:18 node10 kernel: active_anon:15929291 inactive_anon:2464 
isolated_anon:0#012 active_file:775 inactive_file:1147 isolated_file:99#012 
unevictable:0 dirty:38 writeback:859 unstable:0#012 slab_reclaimable:21310 
slab_unreclaimable:42445#012 mapped:2114 shmem:2586 pagetables:45585 
bounce:0#012 free:82929 free_pcp:89 free_cma:0
Mar 11 00:14:18 node10 kernel: Node 0 DMA free:14840kB min:16kB low:20kB 
high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB 
managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB 
slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB 
unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB 
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Mar 11 00:14:18 node10 kernel: lowmem_reserve[]: 0 1301 63712 63712
Mar 11 00:14:18 node10 kernel: Node 0 DMA32 free:250568kB min:1380kB low:1724kB 
high:2068kB active_anon:1063892kB inactive_anon:488kB active_file:0kB 
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB 
present:1566272kB managed:1332768kB mlocked:0kB dirty:0kB writeback:0kB 
mapped:0kB shmem:516kB slab_reclaimable:1776kB slab_unreclaimable:3808kB 
kernel_stack:704kB pagetables:2924kB unstable:0kB bounce:0kB free_pcp:0kB 
local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
yes
Mar 11 00:14:18 node10 kernel: lowmem_reserve[]: 0 0 62410 62410
Mar 11 00:14:18 node10 kernel: Node 0 Normal free:66308kB min:66184kB 
low:82728kB high:99276kB active_anon:62653272kB inactive_anon:9368kB 
active_file:3100kB inactive_file:4588kB unevictable:0kB isolated(anon):0kB 
isolated(file):396kB present:65011712kB managed:63911352kB mlocked:0kB 
dirty:152kB writeback:3436kB mapped:8456kB shmem:9828kB 
slab_reclaimable:83464kB slab_unreclaimable:165940kB kernel_stack:23776kB 
pagetables:179416kB unstable:0kB bounce:0kB free_pcp:352kB local_pcp:0kB 
free_cma:0kB writeback_tmp:0kB pages_scanned:816 all_unreclaimable? no
Mar 11 00:14:18 node10 kernel: lowmem_reserve[]: 0 0 0 0
Mar 11 00:14:18 node10 kernel: Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 1*32kB 
(U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 1*2048kB (M) 3*4096kB 
(M) = 14840kB
Mar 11 00:14:18 node10 kernel: Node 0 DMA32: 507*4kB (UM) 360*8kB (UEM) 
554*16kB (UEM) 508*32kB (UEM) 160*64kB (UEM) 91*128kB (UEM) 46*256kB (UEM) 
13*512kB (UEM) 64*1024kB (UM) 4*2048kB (UEM) 26*4096kB (UM) = 250572kB
Mar 11 00:14:18 node10 kernel: Node 0 Normal: 1637*4kB (UEM) 5641*8kB (UEM) 
1117*16kB (UEM) 1*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 
0*4096kB = 69580kB
Mar 11 00:14:18 node10 kernel: Node 0 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=1048576kB
Mar 11 00:14:18 node10 kernel: Node 0 hugepages_total=0 hugepages_free=0 
hugepages_surp=0 hugepages_size=2048kB
Mar 11 00:14:18 node10 kernel: 4889 total pagecache pages
Mar 11 00:14:18 node10 kernel: 0 pages in swap cache
Mar 11 00:14:18 node10 kernel: Swap cache stats: add 0, delete 0, find 0/0
Mar 11 00:14:18 node10 kernel: Free swap  = 0kB
Mar 11 00:14:18 node10 kernel: Total swap = 0kB
Mar 11 00:14:18 node10 kernel: 16648491 pages RAM
Mar 11 00:14:18 node10 kernel: 0 pages HighMem/MovableOnly
Mar 11 00:14:18 node10 kernel: 333487 pages reserved
Mar 11 00:14:18 node10 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes 
swapents oom_score_adj name
Mar 11 00:14:18 node10 kernel: [  782]     0   782     9764     1824      25    
    0             0 systemd-journal
Mar 11 00:14:18 node10 kernel: [  804]     0   804    68076      118      32    
    0             0 lvmetad
Mar 11 00:14:18 node10 kernel: [  819]     0   819    11383      161      24    
    0         -1000 systemd-udevd
Mar 11 00:14:18 node10 kernel: [ 1106]     0  1106     6596       80      18    
    0             0 systemd-logind
Mar 11 00:14:18 node10 kernel: [ 1108]     0  1108     5419       99      14    
    0             0 irqbalance
Mar 11 00:14:18 node10 kernel: [ 1111]    81  1111    14554      167      34    
    0          -900 dbus-daemon
Mar 11 00:14:18 node10 kernel: [ 1117]     0  1117   119115      530      83    
    0             0 NetworkManager
Mar 11 00:14:18 node10 kernel: [ 1122]   999  1122   153058     1605      61    
    0             0 polkitd
Mar 11 00:14:18 node10 kernel: [ 1456]     0  1456   852618     9016     133    
    0          -500 dockerd
Mar 11 00:14:18 node10 kernel: [ 1484]     0  1484   143571     3370      98    
    0             0 tuned
Mar 11 00:14:18 node10 kernel: [ 1487]     0  1487    28235      259      58    
    0         -1000 sshd
Mar 11 00:14:18 node10 kernel: [ 1490]     0  1490   179067     3494      23    
    0             0 nats-server
Mar 11 00:14:18 node10 kernel: [ 1493]     0  1493    54632     1332      43    
    0             0 rsyslogd
Mar 11 00:14:18 node10 kernel: [ 1753]     0  1753    31598      156      19    
    0             0 crond
Mar 11 00:14:18 node10 kernel: [ 1813]     0  1813    27552       34       9    
    0             0 agetty
Mar 11 00:14:18 node10 kernel: [ 1875]     0  1875   827410     6912     116    
    0          -500 containerd
Mar 11 00:14:18 node10 kernel: [18022]     0 18022    40514      430      81    
    0             0 sshd
Mar 11 00:14:18 node10 kernel: [18039]     0 18039    40429      347      81    
    0             0 sshd
Mar 11 00:14:18 node10 kernel: [18041]     0 18041    28920      132      14    
    0             0 bash
Mar 11 00:14:18 node10 kernel: [18216]     0 18216    18073      184      38    
    0             0 sftp-server
Mar 11 00:14:18 node10 kernel: [18811]     0 18811  4716599   121763     382    
    0             0 java
Mar 11 00:14:18 node10 kernel: [19181]     0 19181  4701076  1900016    3824    
    0             0 java
Mar 11 00:14:18 node10 kernel: [20204]     0 20204  4743884   111675     466    
    0             0 java
Mar 11 00:14:18 node10 kernel: [20779]     0 20779    40624      268      35    
    0             0 top
Mar 11 00:14:18 node10 kernel: [74904]     0 74904 23838477 13752532   39743    
    0             0 java
Mar 11 00:14:18 node10 kernel: [98890]    26 98890    28322       50      11    
    0             0 .systemd-privat
Mar 11 00:14:18 node10 kernel: [98894]    26 98894    28322       55      11    
    0             0 bash
Mar 11 00:14:18 node10 kernel: [99125]    26 99125    28322       54      11    
    0             0 bash
Mar 11 00:14:18 node10 kernel: [99243]    26 99243    63171      221      42    
    0             0 curl
Mar 11 00:14:18 node10 kernel: [99245]     0 99245    34790      169      25    
    0             0 crond
Mar 11 00:14:18 node10 kernel: Out of memory: Kill process 74904 (java) score 
820 or sacrifice child
Mar 11 00:14:18 node10 kernel: Killed process 74904 (java), UID 0, 
total-vm:95353908kB, anon-rss:55010692kB, file-rss:448kB, shmem-rss:0kB
Mar 11 00:14:19 node10 systemd: Started Session 787 of user root.
Mar 11 00:14:21 node10 systemd-logind: Removed session 699.
Mar 11 00:15:01 node10 systemd: Started Session 788 of user root.
Mar 11 00:16:01 node10 systemd: Started Session 789 of user root.
Mar 11 00:17:01 node10 systemd: Started Session 790 of user root.
Mar 11 00:18:01 node10 systemd: Started Session 791 of user root.
Mar 11 00:19:01 node10 systemd: Started Session 792 of user root.
Mar 11 00:19:15 node10 systemd: Removed slice User Slice of postgres.
Mar 11 00:20:01 node10 systemd: Started Session 793 of user root.
Mar 11 00:20:01 node10 systemd: Started Session 794 of user root.
Mar 11 00:20:03 node10 crond: sendmail: fatal: parameter inet_interfaces: no 
local interface found for ::1

> task manager can not free memory when jobs are finished
> -------------------------------------------------------
>
>                 Key: FLINK-25373
>                 URL: https://issues.apache.org/jira/browse/FLINK-25373
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Core
>    Affects Versions: 1.14.0
>         Environment: flink 1.14.0
>            Reporter: Spongebob
>            Priority: Major
>         Attachments: image-2021-12-19-11-48-33-622.png, 
> image-2022-03-11-10-06-19-499.png
>
>
> I submit my Flink SQL jobs to the Flink standalone cluster, and unexpectedly 
> the TaskManagers do not free memory after all jobs are finished, whether they 
> finish normally or not.
> I also found many threads named like `flink-taskexecutor-io-thread-x` whose 
> states were waiting on conditions.
> Here is the detail of one of these threads:
>  
> "flink-taskexecutor-io-thread-31" Id=5386 WAITING on 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2da8b14c
> at sun.misc.Unsafe.park(Native Method)
> - waiting on 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2da8b14c
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> !image-2021-12-19-11-48-33-622.png!
>  
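For what it's worth, the WAITING state in the stack above is the normal idle state of a `ThreadPoolExecutor` worker parked in `LinkedBlockingQueue.take()`; the idle threads themselves hold almost no memory, so the retained heap is likely elsewhere. A minimal sketch reproducing the same parked-worker stack (thread names here are illustrative, not Flink's):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class IdleWorkerDemo {
    public static void main(String[] args) throws Exception {
        // A fixed pool keeps its core workers alive after tasks finish;
        // each worker parks in LinkedBlockingQueue.take() waiting for
        // more work and shows up in a thread dump as WAITING on a
        // ConditionObject -- the same stack as the
        // flink-taskexecutor-io-thread-* threads quoted above.
        ExecutorService pool = Executors.newFixedThreadPool(4, r -> {
            Thread t = new Thread(r, "io-thread-demo");
            t.setDaemon(true); // let the JVM exit without shutdown()
            return t;
        });

        pool.submit(() -> {}).get(); // run one task, then let the worker idle
        Thread.sleep(100);           // give the worker time to park

        long waiting = Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().equals("io-thread-demo"))
                .filter(t -> t.getState() == Thread.State.WAITING)
                .count();
        System.out.println("idle WAITING workers: " + waiting);
    }
}
```

Seeing such threads is therefore expected; the question is what the TaskManager still references on the heap after the jobs complete.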



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
