[ https://issues.apache.org/jira/browse/FLINK-31225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693953#comment-17693953 ]
xiaogang zhou commented on FLINK-31225:
---------------------------------------
[~yunta] Master
I logged onto the container, listed the RocksDB working directory, and found:
/tmp/flink-io-b7483fcd-50aa-45d2-bd8e-a2b0862c6323/job_00000000000000000000000000000000_op_LegacyKeyedCoProcessOperator_a3bf5557de0062839a60e12819947e17__12_50__uuid_b91b9766-b8af-4de3-9a01-b5dcce29a7bf/db#
ll | grep sst
-rw-r--r-- 1 flink flink 30662931 Feb 27 18:20 002512.sst
-rw-r--r-- 1 flink flink 1788364 Feb 27 18:30 002513.sst
-rw-r--r-- 1 flink flink 3209306 Feb 27 18:40 002515.sst
-rw-r--r-- 1 flink flink 13443570 Feb 27 18:40 002517.sst
-rw-r--r-- 1 flink flink 1694438 Feb 27 18:50 002518.sst
-rw-r--r-- 1 flink flink 1509487 Feb 27 18:50 002519.sst
I think the LSM-tree compaction process deletes the L0 files. And as mentioned
above, someone described this in [https://github.com/facebook/rocksdb/issues/4112]:
it is not exactly a leak, but a lot of memory is allocated and never released:
const int table_cache_size = (mutable_db_options_.max_open_files == -1)
                                 ? TableCache::kInfiniteCapacity
                                 : mutable_db_options_.max_open_files - 10;
table_cache_ = NewLRUCache(table_cache_size,
                           immutable_db_options_.table_cache_numshardbits);
All allocated TableReader records are stored in this cache.
mutable_db_options_.max_open_files is equal to -1,
so table_cache_size = TableCache::kInfiniteCapacity (4 << 20 entries, effectively unbounded).
It seems this mutable_db_options_.max_open_files = -1 configuration keeps every
TableReader in memory, which causes the memory to keep growing.
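To illustrate the effect, here is a minimal sketch using RocksJava (the path
/tmp/testdb, the cap of 300, and the class name are illustrative, not from this
ticket):

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class MaxOpenFilesSketch {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        try (Options options = new Options()
                .setCreateIfMissing(true)
                // The default (-1) sizes the internal table cache at
                // TableCache::kInfiniteCapacity; per the analysis above,
                // TableReader metadata then accumulates and is never freed.
                // A finite value caps the cache at (max_open_files - 10)
                // entries, so the LRU can evict readers and release memory.
                .setMaxOpenFiles(300);
             RocksDB db = RocksDB.open(options, "/tmp/testdb")) {
            db.put("key".getBytes(), "value".getBytes());
        }
    }
}

In Flink this option is exposed as state.backend.rocksdb.files.open (see the
ticket description below).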
> rocksdb max open file can lead to oom
> --------------------------------------
>
> Key: FLINK-31225
> URL: https://issues.apache.org/jira/browse/FLINK-31225
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / State Backends
> Affects Versions: 1.16.1
> Reporter: xiaogang zhou
> Priority: Major
> Attachments: image-2023-02-26-12-08-49-717.png, leak_test(2).png
>
>
> the default value for
> state.backend.rocksdb.files.open
> is -1
>
> [https://github.com/facebook/rocksdb/issues/4112] this issue tells us that
> RocksDB will not close file descriptors, so this can lead to an OOM issue.
>
> I can also reproduce the situation in my environment. The left part (2/21-2/24)
> leaves max open files at -1; the right part (2/24 till now) sets max open files
> to 300. The memory growth is very different.
> !image-2023-02-26-12-08-49-717.png|width=616,height=285!
>
> I have also attached a jeprof profile for the 2/21-2/24 instance; the TM memory
> size is about 8GB, heap memory is about 2.6GB, and the native part in leak_test
> is about 1GB. I think the remaining part (8 - 2.6 - 1 = 4.4GB) is occupied by fds.
>
> I suggest setting this to a default value like 500.
>
>
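For reference, a minimal sketch of applying the suggested value programmatically
(the class name is illustrative; it assumes Flink's string-keyed Configuration
setter, and the same key can simply be set in flink-conf.yaml):

import org.apache.flink.configuration.Configuration;

public class CapRocksDBOpenFiles {
    public static void main(String[] args) {
        Configuration config = new Configuration();
        // Equivalent to "state.backend.rocksdb.files.open: 500" in
        // flink-conf.yaml; caps RocksDB's max_open_files so the table
        // cache can evict TableReaders instead of growing without bound.
        config.setInteger("state.backend.rocksdb.files.open", 500);
    }
}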
--
This message was sent by Atlassian Jira
(v8.20.10#820010)