[ https://issues.apache.org/jira/browse/FLINK-36655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
guanghua pi resolved FLINK-36655.
---------------------------------
    Release Note: Reduce the RocksDB writeBufferSize.
      Resolution: Fixed

> Using the Flink State Processor API to process big state in RocksDB is very slow
> ---------------------------------------------------------------------------------
>
>                 Key: FLINK-36655
>                 URL: https://issues.apache.org/jira/browse/FLINK-36655
>             Project: Flink
>          Issue Type: Technical Debt
>          Components: API / DataStream
>    Affects Versions: 1.12.7, 1.13.6, 1.14.6
>            Reporter: guanghua pi
>            Priority: Critical
>              Labels: State, processor, rocksdb
>         Attachments: image-2024-11-04-17-06-24-614.png
>
> My streaming job uses RocksDB as its state backend, and a savepoint of it is about 65 GB. I am using the State Processor API to read the RocksDB state. My demo program is very simple: it reads the original state and writes it to another HDFS directory. I use the RocksDB predefined options SPINNING_DISK_OPTIMIZED_HIGH_MEM. My flink-conf.yaml contains:
> taskmanager.memory.managed.fraction: 0.1
> taskmanager.memory.jvm-overhead.fraction: 0.05
> taskmanager.memory.jvm-overhead.max: 128mb
> taskmanager.memory.jvm-overhead.min: 64mb
> taskmanager.memory.framework.off-heap.size: 64mb
> taskmanager.memory.jvm-metaspace.size: 128m
> taskmanager.memory.network.max: 128mb
> taskmanager.memory.network.fraction: 0.1
> taskmanager.memory.managed.size: 32mb
> taskmanager.memory.task.off-heap.size: 2253mb
> state.backend.rocksdb.memory.managed: false
> state.backend.rocksdb.metrics.block-cache-capacity: true
> state.backend.rocksdb.metrics.block-cache-pinned-usage: true
> state.backend.rocksdb.metrics.block-cache-usage: true
> state.backend.rocksdb.metrics.bloom-filter-full-positive: true
> state.backend.rocksdb.memory.write-buffer-ratio: 0.5
> state.backend.rocksdb.memory.high-prio-pool-ratio: 0.2
> state.backend.rocksdb.memory.fixed-per-slot: 1024mb
>
> This is my TaskManager memory figure:
> !image-2024-11-04-17-06-24-614.png!
> The JobManager and TaskManager memory settings are -yjm 1G -ytm 3G.
>
> My current problems are:
> 1. After the program has run for about 4 hours, the container is killed: "Diagnostics: [2024-11-04 03:00:48.539] Container [pid=8166,containerID=container_1728961635507_3104_01_000007] is running 765952B beyond the 'PHYSICAL' memory limit. Current usage: 3.0 GB of 3 GB physical memory used; 10.1 GB of 6.2 GB virtual memory used. Killing container."
> 2. The RocksDB read speed keeps dropping over time. For example, about 600,000 records are read in the first hour, but only about 500,000 in the next hour, and the rate continues to decline.
> 3. The log shows "Obtained shared RocksDB cache of size 67108864 bytes", even though I set state.backend.rocksdb.memory.fixed-per-slot: 1024mb; the two values do not match.
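For reference, below is a minimal sketch of the kind of demo program described above, assuming the DataSet-based State Processor API of Flink 1.13/1.14 (for 1.12 the older RocksDBStateBackend class would be used instead of EmbeddedRocksDBStateBackend). The operator uid, state name, key/value types, and HDFS paths are placeholders, not the values from this job:

{code:java}
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.util.Collector;

public class DumpSavepointState {

    // Reads one keyed ValueState per key; state name and types are placeholders.
    public static class MyStateReader extends KeyedStateReaderFunction<String, String> {
        private transient ValueState<String> state;

        @Override
        public void open(Configuration parameters) {
            state = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("my-state", String.class));
        }

        @Override
        public void readKey(String key, Context ctx, Collector<String> out) throws Exception {
            out.collect(key + "," + state.value());
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Load the existing savepoint with the RocksDB state backend (placeholder path).
        ExistingSavepoint savepoint = Savepoint.load(
                env,
                "hdfs:///flink/savepoints/savepoint-xxxx",
                new EmbeddedRocksDBStateBackend());

        // Read the keyed state of one operator and write it to another HDFS directory.
        DataSet<String> entries =
                savepoint.readKeyedState("my-operator-uid", new MyStateReader());
        entries.writeAsText("hdfs:///tmp/state-dump");

        env.execute("dump-savepoint-state");
    }
}
{code}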
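The release note points at reducing the RocksDB writeBufferSize. One possible way to do that programmatically is a custom RocksDBOptionsFactory; the sketch below is only an illustration, and the 16 MB value is an assumption, not the value used to resolve this issue:

{code:java}
import java.util.Collection;
import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

// Shrinks the per-column-family write buffer; the 16 MB value is illustrative only.
public class SmallWriteBufferOptionsFactory implements RocksDBOptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions,
                                     Collection<AutoCloseable> handlesToClose) {
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions,
                                                   Collection<AutoCloseable> handlesToClose) {
        return currentOptions.setWriteBufferSize(16 * 1024 * 1024);
    }
}
{code}

The factory could be attached with EmbeddedRocksDBStateBackend#setRocksDBOptions before passing the backend to Savepoint.load; alternatively, the state.backend.rocksdb.writebuffer.size option in flink-conf.yaml controls the same setting.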