[
https://issues.apache.org/jira/browse/FLINK-36655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17899412#comment-17899412
]
Gabor Somogyi commented on FLINK-36655:
---------------------------------------
[~guanghua] can you please elaborate on what the issue was and how it was
resolved by `reduce rocksdb writeBufferSize`?
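For reference, one reading of "reduce rocksdb writeBufferSize" is lowering
Flink's documented `state.backend.rocksdb.writebuffer.size` option; a minimal
sketch is below (the 32mb value is purely illustrative, not a recommendation
from this ticket):
{code:java}
import org.apache.flink.configuration.Configuration;

public class WriteBufferExample {
    public static void main(String[] args) {
        // Illustrative only: shrink RocksDB's per-column-family write buffer
        // via Flink's documented option key. 32mb is an example value.
        Configuration conf = new Configuration();
        conf.setString("state.backend.rocksdb.writebuffer.size", "32mb");
        System.out.println(conf);
    }
}
{code}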
> using flink state processor api to process big state in rocksdb is very slow
> -----------------------------------------------------------------------------
>
> Key: FLINK-36655
> URL: https://issues.apache.org/jira/browse/FLINK-36655
> Project: Flink
> Issue Type: Technical Debt
> Components: API / DataStream
> Affects Versions: 1.12.7, 1.13.6, 1.14.6
> Reporter: guanghua pi
> Priority: Critical
> Labels: State, processor, rocksdb
> Attachments: image-2024-11-04-17-06-24-614.png
>
>
> My streaming job's state backend is RocksDB. A savepoint generates 65 GB of
> data. I am using the State Processor API to read the state from RocksDB. My
> demo program is very simple: it reads the original state and writes it to
> another HDFS directory (a sketch follows the configuration below). I use the
> RocksDB predefined options SPINNING_DISK_OPTIMIZED_HIGH_MEM. The
> configuration in my flink-conf file is as follows:
> taskmanager.memory.managed.fraction: 0.1
> taskmanager.memory.jvm-overhead.fraction: 0.05
> taskmanager.memory.jvm-overhead.max: 128mb
> taskmanager.memory.jvm-overhead.min: 64mb
> taskmanager.memory.framework.off-heap.size: 64mb
> taskmanager.memory.jvm-metaspace.size: 128m
> taskmanager.memory.network.max: 128mb
> taskmanager.memory.network.fraction: 0.1
> taskmanager.memory.managed.size: 32mb
> taskmanager.memory.task.off-heap.size: 2253mb
> state.backend.rocksdb.memory.managed: false
> state.backend.rocksdb.metrics.block-cache-capacity: true
> state.backend.rocksdb.metrics.block-cache-pinned-usage: true
> state.backend.rocksdb.metrics.block-cache-usage: true
> state.backend.rocksdb.metrics.bloom-filter-full-positive: true
> state.backend.rocksdb.memory.write-buffer-ratio: 0.5
> state.backend.rocksdb.memory.high-prio-pool-ratio: 0.2
> state.backend.rocksdb.memory.fixed-per-slot: 1024mb
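> A minimal sketch of such a read-and-dump demo against the State Processor
> API as it existed in Flink 1.12–1.14 (DataSet-based); the uid "my-operator",
> the state name "my-state", the key/value types, and the HDFS paths are
> placeholders for whatever the real job uses:
> {code:java}
> import org.apache.flink.api.common.state.ValueState;
> import org.apache.flink.api.common.state.ValueStateDescriptor;
> import org.apache.flink.api.java.DataSet;
> import org.apache.flink.api.java.ExecutionEnvironment;
> import org.apache.flink.configuration.Configuration;
> import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
> import org.apache.flink.state.api.ExistingSavepoint;
> import org.apache.flink.state.api.Savepoint;
> import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
> import org.apache.flink.util.Collector;
>
> public class ReadSavepointDemo {
>
>     // Placeholder reader: emits "key,value" for one ValueState per key.
>     static class MyReader extends KeyedStateReaderFunction<String, String> {
>         private transient ValueState<String> state;
>
>         @Override
>         public void open(Configuration parameters) {
>             state = getRuntimeContext().getState(
>                     new ValueStateDescriptor<>("my-state", String.class));
>         }
>
>         @Override
>         public void readKey(String key, Context ctx, Collector<String> out)
>                 throws Exception {
>             out.collect(key + "," + state.value());
>         }
>     }
>
>     public static void main(String[] args) throws Exception {
>         ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>
>         // Load the existing savepoint with a RocksDB backend.
>         ExistingSavepoint savepoint = Savepoint.load(
>                 env,
>                 "hdfs:///savepoints/savepoint-xxxx",
>                 new RocksDBStateBackend("hdfs:///checkpoints"));
>
>         DataSet<String> rows =
>                 savepoint.readKeyedState("my-operator", new MyReader());
>
>         // Dump the extracted state to another HDFS directory.
>         rows.writeAsText("hdfs:///tmp/state-dump");
>         env.execute("read-savepoint-demo");
>     }
> }
> {code}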
> This is my TM memory figure:
> !image-2024-11-04-17-06-24-614.png!
> JM and TM memory are set with -yjm 1G -ytm 3G.
> My current problems are listed below:
> 1. After running the program for about 4 hours, the container is killed
> with: "Diagnostics: [2024-11-04 03:00:48.539] Container
> [pid=8166,containerID=container_1728961635507_3104_01_000007] is running
> 765952B beyond the 'PHYSICAL' memory limit. Current usage: 3.0 GB of 3 GB
> physical memory used; 10.1 GB of 6.2 GB virtual memory used. Killing
> container."
> 2. The read speed from RocksDB keeps slowing down over time. For example,
> about 600,000 records are read in the first hour, but only about 500,000 in
> the next hour, and it keeps declining.
> 3. In the log file I find: "Obtained shared RocksDB cache of size 67108864
> bytes", but I set state.backend.rocksdb.memory.fixed-per-slot: 1024mb. The
> values do not match: 67108864 bytes is 64 MiB, whereas 1024mb would be
> 1073741824 bytes.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)