fresh-borzoni commented on PR #3295:
URL: https://github.com/apache/fluss/pull/3295#issuecomment-4533118529
I still don't like this RocksDB snapshot pinning, tbh. It feels like it's
going to be a problem for wide partitioned tables with many buckets and
concurrent users.
Basically, it's an operational burden, in two ways:
1. Scan failures are hard to diagnose. A single scan that touches more than 200
buckets on one tablet server hits TOO_MANY_SCANNERS. The client retries 3 times
within 700 ms, can't recover (real scans don't drain in under a second), and
fails. The error doesn't mention the per-server cap, so operators see a generic
exception and have to dig through server logs. Two concurrent scans interfering
with each other is even worse: nobody expects read traffic to throttle other
read traffic.
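A toy model of the failure mode in point 1 may make it concrete. This is a sketch under stated assumptions, not Fluss code: `TabletServer`, `tryOpenScanner` and `openScan` are hypothetical names, the cap of 200 mirrors the `kv.scanner.max-per-server` value discussed in this comment, and the 100/200/400 ms backoff schedule is an assumed split of the ~700 ms retry window. It also assumes a scan keeps the scanners it has already opened until it drains, and that no other scan drains within the retry window.

```java
import java.util.List;

/**
 * Toy model (hypothetical names, not the Fluss API): a tablet server with a fixed
 * cap on open KV scanners, and a client that retries a rejected open a fixed
 * number of times with short backoff.
 */
public class ScannerCapSketch {

    /** Assumed per-server cap, mirroring kv.scanner.max-per-server = 200. */
    static final int MAX_SCANNERS_PER_SERVER = 200;

    /** Assumed backoff schedule in ms: 3 retries totalling 700 ms. */
    static final List<Long> RETRY_BACKOFF_MS = List.of(100L, 200L, 400L);

    static final class TabletServer {
        private int openScanners;

        /** Returns false to model a TOO_MANY_SCANNERS rejection. */
        boolean tryOpenScanner() {
            if (openScanners >= MAX_SCANNERS_PER_SERVER) {
                return false;
            }
            openScanners++;
            return true;
        }

        int openScanners() {
            return openScanners;
        }
    }

    /**
     * Opens one scanner per bucket hosted on the server. Scanners already opened by
     * this scan stay open until the scan drains, so a later rejection cannot be
     * fixed by the scan itself. Returns the number of buckets opened before giving up.
     */
    static int openScan(TabletServer server, int bucketsOnServer) {
        for (int bucket = 0; bucket < bucketsOnServer; bucket++) {
            boolean opened = server.tryOpenScanner();
            // Retries only help if some other scan drains within the backoff window.
            // In this model no scan drains at all, so the retries cannot succeed.
            for (int attempt = 0; !opened && attempt < RETRY_BACKOFF_MS.size(); attempt++) {
                // A real client would sleep RETRY_BACKOFF_MS.get(attempt) ms here.
                opened = server.tryOpenScanner();
            }
            if (!opened) {
                return bucket;
            }
        }
        return bucketsOnServer;
    }

    public static void main(String[] args) {
        // One wide scan: 250 buckets of the table live on this server.
        TabletServer server = new TabletServer();
        int opened = openScan(server, 250);
        System.out.println("wide scan opened " + opened + " of 250 buckets");

        // Two overlapping scans of 150 buckets each on a fresh server.
        TabletServer shared = new TabletServer();
        int first = openScan(shared, 150);
        int second = openScan(shared, 150);
        System.out.println("scan A opened " + first + ", scan B opened " + second);
    }
}
```

In this model the wide scan stops at 200 of its 250 buckets, and scan B gets only 50 of its 150 because scan A already holds 150 slots: the same cap lets one reader throttle another. Retrying faster or more often doesn't change the outcome, since slots only come back when a scan drains.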
2. Writes degrade because memtables can't be freed. Each held iterator forces
the leader to keep the memtables it references in memory until the scan ends.
Memory pressure builds, compaction falls behind, and write-path tail latency
goes up. Operators see "P99 writes got slow during a batch read" with no causal
trail in the logs from the symptom back to the scan.
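The memtable pinning in point 2 can be illustrated with a reference-counting toy model. This is a simplified sketch, not RocksDB or Fluss code: `KvStore`, `Memtable`, `openIterator` and `flushAndSwitch` are hypothetical names, every memtable is assumed to be full at 64 MiB (RocksDB's default `write_buffer_size`, which may not match Fluss's configuration), and a real RocksDB iterator pins more than a single memtable (it also holds immutable memtables and SST files). What it shows is that a flush gets the data onto disk, but the memory only comes back once the last iterator that saw that memtable is closed.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical names): memtables stay resident while any iterator pins them. */
public class MemtablePinningSketch {

    /** Assumed memtable size: RocksDB's default write_buffer_size of 64 MiB. */
    static final long MEMTABLE_BYTES = 64L << 20;

    /** A memtable that stays in memory until its reference count drops to zero. */
    static final class Memtable {
        int refs = 1; // the store's own reference, dropped once the memtable is flushed
    }

    /** One KV store, reduced to memtable lifetimes. */
    static final class KvStore {
        private final List<Memtable> resident = new ArrayList<>();
        private Memtable current = newMemtable();

        private Memtable newMemtable() {
            Memtable m = new Memtable();
            resident.add(m);
            return m;
        }

        /** Opening an iterator pins the memtable that is current at that moment. */
        Memtable openIterator() {
            current.refs++;
            return current;
        }

        /** Closing the iterator (scan finished) drops its pin. */
        void closeIterator(Memtable pinned) {
            release(pinned);
        }

        /** Flushes the current memtable to disk and switches writes to a fresh one. */
        void flushAndSwitch() {
            Memtable flushed = current;
            current = newMemtable();
            release(flushed); // data is on disk; memory is freed only if nothing pins it
        }

        private void release(Memtable m) {
            if (--m.refs == 0) {
                resident.remove(m);
            }
        }

        long residentBytes() {
            return resident.size() * MEMTABLE_BYTES;
        }
    }

    public static void main(String[] args) {
        KvStore store = new KvStore();
        List<Memtable> openScans = new ArrayList<>();

        // Ten long scans start at different times while writes keep filling memtables.
        for (int i = 0; i < 10; i++) {
            openScans.add(store.openIterator());
            store.flushAndSwitch();
        }
        System.out.println("resident MiB while scans run: " + (store.residentBytes() >> 20));

        // Once every scan finishes, only the active memtable remains.
        openScans.forEach(store::closeIterator);
        System.out.println("resident MiB after scans end: " + (store.residentBytes() >> 20));
    }
}
```

With ten scans started between flushes, the model keeps 11 memtables (704 MiB) resident instead of one (64 MiB) until every scan finishes. Nothing in the write path logs that the scans are the reason.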
And bumping kv.scanner.max-per-server past 200 doesn't really help: it just
trades TOO_MANY_SCANNERS for memory pressure (several GB of pinned memory per
server on tables with many partitions and buckets).
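A back-of-envelope bound shows why raising the cap moves the problem rather than removing it. It assumes a worst case where each open scanner ends up pinning one full memtable that would otherwise have been freed, and 64 MiB memtables (RocksDB's default `write_buffer_size`; the real size depends on the deployment):

```latex
\begin{aligned}
M_{\text{pinned}} &\lesssim N_{\text{scanners}} \cdot B_{\text{memtable}} \\
&= 200 \times 64\ \text{MiB} = 12\,800\ \text{MiB} = 12.5\ \text{GiB}
\end{aligned}
```

Under the same assumptions, raising kv.scanner.max-per-server to 400 doubles the bound to 25 GiB per tablet server.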