hubcio commented on code in PR #3269:
URL: https://github.com/apache/iggy/pull/3269#discussion_r3263422607
##########
core/configs/src/server_config/sharding.rs:
##########
@@ -21,11 +21,69 @@ use std::str::FromStr;
use configs::ConfigEnv;
-#[derive(Debug, Deserialize, Serialize, Default, ConfigEnv)]
+/// Default capacity of the per-shard inter-shard inbox channel. Sized
+/// comfortably above the consensus working set, which is roughly
+/// `PIPELINE_PREPARE_QUEUE_MAX (= 8) * replica_count * directions`
+/// frames in flight per shard, without allowing a runaway producer to
+/// eat unbounded memory. Tunable via `[system.sharding] inbox_capacity`
+/// in TOML.
+///
+/// The capacity must also absorb the worst-case cross-shard client
+/// Reply burst. Unlike consensus frames, client Replies have no VSR
+/// retransmit path: a Reply lost on full inbox is gone and the client
+/// times out. A reasonable lower bound is
+/// `max_inflight_client_requests / num_shards` (assuming requests are
+/// distributed evenly across owning shards) plus the consensus
+/// headroom above.
+///
+/// Consensus frames and client-reply forwards share this one channel,
+/// so the two headrooms are not independent: a consensus burst or
+/// retransmit storm can fill the inbox with consensus frames exactly
+/// when a client Reply needs the space. A single `inbox_capacity` knob
+/// cannot isolate the two frame classes - size it for the sum of both
+/// worst cases occurring together. Watch the drop-site `tracing` logs
+/// (and, once a per-shard exporter lands, the `frame_drops_total`
+/// `{variant="forward_client_send"}` counter) to detect when the bound
+/// is too low in production.
+pub const DEFAULT_INBOX_CAPACITY: usize = 1024;
+
+/// Maximum permitted per-shard inbox depth. The channel is allocated
+/// up-front per shard, so a runaway value here OOMs the process at boot.
+/// `1 << 20` (~1M frames) is several orders of magnitude above any
+/// realistic backpressure target and still fits comfortably in process
+/// address space.
+pub const INBOX_CAPACITY_MAX: usize = 1 << 20;
+
+const fn default_inbox_capacity() -> usize {
+ DEFAULT_INBOX_CAPACITY
+}
+
+#[derive(Debug, Deserialize, Serialize, ConfigEnv)]
pub struct ShardingConfig {
#[serde(default)]
#[config_env(leaf)]
pub cpu_allocation: CpuAllocation,
+ /// Per-shard inter-shard inbox channel capacity. Bounded by design.
+ /// Drops on full inbox of consensus frames are recovered by VSR
+ /// retransmit. Drops of cross-shard client Reply frames are terminal:
+ /// the client never receives the reply (no in-protocol retransmit).
+ /// Both frame classes share this one channel, so a consensus burst
+ /// can starve client-reply forwards: size against the worst-case sum
+ /// of consensus working set + peak client-reply fan-out per shard
+ /// occurring together; see `DEFAULT_INBOX_CAPACITY` for the
+ /// rationale. Used by `core/server-ng`; the legacy server uses its
+ /// own hard-coded inbox sizing.
+ #[serde(default = "default_inbox_capacity")]
+ pub inbox_capacity: usize,
Review Comment:
intentionally out of scope here. plan is two lanes - one bounded queue for
consensus frames (drops recovered by VSR retransmit) and one for client `Reply`
(drops terminal, must size for worst-case fan-out). the single-channel design
here is the minimum-viable wiring so `frame_drops_total{variant,reason}` lights
up under load and we get real numbers to size the split.
actually thansk for this comment, i'll add todo in code.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]