hubcio opened a new pull request, #3269: URL: https://github.com/apache/iggy/pull/3269
core/server-ng bootstrapped exactly one shard with SHARD_ID=0 and senders=vec![sender] hardcoded; the multi-shard path was dead code. Cross-shard primitives copied from legacy core/server also did not fit core/shard's crossfire model (bounded MTx + try_send-or-drop, fd-transfer coordinator instead of SO_REUSEPORT).

bootstrap() now spawns N OS threads from sharding.cpu_allocation, each pinned via nix::sched + hwlocality and running its own compio runtime + IggyMessageBus + IggyShard for the partitions hashed to that shard. Cross-thread shutdown rides an Arc<AtomicBool> polled by a per-shard watchdog, since the bus's Shutdown is !Send and cannot be triggered from the main thread directly. Partial shard-spawn failure and shard-thread panic now signal cluster-wide shutdown instead of hanging; the shutdown watchdog is detached from the bus drain.

ShardFrame becomes a concrete enum (Consensus + Lifecycle); the R generic is lifted off IggyShard. Named routers (route_metadata / route_partition / route_consensus_control) replace the duplicated MessageBag match blocks, and a debug_assert at pump entry catches receiver-thread mis-binding that the ctor's assert_sender_ordering cannot see.

ShardMetrics records frame_drops_total (counter, variant+reason labels), bumped at every inter-shard try_send rejection; without it, drop-and-recover under VSR retransmit is operationally indistinguishable from a livelock. The counter is atomic, so it is safe to bump from !Send compio reactor contexts.

The legacy shard-mapping broadcast subsystem (periodic snapshot refresh task, three-state MappingSlot table, ReplicaMappingUpdate / ReplicaMappingClear frames) is retired entirely. Cross-shard replica routing now flows through the cluster-shared ReplicaOwnerTable: the owning shard's installer stamps its slot on a successful registry insert and CAS-clears it on disconnect, so every bus's send_to_replica slow path reads authoritative state with no broadcast or reconcile loop.
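
The stamp / CAS-clear protocol for the shared owner table can be sketched roughly as follows. This is an illustrative model only, not the code in this PR: the struct layout, the `NO_OWNER` sentinel, and the method names `stamp`, `clear_if_owner`, and `owner` are assumptions made for the example, and only the type name `ReplicaOwnerTable` comes from the description above.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Sentinel meaning "no shard currently owns this replica's connection".
/// (Assumed for this sketch; the real table may encode emptiness differently.)
const NO_OWNER: u32 = u32::MAX;

/// Hypothetical stand-in for the cluster-shared table: one atomic slot per replica id.
struct ReplicaOwnerTable {
    slots: Vec<AtomicU32>,
}

impl ReplicaOwnerTable {
    fn new(replica_count: usize) -> Self {
        Self {
            slots: (0..replica_count).map(|_| AtomicU32::new(NO_OWNER)).collect(),
        }
    }

    /// Called by the owning shard's installer after a successful registry insert.
    fn stamp(&self, replica: usize, shard: u32) {
        self.slots[replica].store(shard, Ordering::Release);
    }

    /// Called on disconnect. Uses compare-and-swap so that a late disconnect from a
    /// previous owner cannot wipe out a newer owner's stamp.
    /// Returns true if this call cleared the slot.
    fn clear_if_owner(&self, replica: usize, shard: u32) -> bool {
        self.slots[replica]
            .compare_exchange(shard, NO_OWNER, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Read side, as a send_to_replica slow path might use it: which shard owns
    /// the connection to this replica right now, if any.
    fn owner(&self, replica: usize) -> Option<u32> {
        match self.slots[replica].load(Ordering::Acquire) {
            NO_OWNER => None,
            shard => Some(shard),
        }
    }
}
```

Because every method takes `&self` and the slots are atomics, a table like this could be shared across shard threads behind an `Arc` without a lock, which is consistent with the "no broadcast or reconcile loop" goal stated above.
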
Builder accepts the coordinator at ctor; IggyShard stays immutable post-construction. message_bus forward-fn types widen to carry replica/client id, and send_to_replica routes via the shared owner table so non-zero shards reconcile against shared state rather than shard 0's private bus. WAL recovery is serialized across shards: non-zero shards open the WAL read-only, reject mutating ops (drain, set_snapshot_op) on a read-only storage, route an invalid WAL header to truncate_or_fail, and close the read-only fd once recovery completes. Storage milestones (durable PartitionJournal, durable (view, commit_op) watermark) and SDK (client_id, request_id) durability across reconnect remain out of scope; tracked separately.
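
The read-only WAL guard described in this PR (non-zero shards may replay but not mutate) can be illustrated with a minimal sketch. Everything here is hypothetical except the operation names `drain` and `set_snapshot_op`: the `WalStorage`, `WalMode`, and `WalError` types, the in-memory `Vec<u64>` standing in for the log, and the assumption that shard 0 is the one shard opening the WAL read-write are all made up for the example.

```rust
/// How a shard opened the WAL. Hypothetical type, not taken from the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WalMode {
    ReadWrite,
    ReadOnly,
}

#[derive(Debug, PartialEq, Eq)]
enum WalError {
    /// A mutating operation was attempted on read-only storage.
    ReadOnly { op: &'static str },
}

/// In-memory stand-in for WAL storage; entries are just op numbers here.
struct WalStorage {
    mode: WalMode,
    entries: Vec<u64>,
    snapshot_op: u64,
}

impl WalStorage {
    /// Assumption for this sketch: shard 0 opens read-write, all others read-only.
    fn open_for_shard(shard_id: u16, entries: Vec<u64>) -> Self {
        let mode = if shard_id == 0 {
            WalMode::ReadWrite
        } else {
            WalMode::ReadOnly
        };
        Self { mode, entries, snapshot_op: 0 }
    }

    /// Single choke point that every mutating op goes through.
    fn ensure_writable(&self, op: &'static str) -> Result<(), WalError> {
        match self.mode {
            WalMode::ReadWrite => Ok(()),
            WalMode::ReadOnly => Err(WalError::ReadOnly { op }),
        }
    }

    /// Replay is a pure read, so it is allowed in either mode.
    fn replay(&self) -> &[u64] {
        &self.entries
    }

    /// Mutating: removes and returns all entries. Rejected on read-only storage.
    fn drain(&mut self) -> Result<Vec<u64>, WalError> {
        self.ensure_writable("drain")?;
        Ok(std::mem::take(&mut self.entries))
    }

    /// Mutating: records the snapshot op. Rejected on read-only storage.
    fn set_snapshot_op(&mut self, op: u64) -> Result<(), WalError> {
        self.ensure_writable("set_snapshot_op")?;
        self.snapshot_op = op;
        Ok(())
    }
}
```

Routing each mutating call through one `ensure_writable` check keeps the rejection explicit and returns an error that names the offending operation, rather than relying on the underlying file descriptor to fail the write.
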
