GitHub user hubcio edited a comment on the discussion: PR Proposal:
Cache-Line-Friendly In-Memory Segment Index
This is our `IggyIndex`:
```rust
pub struct IggyIndex {
    pub offset: u32,
    pub position: u32,
    pub timestamp: u64,
}
```
Each entry is 16 bytes, so with a 64-byte cache line we get exactly 4 entries
per line. In your solution each index entry would be padded to a full cache
line (64 bytes), so it would carry 48 extra bytes:
```rust
pub struct IggyIndex {
    pub offset: u32,
    pub position: u32,
    pub timestamp: u64,
    _pad: [u8; 48], // to fill 64 bytes
}
```
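The two sizes can be checked with a small sketch. The names `CompactIndex`, `PaddedIndex`, `CACHE_LINE` and `entries_per_line` are made up for illustration, `#[repr(C)]` is added here only to pin the field layout, and the 64-byte line size is an assumption that holds on typical x86-64 hardware but not everywhere:

```rust
use std::mem::size_of;

/// Assumed cache line size in bytes (typical, not universal).
pub const CACHE_LINE: usize = 64;

/// Compact layout from the first snippet: 4 + 4 + 8 = 16 bytes.
#[repr(C)]
pub struct CompactIndex {
    pub offset: u32,
    pub position: u32,
    pub timestamp: u64,
}

/// Padded layout from the second snippet: 16 + 48 = 64 bytes.
#[repr(C)]
pub struct PaddedIndex {
    pub offset: u32,
    pub position: u32,
    pub timestamp: u64,
    _pad: [u8; 48],
}

/// Number of whole entries of type `T` that fit into one cache line.
pub const fn entries_per_line<T>() -> usize {
    CACHE_LINE / size_of::<T>()
}
```

With these definitions, `entries_per_line::<CompactIndex>()` is 4 and `entries_per_line::<PaddedIndex>()` is 1.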
Let's say we have a 1 GB segment and each message is ~1000 bytes of user data.
That gives us roughly 1M index entries per segment. With the current layout
that’s about 16 MB of index data per segment (on disk and in memory), and we
don’t want to change the on-disk layout.
With the padded layout:
- each entry becomes 64 bytes,
- so the same 1M entries use ~64 MB,
- and 3/4 of that is padding.
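The per-segment numbers above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes one index entry per message and decimal units (1 GB = 10^9 bytes), and the function names are hypothetical:

```rust
/// Number of index entries in a segment, assuming one entry per message.
pub const fn index_entries(segment_bytes: u64, avg_message_bytes: u64) -> u64 {
    segment_bytes / avg_message_bytes
}

/// Total index size in bytes for a given entry count and entry size.
pub const fn index_bytes(entries: u64, entry_size: u64) -> u64 {
    entries * entry_size
}

/// Fraction of each entry that is padding rather than payload.
pub fn padding_fraction(payload_bytes: u64, entry_size: u64) -> f64 {
    (entry_size - payload_bytes) as f64 / entry_size as f64
}
```

For a 1 GB segment with 1000-byte messages this gives 1,000,000 entries, 16,000,000 bytes with 16-byte entries, 64,000,000 bytes with 64-byte entries, and a padding fraction of 0.75.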
That leads to a few issues:
- With, say, 500 segments loaded, we currently use ~8 GB of memory for indexes;
with the padded layout this jumps to ~32 GB.
- We can have a lot of segments open at once, and right now we drop indexes
from memory when a segment is closed (there’s no TTL cache), so the per-segment
footprint matters a lot.
- Because the struct is 4x larger, far fewer entries fit into the CPU caches,
which goes against the goal of making lookups more cache-friendly. Right now
we get 4 entries per 64-byte cache line; with the padded layout we’d only get 1.
- During index load at server startup you can't "just" read the entries into
memory. Each 16-byte on-disk entry has to be copied into its own aligned,
padded slot, so it's no longer a 1:1 operation and server startup will get
slower (not sure by how much).
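To make the last point concrete, here is a rough sketch of the extra work a padded layout would need at load time. It is not code from Iggy: `expand_to_padded` and the constants are hypothetical, and a plain `Vec<u8>` is not guaranteed to be 64-byte aligned, so a real version would also need an aligned allocation. With the current layout the bytes read from disk can be used as they are.

```rust
/// Size of one entry as stored on disk.
pub const ENTRY_SIZE: usize = 16;
/// Size of one entry in the hypothetical padded in-memory layout.
pub const PADDED_SIZE: usize = 64;

/// Copies each 16-byte on-disk entry into the start of a zeroed 64-byte slot.
/// Returns `None` if `raw` is not a whole number of entries.
pub fn expand_to_padded(raw: &[u8]) -> Option<Vec<u8>> {
    if raw.len() % ENTRY_SIZE != 0 {
        return None;
    }
    let count = raw.len() / ENTRY_SIZE;
    // One allocation that is 4x the size of the on-disk data.
    let mut out = vec![0u8; count * PADDED_SIZE];
    for (src, dst) in raw
        .chunks_exact(ENTRY_SIZE)
        .zip(out.chunks_exact_mut(PADDED_SIZE))
    {
        // One small copy per entry instead of one bulk read.
        dst[..ENTRY_SIZE].copy_from_slice(src);
    }
    Some(out)
}
```

For roughly 1M entries per segment that is about a million 16-byte copies per segment on top of the read itself.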
Just to add: we don't store `Vec<IggyIndex>` in memory. We store `Bytes` (or
`Vec<u8>`), which we cast to `IggyIndexView`:
https://github.com/apache/iggy/blob/a8e2341a40832767975c5edcc9b8b7b15e487876/core/common/src/types/message/index_view.rs#L27
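For readers who don't want to open the link, the general idea of a view over raw bytes looks roughly like the sketch below. This is not the actual `IggyIndexView` API: `EntryView` and `entry_at` are hypothetical names, and the little-endian encoding and the field order (offset, position, timestamp) are assumptions made for the example.

```rust
/// Size of one packed index entry in bytes.
pub const ENTRY_SIZE: usize = 16;

/// Hypothetical borrowed view over one 16-byte entry; nothing is copied
/// out of the buffer until a field is read.
pub struct EntryView<'a> {
    data: &'a [u8],
}

impl<'a> EntryView<'a> {
    /// Wraps a slice, returning `None` unless it is exactly one entry long.
    pub fn new(data: &'a [u8]) -> Option<Self> {
        if data.len() == ENTRY_SIZE {
            Some(Self { data })
        } else {
            None
        }
    }

    /// Bytes 0..4, assumed little-endian.
    pub fn offset(&self) -> u32 {
        let mut b = [0u8; 4];
        b.copy_from_slice(&self.data[0..4]);
        u32::from_le_bytes(b)
    }

    /// Bytes 4..8, assumed little-endian.
    pub fn position(&self) -> u32 {
        let mut b = [0u8; 4];
        b.copy_from_slice(&self.data[4..8]);
        u32::from_le_bytes(b)
    }

    /// Bytes 8..16, assumed little-endian.
    pub fn timestamp(&self) -> u64 {
        let mut b = [0u8; 8];
        b.copy_from_slice(&self.data[8..16]);
        u64::from_le_bytes(b)
    }
}

/// Returns a view of the `i`-th entry in a packed index buffer, if present.
pub fn entry_at<'a>(buf: &'a [u8], i: usize) -> Option<EntryView<'a>> {
    let start = i.checked_mul(ENTRY_SIZE)?;
    let end = start.checked_add(ENTRY_SIZE)?;
    buf.get(start..end).and_then(EntryView::new)
}
```

Because the buffer stays in its packed 16-byte form, the in-memory representation matches the on-disk one and four entries still share a cache line.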
>From my side you still have a green light to experiment with a faster
>in-memory index representation 🦀
GitHub link:
https://github.com/apache/iggy/discussions/2381#discussioncomment-15016547