GitHub user hubcio added a comment to the discussion: PR Proposal: 
Cache-Line-Friendly In-Memory Segment Index

This is our `IggyIndex`:

```rust
pub struct IggyIndex {
    pub offset: u32,
    pub position: u32,
    pub timestamp: u64,
}
```

So with a 64-byte cache line we get exactly 4 entries per line.
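A quick way to sanity-check that number is to ask the compiler. This is only a sketch: `#[repr(C)]` and the `CACHE_LINE_BYTES` constant are assumptions added for illustration (the struct above has no explicit `repr`, and 64 bytes is the typical x86-64 line size, not a guarantee).

```rust
use std::mem::size_of;

// Same fields as above; #[repr(C)] is an assumption added here so the
// field order is fixed. The size is 16 bytes either way (4 + 4 + 8).
#[repr(C)]
pub struct IggyIndex {
    pub offset: u32,
    pub position: u32,
    pub timestamp: u64,
}

/// Assumed cache-line size in bytes (typical for x86-64).
pub const CACHE_LINE_BYTES: usize = 64;

/// How many whole index entries fit into one cache line.
pub const fn entries_per_cache_line() -> usize {
    CACHE_LINE_BYTES / size_of::<IggyIndex>()
}
```

With these definitions `size_of::<IggyIndex>()` is 16 and `entries_per_cache_line()` is 4.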

In your solution, each index entry would be padded to a full cache line (64 
bytes), so it would carry 48 extra bytes:
```rust
pub struct IggyIndex {
    pub offset: u32,
    pub position: u32,
    pub timestamp: u64,
    _pad: [u8; 48], // pads the struct to 64 bytes
}
```
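For reference, there are two ways to get a 64-byte entry in Rust, and they differ in alignment. The names `PaddedIndex` and `AlignedIndex` below are hypothetical and the `#[repr(C)]` attribute is an assumption; this is a sketch of the layouts, not code from the proposal.

```rust
// Explicit padding: the size becomes 64 bytes, but the alignment stays at 8,
// so an entry is not guaranteed to start on a cache-line boundary.
#[repr(C)]
pub struct PaddedIndex {
    pub offset: u32,
    pub position: u32,
    pub timestamp: u64,
    _pad: [u8; 48],
}

// Alignment attribute: the compiler rounds the size up to 64 bytes and also
// guarantees that every entry starts on a 64-byte boundary.
#[repr(C, align(64))]
pub struct AlignedIndex {
    pub offset: u32,
    pub position: u32,
    pub timestamp: u64,
}
```

Either way only 16 of the 64 bytes carry data, which is what the numbers in this comment are based on.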

Let's say we have a 1 GB segment and each message carries ~1000 bytes of user 
data. That gives us roughly 1M index entries per segment. With the current 
layout that’s about 16 MB of index data per segment (on disk and in memory), 
and we don’t want to change the on-disk layout.

With the padded layout:
- each entry becomes 64 bytes,
- so the same 1M entries use ~64 MB,
- and 3/4 of that is padding.
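The arithmetic behind these figures can be written out as a small sketch. The function names are hypothetical, sizes are in decimal units (1 GB = 10^9 bytes), and message headers are ignored, so the results are estimates only.

```rust
/// Rough number of index entries in a segment: one entry per message.
pub const fn entries_per_segment(segment_bytes: u64, avg_message_bytes: u64) -> u64 {
    segment_bytes / avg_message_bytes
}

/// Bytes of index data needed for `entries` entries of `entry_size` bytes each.
pub const fn index_bytes(entries: u64, entry_size: u64) -> u64 {
    entries * entry_size
}

/// Fraction of a padded entry that carries no data.
pub fn padding_fraction(payload_bytes: u64, padded_bytes: u64) -> f64 {
    (padded_bytes - payload_bytes) as f64 / padded_bytes as f64
}
```

For a 1 GB segment with 1000-byte messages this gives 1,000,000 entries, 16,000,000 bytes of index with 16-byte entries, 64,000,000 bytes with 64-byte entries, and a padding fraction of 0.75.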

That leads to a few issues:
- With, say, 500 segments loaded, we currently use ~8 GB of memory for indexes; 
with the padded layout this jumps to ~32 GB.
- We can have a lot of segments open at once, and right now we drop indexes 
from memory when a segment is closed (there’s no TTL cache), so the per-segment 
footprint matters a lot.
- Because the struct is 4x larger, far fewer entries fit into the CPU caches, 
which goes against the goal of making lookups more cache-friendly. Right now 
we get 4 entries per 64-byte cache line; with the padded layout we’d only get 1.
- During index load at server startup you can't "just" read the entries into 
memory. Each 16-byte on-disk entry has to be expanded into its padded 64-byte 
form, so the load is no longer a 1:1 copy and server startup will take longer 
(not sure by how much).
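To make the last point concrete, here is a minimal sketch of the extra step a padded in-memory layout would need at load time. It assumes the on-disk record is 16 bytes in little-endian order (offset, position, timestamp); `PaddedIndex`, `expand_to_padded` and the helper functions are hypothetical names, not existing Iggy code.

```rust
/// Assumed size of one on-disk record: u32 offset + u32 position + u64 timestamp.
pub const DISK_ENTRY_BYTES: usize = 16;

/// Hypothetical cache-line-sized in-memory entry.
#[repr(C, align(64))]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PaddedIndex {
    pub offset: u32,
    pub position: u32,
    pub timestamp: u64,
}

// Decode a little-endian u32 from exactly 4 bytes.
fn le_u32(bytes: &[u8]) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(bytes);
    u32::from_le_bytes(buf)
}

// Decode a little-endian u64 from exactly 8 bytes.
fn le_u64(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    u64::from_le_bytes(buf)
}

/// Expands the raw index file contents into padded entries.
/// Every record is decoded and rewritten, so this is a per-entry
/// transformation rather than a single bulk read into memory.
/// Trailing bytes that do not form a full record are ignored.
pub fn expand_to_padded(raw: &[u8]) -> Vec<PaddedIndex> {
    raw.chunks_exact(DISK_ENTRY_BYTES)
        .map(|rec| PaddedIndex {
            offset: le_u32(&rec[0..4]),
            position: le_u32(&rec[4..8]),
            timestamp: le_u64(&rec[8..16]),
        })
        .collect()
}
```

With the current layout none of this is needed: the bytes read from disk are already the in-memory representation.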

Just to add: we don't store `Vec<IggyIndex>` in memory. We store `Bytes` (or 
`Vec<u8>`), which we cast to `IggyIndexView`:
https://github.com/apache/iggy/blob/a8e2341a40832767975c5edcc9b8b7b15e487876/core/common/src/types/message/index_view.rs#L27
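For readers who don't want to follow the link, the general idea of a view over raw bytes looks roughly like the sketch below. It is not the actual `IggyIndexView` API: the `IndexView` type, `index_at`, the little-endian encoding and the field order are all assumptions made for illustration.

```rust
/// Assumed size of one serialized index entry.
pub const INDEX_SIZE: usize = 16;

/// Hypothetical borrowed view over one 16-byte entry. No data is copied;
/// fields are decoded on access.
#[derive(Clone, Copy)]
pub struct IndexView<'a> {
    data: &'a [u8],
}

impl<'a> IndexView<'a> {
    /// Returns a view only if the slice is exactly one entry long.
    pub fn new(data: &'a [u8]) -> Option<Self> {
        if data.len() == INDEX_SIZE {
            Some(Self { data })
        } else {
            None
        }
    }

    pub fn offset(&self) -> u32 {
        let d = self.data;
        u32::from_le_bytes([d[0], d[1], d[2], d[3]])
    }

    pub fn position(&self) -> u32 {
        let d = self.data;
        u32::from_le_bytes([d[4], d[5], d[6], d[7]])
    }

    pub fn timestamp(&self) -> u64 {
        let d = self.data;
        u64::from_le_bytes([d[8], d[9], d[10], d[11], d[12], d[13], d[14], d[15]])
    }
}

/// Returns a view of the i-th entry in a buffer of back-to-back entries,
/// or None if the entry is out of range.
pub fn index_at(buf: &[u8], i: usize) -> Option<IndexView<'_>> {
    let start = i.checked_mul(INDEX_SIZE)?;
    let end = start.checked_add(INDEX_SIZE)?;
    buf.get(start..end).and_then(IndexView::new)
}
```

Because entries sit back to back in the buffer, any denser or padded in-memory representation would have to be built from these bytes as a separate step.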

From my side you still have a green light to experiment with a faster 
in-memory index representation 🦀 



GitHub link: 
https://github.com/apache/iggy/discussions/2381#discussioncomment-15016547
