[ https://issues.apache.org/jira/browse/IGNITE-26272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aleksandr Polovtsev reassigned IGNITE-26272:
--------------------------------------------

    Assignee: Aleksandr Polovtsev

> Implement mem-table for incomplete segment files
> ------------------------------------------------
>
>                 Key: IGNITE-26272
>                 URL: https://issues.apache.org/jira/browse/IGNITE-26272
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Assignee: Aleksandr Polovtsev
>            Priority: Major
>              Labels: ignite-3
>
> Here we need an in-memory structure for the offset table of incomplete
> segment files. In essence it should look like this:
> {code:java}
> MemTable = Map<
>     ReplicationGroupId,
>     IndexSegmentInfo
> >
> IndexSegmentInfo = {
>     flags
>     startIndex
>     offsets[]
> } {code}
> In code it might look something like this:
> {code:java}
> class IndexSegmentInfo {
>     volatile int flags;
>     final int startOffset;
>     volatile int[] offsets;
>     volatile int nextOffsetIndex;
> }{code}
> This structure has to be partitioned into stripes, because it is going to be
> used in a disruptor pool and we want as little contention as possible.
> Concurrent reads from it should still be possible, which is why the fields
> are volatile.
> This is also the place where we should probably start introducing
> {{LogStorage}} implementations for individual raft groups; they may have
> direct access to their {{IndexSegmentInfo}} instances.
> An append to the log storage should be followed by an append to this
> mem-table, synchronously and in the same thread.
> During a segment switch there must be a way to wait for all stripes from a
> different thread, in order to collect all mem-tables for further asynchronous
> processing (which itself is out of scope of this ticket).
> It can be expressed with this pseudo-code:
> {code:java}
> void append(logEntry) {
>     while (true) {
>         segment = currentSegment()
>         if (!segment.semaphore.acquire())
>             continue; // Retry, segment had switched in background.
>         try {
>             offset = segment.append(logEntry)
>             if (offset == SEGMENT_SWITCHED)
>                 continue; // Retry.
>             segment.memTable.append(this, logEntry, offset)
>             return; // Done, the entry and its offset are both recorded.
>         } finally {
>             segment.semaphore.release()
>         }
>     }
> }{code}
> Of course, "{{memTable}}" doesn't have to be a field of "{{segment}}", this
> is pseudo-code.
> In this example the semaphore allows "{{stripes}}" concurrent accesses. We
> can use a spin counter instead of a semaphore, or any other type of primitive
> that won't cost much to acquire and release. This code is on a hot path.
> Doing the "{{segment.semaphore.acquireAll()}}" after a segment switch
> guarantees that all "{{segment.memTable.append(...)}}" calls have finished
> their execution, giving us a fast and safe solution.
> Of course, we should have concurrency tests for all these structures and
> algorithms.
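
For illustration, the append/switch protocol described in the ticket could be sketched in plain Java roughly as follows. This is a minimal sketch under several assumptions, not the Ignite implementation: every class and method name is hypothetical, a String stands in for ReplicationGroupId, a ConcurrentHashMap replaces the striped map, the file write is reduced to an offset reservation, the flags field is omitted, and the first field is named startIndex as in the abstract definition. It also assumes that each group is written by a single thread (its stripe) with consecutive log indexes, and that only one thread performs segment switches.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch only; all names are hypothetical, not Ignite APIs. */
class MemTableSketch {
    static final int SEGMENT_SWITCHED = -1;

    /** Offsets of one group's entries: one writer, many concurrent readers. */
    static final class IndexSegmentInfo {
        final long startIndex;
        private volatile int[] offsets = new int[4];
        private volatile int nextOffsetIndex;

        IndexSegmentInfo(long startIndex) { this.startIndex = startIndex; }

        /** Assumed to be called by a single writer thread per group. */
        void append(int offset) {
            int n = nextOffsetIndex;
            int[] arr = offsets;
            if (n == arr.length) {
                arr = Arrays.copyOf(arr, n * 2);
                offsets = arr; // Publish the grown copy.
            }
            arr[n] = offset;
            nextOffsetIndex = n + 1; // Volatile write makes the new slot visible.
        }

        /** Returns the offset for a log index, or -1 if it is not in this table. */
        int offsetOf(long logIndex) {
            int size = nextOffsetIndex; // Read the size before the array.
            long i = logIndex - startIndex;
            return (i < 0 || i >= size) ? -1 : offsets[(int) i];
        }
    }

    static final class Segment {
        final Semaphore semaphore;
        final Map<String, IndexSegmentInfo> memTable = new ConcurrentHashMap<>();
        private final AtomicInteger position = new AtomicInteger();
        volatile boolean closed;

        Segment(int stripes) { semaphore = new Semaphore(stripes); }

        /** Stand-in for the real file write: only reserves space. */
        int append(int entrySize) {
            return closed ? SEGMENT_SWITCHED : position.getAndAdd(entrySize);
        }
    }

    private final int stripes;
    private volatile Segment current;

    MemTableSketch(int stripes) {
        this.stripes = stripes;
        this.current = new Segment(stripes);
    }

    /** Appends an entry and records its offset in the segment's mem-table. */
    int append(String groupId, long logIndex, int entrySize) {
        while (true) {
            Segment segment = current;
            if (!segment.semaphore.tryAcquire()) {
                Thread.onSpinWait();
                continue; // Segment was switched (or all permits are busy); retry.
            }
            try {
                int offset = segment.append(entrySize);
                if (offset == SEGMENT_SWITCHED) {
                    continue; // Retry on the new segment; finally still releases.
                }
                segment.memTable
                        .computeIfAbsent(groupId, id -> new IndexSegmentInfo(logIndex))
                        .append(offset);
                return offset;
            } finally {
                segment.semaphore.release();
            }
        }
    }

    /** Installs a new segment and returns the old mem-table once it is quiescent. */
    Map<String, IndexSegmentInfo> switchSegment() {
        Segment old = current;
        current = new Segment(stripes);
        old.closed = true;
        // Taking every permit waits for in-flight appends; the permits are never
        // returned, so late arrivals fail tryAcquire() and move to the new segment.
        old.semaphore.acquireUninterruptibly(stripes);
        return old.memTable;
    }
}
```

In this sketch, a caller would invoke append(groupId, logIndex, entrySize) from its stripe thread, while a separate thread calls switchSegment() and hands the returned map to asynchronous processing. A cheaper primitive such as a spin counter could replace the Semaphore, as the ticket suggests, as long as it still supports "take all permits" on switch.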