[ 
https://issues.apache.org/jira/browse/IGNITE-26272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Polovtsev reassigned IGNITE-26272:
--------------------------------------------

    Assignee: Aleksandr Polovtsev

> Implement mem-table for incomplete segment files
> ------------------------------------------------
>
>                 Key: IGNITE-26272
>                 URL: https://issues.apache.org/jira/browse/IGNITE-26272
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Ivan Bessonov
>            Assignee: Aleksandr Polovtsev
>            Priority: Major
>              Labels: ignite-3
>
> We need an in-memory structure for the offset table of incomplete segment 
> files. In essence, it should look like this:
> {code:java}
> MemTable = Map<
>   ReplicationGroupId,
>   IndexSegmentInfo
> >
> IndexSegmentInfo = {
>   flags
>   startIndex
>   offsets[]
> } {code}
> In code it might look something like this:
> {code:java}
> class IndexSegmentInfo {
>   volatile int flags;
>   final int startIndex;
>   volatile int[] offsets;
>   volatile int nextOffsetIndex;
> }{code}
> This structure has to be partitioned into stripes, because it will be used 
> in a disruptor pool and we want as little contention as possible. 
> Concurrent reads from it should still be possible, which is why the mutable 
> fields are volatile.
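
One way the single-writer, many-readers contract described above could look in Java. This is only a sketch under assumptions: each instance is appended to by exactly one stripe thread, the {{flags}} field is left out because its meaning is not defined here, and the names MISSING, append and offsetOf are hypothetical.

```java
import java.util.Arrays;

/** Sketch: one writer thread appends, any thread may read concurrently. */
class IndexSegmentInfo {
    /** Hypothetical marker for "this log index is not in the table". */
    static final int MISSING = -1;

    /** Log index of the first entry tracked by this table. */
    final long startIndex;

    /** offsets[i] is the segment file offset of the entry startIndex + i. */
    private volatile int[] offsets;

    /** Number of published offsets; written last by the writer. */
    private volatile int size;

    IndexSegmentInfo(long startIndex, int initialCapacity) {
        this.startIndex = startIndex;
        this.offsets = new int[initialCapacity];
    }

    /** Must be called by the single writer thread only. */
    void append(int fileOffset) {
        int n = size;
        int[] arr = offsets;
        if (n == arr.length) {
            // Grow by copying; readers keep using the old array until they
            // observe the new reference.
            arr = Arrays.copyOf(arr, Math.max(1, n * 2));
            offsets = arr;
        }
        arr[n] = fileOffset;
        // The volatile write of size publishes the array element written above.
        size = n + 1;
    }

    /** Safe to call from any thread. */
    int offsetOf(long logIndex) {
        int n = size; // Read size first, then the array reference.
        long i = logIndex - startIndex;
        if (i < 0 || i >= n) {
            return MISSING;
        }
        return offsets[(int) i];
    }
}
```

A reader that observes a given size is guaranteed to see every offset below it, because the writer stores the element (and, if needed, the grown array) before the volatile write of the size.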
> This is also probably the place to start introducing {{LogStorage}} 
> implementations for individual Raft groups; they may have direct access to 
> their {{IndexSegmentInfo}} instances.
> An append to the log storage should be followed by an append to this 
> mem-table, synchronously and in the same thread.
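
A rough sketch of the striping idea, assuming the stripe is chosen from the hash of the group ID, so that all appends for one group land on the same stripe and therefore on the same writer thread. A String stands in for ReplicationGroupId and a ConcurrentSkipListMap stands in for IndexSegmentInfo; StripedMemTable, stripeOf and offsetOf are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Sketch: mem-table split into stripes, one writer thread per stripe. */
class StripedMemTable {
    /** One map per stripe: group ID -> (log index -> segment file offset). */
    private final List<ConcurrentHashMap<String, ConcurrentSkipListMap<Long, Integer>>> stripes;

    StripedMemTable(int stripeCount) {
        stripes = new ArrayList<>(stripeCount);
        for (int i = 0; i < stripeCount; i++) {
            stripes.add(new ConcurrentHashMap<>());
        }
    }

    /** Assumption: a group is routed to a stripe by the hash of its ID. */
    int stripeOf(String groupId) {
        return Math.floorMod(groupId.hashCode(), stripes.size());
    }

    /** Called by the stripe's thread right after the log storage append. */
    void append(String groupId, long logIndex, int fileOffset) {
        stripes.get(stripeOf(groupId))
                .computeIfAbsent(groupId, id -> new ConcurrentSkipListMap<>())
                .put(logIndex, fileOffset);
    }

    /** Safe to call from any thread; returns null if the entry is unknown. */
    Integer offsetOf(String groupId, long logIndex) {
        ConcurrentSkipListMap<Long, Integer> offsets =
                stripes.get(stripeOf(groupId)).get(groupId);
        return offsets == null ? null : offsets.get(logIndex);
    }
}
```

Because each stripe has its own map, writers on different stripes never touch the same structure, while readers can still look up any group from any thread.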
> During a segment switch, there must be a way for a different thread to wait 
> for all stripes, in order to collect all mem-tables for further asynchronous 
> processing (which itself is out of scope of this ticket).
> This can be expressed with the following pseudo-code:
> {code:java}
> void append(logEntry) {
>   while (true) {
>     segment = currentSegment()
>     if (!segment.semaphore.acquire())
>       continue; // Retry, segment had switched in background.
>     try {
>       offset = segment.append(logEntry)
>       if (offset == SEGMENT_SWITCHED)
>         continue; // Retry.
>       segment.memTable.append(this, logEntry, offset)
>       return; // Entry is appended and indexed, stop retrying.
>     } finally {
>       segment.semaphore.release()
>     }
>   }
> }{code}
> Of course, "{{memTable}}" doesn't have to be a field of "{{segment}}"; this 
> is pseudo-code.
> In this example, the semaphore allows "{{stripes}}" concurrent accesses. We 
> can use a spin counter instead of a semaphore, or any other primitive that 
> is cheap to acquire and release, because this code is on a hot path.
> Calling "{{segment.semaphore.acquireAll()}}" after the segment switch 
> guarantees that all "{{segment.memTable.append(...)}}" calls have finished 
> executing, giving us a fast and safe solution.
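
The append/switch protocol above might be sketched in Java with java.util.concurrent.Semaphore, where "acquireAll" becomes acquireUninterruptibly(stripes). The assumptions here are illustrative only: a single switching thread, at most "stripes" appender threads, a "sealed" flag standing in for whatever makes the segment report SEGMENT_SWITCHED, and a queue of offsets standing in for the mem-table. SwitchableLog and switchSegment are hypothetical names.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of the append loop and the segment switch that waits for all stripes. */
class SwitchableLog {
    static final int SEGMENT_SWITCHED = -1;

    static final class Segment {
        final Semaphore permits;
        final AtomicInteger nextOffset = new AtomicInteger();
        /** Stand-in for the mem-table: just the recorded offsets. */
        final Queue<Integer> memTable = new ConcurrentLinkedQueue<>();
        volatile boolean sealed;

        Segment(int stripes) {
            permits = new Semaphore(stripes);
        }

        int append(int entrySize) {
            return sealed ? SEGMENT_SWITCHED : nextOffset.getAndAdd(entrySize);
        }
    }

    private final int stripes;
    private volatile Segment current;

    SwitchableLog(int stripes) {
        this.stripes = stripes;
        this.current = new Segment(stripes);
    }

    /** Hot path, called by stripe threads. */
    void append(int entrySize) {
        while (true) {
            Segment segment = current;
            if (!segment.permits.tryAcquire()) {
                Thread.onSpinWait();
                continue; // Retry, the segment was switched in the background.
            }
            try {
                int offset = segment.append(entrySize);
                if (offset == SEGMENT_SWITCHED) {
                    continue; // Retry; the permit is released in finally.
                }
                segment.memTable.add(offset);
                return;
            } finally {
                segment.permits.release();
            }
        }
    }

    /** Called by the single switching thread; returns the completed segment. */
    Segment switchSegment() {
        Segment old = current;
        current = new Segment(stripes);
        old.sealed = true;
        // Blocks until every in-flight append has released its permit. The
        // permits are never returned, so late appenders fail tryAcquire.
        old.permits.acquireUninterruptibly(stripes);
        return old;
    }
}
```

Once switchSegment() returns, no thread can hold a permit of the old segment, so its mem-table can be handed to asynchronous processing without further synchronization.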
> Of course, we should have concurrency tests for all these structures and 
> algorithms.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
