Ilya,

That's outstanding research and summary. Thanks for spending your time on
this.

Not sure I have enough expertise to challenge your approach, but it sounds
100% reasonable to me. As side notes:

   - Don't you want to aggregate the tickets under an IEP?
   - Does it mean we're going to update our B+Tree implementation? Any
   ideas how risky it is?
   - Any chance you had a prototype that shows performance optimizations of
   the approach you are suggesting to take?

--
Denis

On Thu, Mar 22, 2018 at 8:38 AM, Ilya Lantukh <ilant...@gridgain.com> wrote:

> Igniters,
>
> I've spent some time analyzing performance of rebalancing process. The
> initial goal was to understand, what limits it's throughput, because it is
> significantly slower than network and storage device can theoretically
> handle.
>
> Turns out, our current implementation has a number of issues caused by a
> single fundamental problem.
>
> During rebalance data is sent in batches called
> GridDhtPartitionSupplyMessages. Batch size is configurable, default value
> is 512KB, which could mean thousands of key-value pairs. However, we don't
> take any advantage over this fact and process each entry independently:
> - checkpointReadLock is acquired multiple times for every entry, leading to
> unnecessary contention - this is clearly a bug;
> - for each entry we write (and fsync, if configuration assumes it) a
> separate WAL record - so, if batch contains N entries, we might end up
> doing N fsyncs;
> - adding every entry into CacheDataStore also happens completely
> independently. It means, we will traverse and modify each index tree N
> times, we will allocate space in FreeList N times and we will have to
> additionally store in WAL O(N*log(N)) page delta records.
>
> I've created a few tickets in JIRA with very different levels of scale and
> complexity.
>
> Ways to reduce impact of independent processing:
> - https://issues.apache.org/jira/browse/IGNITE-8019 - aforementioned bug,
> causing contention on checkpointReadLock;
> - https://issues.apache.org/jira/browse/IGNITE-8018 - inefficiency in
> GridCacheMapEntry implementation;
> - https://issues.apache.org/jira/browse/IGNITE-8017 - automatically
> disable
> WAL during preloading.
>
> Ways to solve problem on more global level:
> - https://issues.apache.org/jira/browse/IGNITE-7935 - a ticket to
> introduce
> batch modification;
> - https://issues.apache.org/jira/browse/IGNITE-8020 - complete redesign of
> rebalancing process for persistent caches, based on file transfer.
>
> Everyone is welcome to criticize above ideas, suggest new ones or
> participate in implementation.
>
> --
> Best regards,
> Ilya
>

Reply via email to