Hi Vladimir,

We can solve the "too many fsyncs" and "too many small files" problems by
placing several partitions of a cache group in one file.

We don't need to get rid of cache groups in this case.

It is not a trivial task, but it is doable. We need to create a simple FS
for partition chunks inside one file.
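To make the idea concrete, here is a minimal sketch of such a "FS": chunks of several partitions are packed into one data file at fixed offsets, so a checkpoint needs a single fsync instead of one per partition file. The class name, the 4 KiB chunk size, and the trivial layout are illustrative assumptions, not Ignite internals.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hedged sketch: several partitions share ONE backing file, each owning
// a fixed-size chunk, and the whole file is flushed with one fsync.
public class OneFilePerGroup {
    static final int CHUNK = 4096; // bytes per chunk (assumed)

    /** Writes 3 partition chunks, fsyncs once, reads partition 2's chunk back. */
    public static int roundTrip() throws IOException {
        Path file = Files.createTempFile("cache-group", ".bin");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.WRITE, StandardOpenOption.READ)) {
            // Simplest possible layout: partition p owns the chunk at offset p * CHUNK.
            for (int part = 0; part < 3; part++) {
                ByteBuffer buf = ByteBuffer.allocate(CHUNK);
                buf.putInt(part).rewind();      // fake partition payload
                ch.write(buf, (long) part * CHUNK);
            }
            ch.force(true);                     // ONE fsync for all partitions

            ByteBuffer out = ByteBuffer.allocate(CHUNK);
            ch.read(out, 2L * CHUNK);
            out.rewind();
            return out.getInt();
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip()); // prints 2
    }
}
```

A real implementation would of course need a free-chunk map and crash-safe metadata, but the single-fsync property already holds with this layout.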

Sincerely,
Dmitriy Pavlov

On Tue, Apr 10, 2018 at 12:31, Vladimir Ozerov <voze...@gridgain.com> wrote:

> Dima,
>
> 1) Easy to understand for users
> AI 2.x: cluster -> cache group -> cache -> table
> AI 3.x: cluster -> cache(==table)
>
> 2) Fine grained cache management
> - MVCC on/off per-cache
> - WAL mode on/off per-cache
> - Data size per-cache
>
> 3) Performance:
> - Efficient scans are not possible with cache groups
> - Efficient destroy/DROP - O(N) now, O(1) afterwards
>
> "Huge refactoring" is not a precise estimate. Let's think about how to do
> that instead of how not to do it :-)
>
> On Tue, Apr 10, 2018 at 11:41 AM, Dmitriy Setrakyan <dsetrak...@apache.org
> >
> wrote:
>
> > Vladimir, sounds like a huge refactoring. Other than "cache groups are
> > confusing", are we solving any other big issues with the new proposed
> > approach?
> >
> > (every time we try to refactor rebalancing, I get goose bumps)
> >
> > D.
> >
> > On Tue, Apr 10, 2018 at 1:32 AM, Vladimir Ozerov <voze...@gridgain.com>
> > wrote:
> >
> > > Igniters,
> > >
> > > Cache groups were implemented for a sole purpose - to hide internal
> > > inefficiencies. Namely (add more if I missed something):
> > > 1) Excessive heap usage for affinity/partition data
> > > 2) Too many data files, as we employ a file-per-partition approach.
> > >
> > > These problems were resolved, but now cache groups are a great source
> > > of confusion both for users and for us - hard to understand, with no
> > > way to configure them in a deterministic way. Had we resolved the
> > > mentioned performance issues earlier, we would never have had cache
> > > groups. I propose to think about what it would take for us to get rid
> > > of cache groups.
> > >
> > > Please provide your inputs to suggestions below.
> > >
> > > 1) "Merge" partition data from different caches
> > > Consider that we start a new cache with the same affinity configuration
> > > (cache mode, partition count, affinity function) as some already
> > > existing cache. Is it possible to re-use the partition distribution and
> > > history of the existing cache for the new cache? Think of it as a kind
> > > of automatic cache grouping which is transparent to the user. This
> > > would remove heap pressure. It could also resolve our long-standing
> > > issue with FairAffinityFunction, where two caches with the same
> > > affinity configuration are not co-located when started on different
> > > topology versions.
> > >
> > > 2) Employ a segment-extent based approach instead of file-per-partition
> > > - Every object (cache, index) resides in a dedicated segment
> > > - A segment consists of extents (minimal allocation units)
> > > - Extents are allocated and deallocated as needed
> > > - *Ignite specific*: a particular extent can be used by only one
> > > partition
> > > - Segments may be located in any number of data files we find
> > > convenient
> > > With this approach the "too many fsyncs" problem goes away
> > > automatically. At the same time it would still be possible to implement
> > > efficient rebalance, as partition data will be split across a moderate
> > > number of extents, not chaotically.
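The extent bookkeeping behind this proposal can be sketched as a small allocator: extents are handed out per (segment, partition) pair, honoring the one-partition-per-extent invariant, and dropping a segment just returns its extents to a free list, proportional to extent count rather than data size. All names here are assumptions for illustration, not Ignite code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch: segments (one per cache/index) grow by extents; each
// extent belongs to exactly one partition of one segment.
public class ExtentAllocator {
    private int nextExtent;                               // next never-used extent id
    private final Deque<Integer> freeExtents = new ArrayDeque<>();
    // "segment/partition" -> extents owned by that partition
    private final Map<String, List<Integer>> owned = new HashMap<>();

    /** Allocates one extent for a partition of a segment, reusing freed extents first. */
    public int allocate(String segment, int partition) {
        int ext = freeExtents.isEmpty() ? nextExtent++ : freeExtents.pop();
        owned.computeIfAbsent(segment + "/" + partition, k -> new ArrayList<>()).add(ext);
        return ext;
    }

    /** DROP of a whole segment: return its extents to the free list.
     *  Cost is proportional to the number of extents, not to the data size. */
    public void destroySegment(String segment) {
        owned.entrySet().removeIf(e -> {
            if (!e.getKey().startsWith(segment + "/")) return false;
            freeExtents.addAll(e.getValue());
            return true;
        });
    }

    public int freeCount() { return freeExtents.size(); }
}
```

Rebalance stays efficient under this scheme because each partition's data is confined to its own list of extents, which can be streamed in order rather than scattered page by page.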
> > >
> > > Once we have p.1 and p.2 ready cache groups could be removed, couldn't
> > > they?
> > >
> > > Vladimir.
> > >
> >
>
