This should probably be exposed via a public API; it is essentially the same as
manual rebalancing.
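
For illustration only, here is a purely hypothetical shape of such a public API
(nothing like this exists in Ignite today); the point is just that the manual
stop/clear/rebalance cycle could be driven by a single cluster call:

// Hypothetical API sketch; the interface name and signature are made up.
interface DefragmentationApi {
    /** Defragments the given caches by rebalancing them node by node,
     *  mirroring the manual procedure discussed below. */
    void defragmentCaches(java.util.Collection<String> cacheNames);
}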

Fri, 27 Sep 2019 at 17:40, Alexei Scherbakov <alexey.scherbak...@gmail.com>:

> The poor man's solution to the problem would be to stop the fragmented node,
> remove its partition data, and then start it again, allowing a full state
> transfer, this time without the deletes.
> Rinse and repeat for all owners.
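>
> Roughly, per node, something like the following (the work-directory layout
> and paths below are assumptions; adjust them to the actual setup):
>
> import java.io.IOException;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.Paths;
> import java.util.stream.Stream;
> import org.apache.ignite.Ignition;
>
> // Stop the node, drop its fragmented partition files, then start it again
> // and let full rebalance bring the data back without the deleted entries.
> final class ManualDefragSketch {
>     static void defragNode(String instanceName, String cacheWorkDir) throws IOException {
>         Ignition.stop(instanceName, false);                  // graceful stop of the fragmented node
>         Path dir = Paths.get(cacheWorkDir);                  // e.g. work/db/<consistentId>/cache-<name> (assumed)
>         try (Stream<Path> files = Files.list(dir)) {
>             files.filter(f -> f.getFileName().toString().startsWith("part-"))
>                  .forEach(p -> p.toFile().delete());         // remove the fragmented partition files
>         }
>         Ignition.start("config/ignite-node.xml");            // rejoin; full state transfer restores compact data
>     }
> }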
>
> Anton Vinogradov, would this work for you as a workaround?
>
Thu, 19 Sep 2019 at 13:03, Anton Vinogradov <a...@apache.org>:
>
>> Alexey,
>>
>> Let's combine your and Ivan's proposals.
>>
>> >> vacuum command, which acquires exclusive table lock, so no concurrent
>> >> activities on the table are possible.
>> and
>> >> Could the problem be solved by stopping a node which needs to be
>> >> defragmented, clearing persistence files and restarting the node?
>> >> After rebalancing the node will receive all data back without
>> >> fragmentation.
>>
>> How about having a special partition state, SHRINKING?
>> This state would mean that the partition is unavailable for reads and
>> updates, but it keeps its update counters and is not marked as lost,
>> renting, or evicted.
>> In this state we are able to iterate over the partition and apply its
>> entries to another file in a compact way.
>> Indices should be updated during the copy-on-shrink procedure or at shrink
>> completion.
>> Once the shrunk file is ready, we should replace the original partition
>> file with it and mark the partition as MOVING, which will start the
>> historical rebalance.
>> Shrinking should be performed during low-activity periods, but even if we
>> find that activity was high and historical rebalance is not suitable, we
>> may just remove the file and use regular rebalance to restore the partition
>> (this will also result in a shrink).
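>>
>> A self-contained sketch of that copy-on-shrink flow (the file layout and
>> types below are assumptions for illustration, not Ignite's actual internals):
>>
>> import java.io.DataOutputStream;
>> import java.io.IOException;
>> import java.nio.file.Files;
>> import java.nio.file.Path;
>> import java.nio.file.StandardCopyOption;
>> import java.util.Map;
>>
>> final class ShrinkSketch {
>>     enum PartitionState { OWNING, SHRINKING, MOVING }
>>
>>     static PartitionState shrink(Path partFile, Map<String, byte[]> liveEntries) throws IOException {
>>         // 1. SHRINKING: reads and updates are blocked, update counters are preserved.
>>         Path compact = partFile.resolveSibling(partFile.getFileName() + ".shrink");
>>         try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(compact))) {
>>             for (Map.Entry<String, byte[]> e : liveEntries.entrySet()) { // dense, sequential copy
>>                 out.writeUTF(e.getKey());
>>                 out.writeInt(e.getValue().length);
>>                 out.write(e.getValue());
>>             }
>>         }
>>         // 2. Indices would be rebuilt (or updated incrementally) against the compact file here.
>>         Files.move(compact, partFile, StandardCopyOption.REPLACE_EXISTING); // swap in the compact file
>>         // 3. MOVING: historical rebalance brings the partition back up to date.
>>         return PartitionState.MOVING;
>>     }
>> }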
>>
>> BTW, it seems we can implement partition shrink in a cheap way.
>> We may just reuse the rebalancing code to apply the fat partition's entries
>> to the new file.
>> So there are 3 stages here: local rebalance, index update, and global
>> historical rebalance.
>>
>> On Thu, Sep 19, 2019 at 11:43 AM Alexey Goncharuk <
>> alexey.goncha...@gmail.com> wrote:
>>
>> > Anton,
>> >
>> >
>> > > >> The solution which Anton suggested does not look easy because it
>> > > >> will most likely significantly hurt performance
>> > > Mostly agree here, but what drop do we expect? What price are we ready
>> > > to pay?
>> > > Not sure, but it seems some vendors are ready to pay, for example, a 5%
>> > > drop for this.
>> >
>> > 5% may be a big drop for some use-cases, so I think we should look at how
>> > to improve performance, not how to make it worse.
>> >
>> >
>> > >
>> > > >> it is hard to maintain a data structure to choose "page from
>> > > >> free-list with enough space closest to the beginning of the file".
>> > > We can just split each free-list bucket into two and use the first one
>> > > for pages in the first half of the file and the second one for the last
>> > > half.
>> > > Only two buckets are required here since, during the file shrink, the
>> > > first bucket's window will be shrunk too.
>> > > It seems this gives us the same price on put: just use the first bucket
>> > > if it is not empty.
>> > > The remove price (with merge) will be increased, of course.
>> > >
>> > > The compromise solution is priority put (to the first part of the file),
>> > > keeping removal as is, with a schedulable per-page migration for the
>> > > rest of the data during low-activity periods.
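>> > >
>> > > A minimal sketch of the two-bucket idea (names are illustrative, not the
>> > > actual Ignite FreeList code): pages from the first half of the file are
>> > > preferred on put, so new data gravitates towards the file's beginning.
>> > >
>> > > import java.util.ArrayDeque;
>> > > import java.util.Deque;
>> > >
>> > > // One free-list bucket split in two by page position within the partition file.
>> > > final class SplitBucket {
>> > >     private final Deque<Long> firstHalf = new ArrayDeque<>();  // pages below the file midpoint
>> > >     private final Deque<Long> secondHalf = new ArrayDeque<>(); // pages above the file midpoint
>> > >
>> > >     void release(long pageOffset, long fileMidpoint) {
>> > >         (pageOffset < fileMidpoint ? firstHalf : secondHalf).addLast(pageOffset);
>> > >     }
>> > >
>> > >     // Same cost on put: prefer the first-half bucket whenever it is non-empty.
>> > >     Long acquire() {
>> > >         Long page = firstHalf.pollFirst();
>> > >         return page != null ? page : secondHalf.pollFirst();
>> > >     }
>> > > }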
>> > >
>> > Free lists are large and slow by themselves, and it is expensive to
>> > checkpoint and read them on start, so as a long-term solution I would look
>> > into removing them. Moreover, I am not sure that adding yet another
>> > background process will improve the codebase's reliability and simplicity.
>> >
>> > If we want to go the hard path, I would look at a free-page tracking
>> > bitmap: a special bitmask page where each page in an adjacent block is
>> > marked as 0 if it has more free space than a certain configurable
>> > threshold (say, 80%), i.e. free, and as 1 otherwise, i.e. full. Some
>> > vendors have successfully implemented this approach, which looks much
>> > more promising, but is harder to implement.
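>> >
>> > A small sketch of that tracking-bitmap idea as described (illustrative
>> > only, not an existing implementation): one bit per page in an adjacent
>> > block, 0 if the page still has free space above the threshold, 1 if full.
>> >
>> > import java.util.BitSet;
>> >
>> > // One bitmask page tracking the fill state of an adjacent block of data pages.
>> > final class FreePageBitmap {
>> >     private final BitSet full;       // a set bit means the page is "full"
>> >     private final int pagesPerBlock;
>> >
>> >     FreePageBitmap(int pagesPerBlock) {
>> >         this.pagesPerBlock = pagesPerBlock;
>> >         this.full = new BitSet(pagesPerBlock);
>> >     }
>> >
>> >     // Called whenever a page's fill factor changes, e.g. with threshold = 0.8.
>> >     void onPageUpdated(int pageIdx, int freeBytes, int pageSize, double threshold) {
>> >         full.set(pageIdx, freeBytes < pageSize * threshold);
>> >     }
>> >
>> >     // Index of the first page in the block that still has enough free space, or -1.
>> >     int findFreePage() {
>> >         int idx = full.nextClearBit(0);
>> >         return idx < pagesPerBlock ? idx : -1;
>> >     }
>> > }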
>> >
>> > --AG
>> >
>>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>


-- 

Best regards,
Alexei Scherbakov
