[ 
https://issues.apache.org/jira/browse/IGNITE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16943607#comment-16943607
 ] 

Aleksey Plekhanov commented on IGNITE-6930:
-------------------------------------------

The patch is ready. To minimize WAL records, I've used the following approach:

There is a small on-heap pages list cache allocated for each bucket. There are 
three types of operations on free lists: putting a page to the tail of a bucket 
(after a row is inserted or removed), taking a page from the tail of a bucket 
(before a row is inserted), and removing a page from a bucket (before a row is 
removed). Each of these operations first looks into the pages cache and only 
then works with page memory.

No WAL record is needed if a page is handled only by the buckets' pages caches. 
So it's possible that a page is put into the free list, moves through the 
buckets, leaves the free list, and never produces a single free-list WAL record.
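A minimal Java sketch of the per-bucket cache described above, under clearly stated assumptions: `BucketPagesCacheSketch`, its methods and the capacity parameter are hypothetical names, not Ignite's API; "page memory" is simulated with an on-heap `Deque`, and each change to it is counted as one free-list WAL record. It shows that a page which only ever passes through the caches leaves no WAL trace, while operations that spill over to page memory do.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of one free-list bucket backed by a small on-heap pages cache. */
final class BucketPagesCacheSketch {
    private final int cacheCapacity;
    private final Deque<Long> onHeapCache = new ArrayDeque<>();
    // Stand-in for the page-memory list; every change to it "costs" one WAL record.
    private final Deque<Long> pageMemoryList = new ArrayDeque<>();
    private int walRecords;

    BucketPagesCacheSketch(int cacheCapacity) {
        this.cacheCapacity = cacheCapacity;
    }

    /** Put a page to the tail of the bucket (after a row insert or remove). */
    void putTail(long pageId) {
        if (onHeapCache.size() < cacheCapacity) {
            onHeapCache.addLast(pageId); // served by the cache: no WAL record
        } else {
            pageMemoryList.addLast(pageId);
            walRecords++; // e.g. a PagesListAddPageRecord in the real WAL
        }
    }

    /** Take a page from the tail of the bucket (before a row insert); -1 if the bucket is empty. */
    long takeTail() {
        Long id = onHeapCache.pollLast(); // look into the cache first
        if (id != null)
            return id;
        id = pageMemoryList.pollLast();
        if (id == null)
            return -1L;
        walRecords++; // page-memory change must be logged
        return id;
    }

    /** Remove a specific page from the bucket (before a row remove). */
    boolean remove(long pageId) {
        if (onHeapCache.remove(pageId))
            return true; // no WAL record
        if (pageMemoryList.remove(pageId)) {
            walRecords++;
            return true;
        }
        return false;
    }

    int walRecords() {
        return walRecords;
    }

    public static void main(String[] args) {
        BucketPagesCacheSketch b1 = new BucketPagesCacheSketch(8);
        BucketPagesCacheSketch b2 = new BucketPagesCacheSketch(8);
        b1.putTail(5L);             // page enters the free list
        b1.remove(5L);              // its free space changed: it leaves bucket 1...
        b2.putTail(5L);             // ...and moves to bucket 2
        long taken = b2.takeTail(); // taken for an insert: it leaves the free list
        // prints "taken=5, walRecords=0"
        System.out.println("taken=" + taken + ", walRecords=" + (b1.walRecords() + b2.walRecords()));
    }
}
```

The cache capacity bounds on-heap memory per bucket; once it is exceeded, the sketch falls back to the logged page-memory path.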

The on-heap pages cache is flushed to page memory before each checkpoint to 
ensure the same recovery guarantees as now (physical records are restored from 
the WAL only up to the moment of the last unsuccessful checkpoint, if one was 
started, so we only need the final state of the buckets at the moment of the 
checkpoint).
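The checkpoint-time flush can be sketched the same way. This assumes hypothetical per-bucket `Deque`s for both the on-heap caches and the page-memory lists and a hypothetical `checkpoint()` hook; it ignores locking against concurrent free-list updates, which a real implementation would need. Once `beforeCheckpoint()` has run, page memory holds the complete bucket state, which is all that recovery needs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: move every cached page into page memory before a checkpoint. */
final class CheckpointFlushSketch {
    static final int BUCKETS = 4; // assumed bucket count

    final List<Deque<Long>> onHeapCaches = new ArrayList<>();
    final List<Deque<Long>> pageMemoryBuckets = new ArrayList<>();

    CheckpointFlushSketch() {
        for (int i = 0; i < BUCKETS; i++) {
            onHeapCaches.add(new ArrayDeque<>());
            pageMemoryBuckets.add(new ArrayDeque<>());
        }
    }

    /** Drains each bucket's on-heap cache into the same bucket in page memory. */
    void beforeCheckpoint() {
        for (int b = 0; b < BUCKETS; b++) {
            Deque<Long> cache = onHeapCaches.get(b);
            Long pageId;
            while ((pageId = cache.pollFirst()) != null)
                pageMemoryBuckets.get(b).addLast(pageId);
        }
    }

    void checkpoint() {
        beforeCheckpoint();
        // ... write dirty pages, including free-list pages, to disk ...
    }

    public static void main(String[] args) {
        CheckpointFlushSketch fl = new CheckpointFlushSketch();
        fl.onHeapCaches.get(1).add(5L);
        fl.onHeapCaches.get(1).add(6L);
        fl.onHeapCaches.get(3).add(8L);
        fl.checkpoint();
        System.out.println(fl.pageMemoryBuckets); // prints "[[], [5, 6], [], [8]]"
    }
}
```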

[~ivan.glukos], could you please have a look?

 

> Optionally to do not write free list updates to WAL
> ---------------------------------------------------
>
>                 Key: IGNITE-6930
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6930
>             Project: Ignite
>          Issue Type: Task
>          Components: cache
>            Reporter: Vladimir Ozerov
>            Assignee: Aleksey Plekhanov
>            Priority: Major
>              Labels: IEP-8, performance
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a cache entry is created, we need to update the free list. When an 
> entry is updated, we need to update the free list(s) several times. Currently 
> the free list is a persistent structure, so every update to it must be logged 
> to be able to recover after a crash. This may incur significant overhead, 
> especially for small entries.
> For example, this is what the WAL for a single update looks like. "D" means 
> updates with real data, "F" means free-list management:
> {code}
>  1. [D] DataRecord [writeEntries=[UnwrapDataEntry[k = key, v = [ BinaryObject [idHash=2053299190, hash=1986931360, typeId=-1580729813]], super = [DataEntry [cacheId=94416770, op=UPDATE, writeVer=GridCacheVersion [topVer=122147562, order=1510667560607, nodeOrder=1], partId=0, partCnt=4]]]], super=WALRecord [size=0, chainSize=0, pos=null, type=DATA_RECORD]]
>  2. [F] PagesListRemovePageRecord [rmvdPageId=0001000000000005, pageId=0001000000000006, grpId=94416770, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000006, super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  3. [D] DataPageInsertRecord [super=PageDeltaRecord [grpId=94416770, pageId=0001000000000005, super=WALRecord [size=129, chainSize=0, pos=null, type=DATA_PAGE_INSERT_RECORD]]]
>  4. [F] PagesListAddPageRecord [dataPageId=0001000000000005, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000008, super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
>  5. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710664, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000005, super=WALRecord [size=37, chainSize=0, pos=null, type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
>  6. [D] ReplaceRecord [io=DataLeafIO[ver=1], idx=0, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000004, super=WALRecord [size=47, chainSize=0, pos=null, type=BTREE_PAGE_REPLACE]]]
>  7. [F] DataPageRemoveRecord [itemId=0, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000005, super=WALRecord [size=30, chainSize=0, pos=null, type=DATA_PAGE_REMOVE_RECORD]]]
>  8. [F] PagesListRemovePageRecord [rmvdPageId=0001000000000005, pageId=0001000000000008, grpId=94416770, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000008, super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  9. [F] DataPageSetFreeListPageRecord [freeListPage=0, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000005, super=WALRecord [size=37, chainSize=0, pos=null, type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> 10. [F] PagesListAddPageRecord [dataPageId=0001000000000005, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000006, super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
> 11. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710662, super=PageDeltaRecord [grpId=94416770, pageId=0001000000000005, super=WALRecord [size=37, chainSize=0, pos=null, type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> {code}
> If you sum up all the space required for the operation (the size in p.3 is 
> shown incorrectly here), you will see that the data update requires ~300 
> bytes, and so does the free-list update! 
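The free-list share of the listing above can be checked with a quick sum over the sizes as printed. The [D] records are left out because at least one of their printed sizes (p.3) is noted as incorrect; the class and record names are illustrative only, and the `record` syntax needs Java 16 or later.

```java
import java.util.List;

/** Sums the printed WAL record sizes of the free-list ('F') records from the listing. */
final class WalOverheadSum {
    // Record number, kind ('D' = data, 'F' = free list) and printed size in bytes.
    record Rec(int n, char kind, int size) {}

    static final List<Rec> RECORDS = List.of(
        new Rec(1, 'D', 0), new Rec(2, 'F', 37), new Rec(3, 'D', 129),
        new Rec(4, 'F', 37), new Rec(5, 'F', 37), new Rec(6, 'D', 47),
        new Rec(7, 'F', 30), new Rec(8, 'F', 37), new Rec(9, 'F', 37),
        new Rec(10, 'F', 37), new Rec(11, 'F', 37));

    static int freeListBytes(List<Rec> recs) {
        return recs.stream().filter(r -> r.kind() == 'F').mapToInt(Rec::size).sum();
    }

    public static void main(String[] args) {
        // 7 records of 37 bytes plus one of 30 bytes; prints "free-list bytes: 289"
        System.out.println("free-list bytes: " + freeListBytes(RECORDS));
    }
}
```

The 289 bytes of free-list records agree with the "~300 bytes" estimate.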
> *Proposed solution*
> 1) Optionally, do not write free-list updates to the WAL
> 2) On node restart, we start with empty free lists, so data inserts will 
> have to allocate new pages
> 3) When an old data page is read, add it to the free list
> 4) Start a background thread that iterates over all old data pages and 
> re-creates the free list, so that eventually all data pages are tracked.
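Steps 2-4 of the proposal can be sketched as follows. Everything here is hypothetical: the class name, the 4096-byte page size, the 8 buckets and the bucket formula are assumptions, and the map of free bytes per page stands in for reading old data pages from disk. A concurrent set ensures that a page read by a user operation (step 3) and the same page visited by the background scan (step 4) are added to the free list only once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: free lists start empty after restart and are rebuilt lazily and in the background. */
final class FreeListRebuildSketch {
    static final int PAGE_SIZE = 4096; // assumed page size
    static final int BUCKETS = 8;      // assumed bucket count

    private final Map<Long, Integer> freeBytesByPage; // stand-in for old data pages on disk
    private final List<ConcurrentLinkedDeque<Long>> buckets = new ArrayList<>();
    private final Set<Long> tracked = ConcurrentHashMap.newKeySet();

    FreeListRebuildSketch(Map<Long, Integer> freeBytesByPage) {
        this.freeBytesByPage = freeBytesByPage;
        for (int i = 0; i < BUCKETS; i++)
            buckets.add(new ConcurrentLinkedDeque<>()); // step 2: empty free lists
    }

    static int bucketFor(int freeBytes) {
        return Math.min(BUCKETS - 1, freeBytes * BUCKETS / PAGE_SIZE);
    }

    /** Adds an old data page to its bucket exactly once, whoever gets there first. */
    void track(long pageId) {
        Integer free = freeBytesByPage.get(pageId);
        if (free != null && tracked.add(pageId))
            buckets.get(bucketFor(free)).addLast(pageId);
    }

    /** Step 3: called whenever an old data page is read. */
    void onPageRead(long pageId) {
        track(pageId);
    }

    /** Step 4: a background thread walks all old data pages. */
    Future<?> startBackgroundRebuild(ExecutorService exec) {
        return exec.submit(() -> freeBytesByPage.keySet().forEach(this::track));
    }

    int trackedCount() { return tracked.size(); }

    int bucketSize(int b) { return buckets.get(b).size(); }

    public static void main(String[] args) throws Exception {
        FreeListRebuildSketch fl = new FreeListRebuildSketch(Map.of(1L, 4000, 2L, 100, 3L, 2048));
        fl.onPageRead(2L); // a user read tracks page 2 early
        ExecutorService exec = Executors.newSingleThreadExecutor();
        fl.startBackgroundRebuild(exec).get(); // the scan picks up pages 1 and 3
        exec.shutdown();
        System.out.println("tracked=" + fl.trackedCount()); // prints "tracked=3"
    }
}
```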



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
