[ 
https://issues.apache.org/jira/browse/IGNITE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944281#comment-16944281
 ] 

Aleksey Plekhanov edited comment on IGNITE-6930 at 10/4/19 7:09 AM:
--------------------------------------------------------------------

[~ivan.glukos],
 # The test assumes that the PDS size doesn't change between the first 
checkpoint and after several checkpoints. This is no longer true with caching, 
since only the final free-list state is persisted on a checkpoint: buckets 
that were changed but are currently empty are not persisted. So with caching, 
the PDS size in this test is about 0.5 of the original test's size after the 
first checkpoint and about 0.75 of it after several checkpoints.
 # This test checks that the free list works and the pages cache is flushed 
correctly under concurrent load. It helped me catch a couple of concurrency 
bugs (these bugs were also reproduced by the yardstick benchmark, but not by 
other tests on TC). I will add a comment about this.
 # I think they are too low-level for the configuration files, but they can 
be configured via system properties. I will change it.
 # I think 64 and 4 are reasonable values. I've benchmarked with higher 
values, but they give almost no performance boost. 8 (2 per bucket) is too 
small: there will be a big overhead for service objects (at least 16 bytes per 
object and at least 3 objects: the lock, the GridLongList and the array inside 
the GridLongList), so we would have 48 bytes of service objects per bucket and 
only 16 bytes (2 longs) of useful data. 64/4 is a more reliable configuration, 
since we allocate more heap space for useful data (16*8=128 bytes) than for 
service objects (see the sketch below this list). Also, I don't think choosing 
MAX_SIZE dynamically is a good idea, since there can be more than one node 
inside one JVM and we don't know when, and how many, nodes will be started 
when we start the first one.
 # Ok, I will implement a counter of empty flushed buckets.
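
For point 4, a purely illustrative sketch of the per-bucket heap arithmetic; 
the class and constant names below are made up for illustration, only the 
16-byte per-object estimate and the 3 service objects per bucket come from the 
reasoning above:
{code:java}
/**
 * Illustrative-only arithmetic: heap spent on "service" objects vs. useful payload
 * per bucket of the on-heap page-list cache. Names and the 16-byte per-object
 * overhead are assumptions, not actual Ignite code.
 */
public class PagesCacheOverheadSketch {
    /** Assumed minimal heap overhead of one service object. */
    private static final int OBJ_OVERHEAD_BYTES = 16;

    /** Service objects kept per bucket: the lock, the GridLongList and its internal array. */
    private static final int SERVICE_OBJS_PER_BUCKET = 3;

    public static void main(String[] args) {
        int serviceBytes = SERVICE_OBJS_PER_BUCKET * OBJ_OVERHEAD_BYTES; // 48 bytes per bucket.

        // MAX_SIZE = 8 -> 2 page ids (longs) per bucket: 16 bytes of useful data vs 48 service bytes.
        print(8, serviceBytes);

        // MAX_SIZE = 64 -> 16 page ids per bucket: 128 bytes of useful data vs 48 service bytes.
        print(64, serviceBytes);
    }

    private static void print(int maxSize, int serviceBytes) {
        int usefulBytes = (maxSize / 4) * Long.BYTES;

        System.out.println("MAX_SIZE=" + maxSize + ": useful=" + usefulBytes
            + "B, service=" + serviceBytes + "B per bucket");
    }
}
{code}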


> Optionally to do not write free list updates to WAL
> ---------------------------------------------------
>
>                 Key: IGNITE-6930
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6930
>             Project: Ignite
>          Issue Type: Task
>          Components: cache
>            Reporter: Vladimir Ozerov
>            Assignee: Aleksey Plekhanov
>            Priority: Major
>              Labels: IEP-8, performance
>             Fix For: 2.8
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When a cache entry is created, we need to update the free list. When an 
> entry is updated, we need to update the free list(s) several times. Currently 
> the free list is a persistent structure, so every update to it must be logged 
> to be able to recover after a crash. This may incur significant overhead, 
> especially for small entries.
> E.g. this is what the WAL for a single update looks like. "D" - updates with 
> real data, "F" - free-list management:
> {code}
>  1. [D] DataRecord [writeEntries=[UnwrapDataEntry[k = key, v = [ BinaryObject 
> [idHash=2053299190, hash=1986931360, typeId=-1580729813]], super = [DataEntry 
> [cacheId=94416770, op=UPDATE, writeVer=GridCacheVersion [topVer=122147562, 
> order=1510667560607, nodeOrder=1], partId=0, partCnt=4]]]], super=WALRecord 
> [size=0, chainSize=0, pos=null, type=DATA_RECORD]]
>  2. [F] PagesListRemovePageRecord [rmvdPageId=0001000000000005, 
> pageId=0001000000000006, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=0001000000000006, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  3. [D] DataPageInsertRecord [super=PageDeltaRecord [grpId=94416770, 
> pageId=0001000000000005, super=WALRecord [size=129, chainSize=0, pos=null, 
> type=DATA_PAGE_INSERT_RECORD]]]
>  4. [F] PagesListAddPageRecord [dataPageId=0001000000000005, 
> super=PageDeltaRecord [grpId=94416770, pageId=0001000000000008, 
> super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
>  5. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710664, 
> super=PageDeltaRecord [grpId=94416770, pageId=0001000000000005, 
> super=WALRecord [size=37, chainSize=0, pos=null, 
> type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
>  6. [D] ReplaceRecord [io=DataLeafIO[ver=1], idx=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=0001000000000004, super=WALRecord [size=47, 
> chainSize=0, pos=null, type=BTREE_PAGE_REPLACE]]]
>  7. [F] DataPageRemoveRecord [itemId=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=0001000000000005, super=WALRecord [size=30, 
> chainSize=0, pos=null, type=DATA_PAGE_REMOVE_RECORD]]]
>  8. [F] PagesListRemovePageRecord [rmvdPageId=0001000000000005, 
> pageId=0001000000000008, grpId=94416770, super=PageDeltaRecord 
> [grpId=94416770, pageId=0001000000000008, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=PAGES_LIST_REMOVE_PAGE]]]
>  9. [F] DataPageSetFreeListPageRecord [freeListPage=0, super=PageDeltaRecord 
> [grpId=94416770, pageId=0001000000000005, super=WALRecord [size=37, 
> chainSize=0, pos=null, type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> 10. [F] PagesListAddPageRecord [dataPageId=0001000000000005, 
> super=PageDeltaRecord [grpId=94416770, pageId=0001000000000006, 
> super=WALRecord [size=37, chainSize=0, pos=null, type=PAGES_LIST_ADD_PAGE]]]
> 11. [F] DataPageSetFreeListPageRecord [freeListPage=281474976710662, 
> super=PageDeltaRecord [grpId=94416770, pageId=0001000000000005, 
> super=WALRecord [size=37, chainSize=0, pos=null, 
> type=DATA_PAGE_SET_FREE_LIST_PAGE]]]
> {code}
> If you sum up all the space required for the operation (the size in p.3 is 
> shown incorrectly here), you will see that the data update required ~300 
> bytes, and so did the free-list update (the eight "F" records above sum to 
> 7*37 + 30 = 289 bytes)!
> *Proposed solution*
> 1) Optionally do not write free list updates to WAL
> 2) In case of a node restart we start with empty free lists, so data inserts 
> will have to allocate new pages
> 3) When an old data page is read, add it to the free list
> 4) Start a background thread which will iterate over all old data pages and 
> re-create the free list, so that eventually all data pages are tracked.
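
A minimal sketch of how steps 2-4 of the proposal could fit together; the 
interfaces and method names below are hypothetical placeholders for 
illustration only, not Ignite's actual API:
{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Illustrative sketch of the proposed recovery flow: after a restart the free list
 * starts empty, old pages are re-added lazily when they are read, and a background
 * task walks the remaining old pages so that eventually all of them are tracked.
 * All types and methods here are hypothetical placeholders.
 */
public class FreeListRebuildSketch {
    /** Hypothetical free-list abstraction: tracks data pages with remaining free space. */
    interface FreeList {
        void addPage(long pageId, int freeSpace);
    }

    /** Hypothetical view of data pages allocated before the restart. */
    interface OldPages {
        Iterable<Long> pageIds();
        int freeSpace(long pageId);
        boolean alreadyTracked(long pageId);
    }

    private final FreeList freeList;
    private final OldPages oldPages;
    private final AtomicBoolean stopped = new AtomicBoolean();

    FreeListRebuildSketch(FreeList freeList, OldPages oldPages) {
        this.freeList = freeList;
        this.oldPages = oldPages;
    }

    /** Step 3: when an old data page is read on the regular path, put it back into the free list. */
    void onOldPageRead(long pageId) {
        if (!oldPages.alreadyTracked(pageId))
            freeList.addPage(pageId, oldPages.freeSpace(pageId));
    }

    /** Step 4: background thread that iterates over all old data pages to re-create the free list. */
    Thread startRebuildThread() {
        Thread t = new Thread(() -> {
            for (long pageId : oldPages.pageIds()) {
                if (stopped.get())
                    return;

                onOldPageRead(pageId);
            }
        }, "free-list-rebuild");

        t.setDaemon(true);
        t.start();

        return t;
    }

    void stop() {
        stopped.set(true);
    }
}
{code}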



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
