On Tue, 5 Dec 2023 at 08:43, Heikki Linnakangas <hlinn...@iki.fi> wrote: > > On 01/11/2023 00:08, Matthias van de Meent wrote: > > By caching the right separator index tuple in _bt_search, we can > > compare the downlink's right separator and the HIKEY, and when they > > are equal (memcmp() == 0) we don't have to call _bt_compare - the > > HIKEY is known to be larger than the scan key, because our key is > > smaller than the right separator, and thus transitively also smaller > > than the HIKEY because it contains the same data. As _bt_compare can > > call expensive user-provided functions, this can be a large > > performance boon, especially when there are only a small number of > > column getting compared on each page (e.g. index tuples of many 100s > > of bytes, or dynamic prefix truncation is enabled). > > What would be the worst case scenario for this? One situation where the > memcmp() would not match is when there is a concurrent page split. I > think it's OK to pessimize that case. Are there any other situations?
There is also concurrent page deletion which can cause downlinked pages to get removed from the set of accessible pages, but that's quite rare, too: arguably even more rare than page splits. > When the memcmp() matches, I think this is almost certainly not slower > than calling the datatype's comparison function. > > > if (offnum < PageGetMaxOffsetNumber(page)) > > [...] > > else if (!P_RIGHTMOST(opaque)) > > [...] > > } > > This could use a one-line comment above this, something like "Remember > the right separator of the downlink we follow, to speed up the next > _bt_moveright call". Done. > Should there be an "else rightsep = NULL;" here? Is it possible that we > follow the non-rightmost downlink on a higher level and rightmost > downlink on next level? Concurrent page deletion? While possible, the worst this could do is be less efficient in those fringe cases: The cached right separator is a key that is known to compare larger than the search key and thus always correct to use as an optimization for "is this HIKEY larger than my search key", as long as we don't clobber the data in that cache (which we don't). Null-ing the argument, while not incorrect, could be argued to be worse than useless here, as the only case where NULL may match an actual highkey is on the rightmost page, which we already special-case in _bt_moveright before hitting the 'compare the highkey' code. Removal of the value would thus remove any chance of using the optimization after hitting the rightmost page in a layer below. I've added a comment to explain this in an empty else block in the attached version 2 of the patch. > Please update the comment above _bt_moveright to describe the new > argument. Perhaps the text from README should go there, this feels like > a detail specific to _bt_search and _bt_moveright. Done. Thank you for the review. Kind regards, Matthias van de Meent Neon (https://neon.tech)
v2-0001-btree-optimize-_bt_moveright-using-downlink-s-rig.patch
Description: Binary data