On Fri, Aug 4, 2017 at 7:14 PM, Amit Kapila <[email protected]> wrote:
> On Sun, Jul 30, 2017 at 2:07 PM, Ashutosh Sharma <[email protected]>
> wrote:
>> Hi,
>>
>> On Wed, May 10, 2017 at 2:28 PM, Ashutosh Sharma <[email protected]>
>> wrote:
>>> While doing code coverage testing of the v7 patch shared in [1], I
>>> found that a few lines of code in _hash_next() are redundant and
>>> need to be removed. I came to know this while
>>> testing the scenario where a hash index scan starts when a split is in
>>> progress. I have removed those lines and attached is the newer version
>>> of the patch.
>>>
>>
>> Please find the new version of the patches for page-at-a-time scan in hash
>> indexes, rebased on top of the latest commit in the master branch. Also, I have
>> run the pgindent script with pg_bsd_indent version 2.0 on all the modified
>> files. Thanks.
>>
>
> Few comments:
Thanks for reviewing the patch.
> 1.
> +_hash_kill_items(IndexScanDesc scan, bool havePin)
>
> I think you can do without the second parameter. Can't we detect
> inside _hash_kill_items whether the page is pinned or not as we do for
> btree?
Okay, done that way. Please check attached v10 patch.
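To illustrate the idea (a simplified standalone sketch, not the code in the attached patch): the function can decide from the scan position itself whether a pin is held, in the same spirit as the HashScanPosIsPinned() macro in the patch, which checks the buffer field. The names below (ScanPos, INVALID_BUFFER, scan_pos_is_pinned, kill_items_action) are hypothetical stand-ins used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the real PostgreSQL types. */
typedef int Buffer;
#define INVALID_BUFFER 0

typedef struct ScanPos
{
    Buffer buf;             /* valid only while the page is still pinned */
} ScanPos;

typedef enum
{
    LOCK_PINNED_PAGE,       /* pin held: just take the content lock */
    REREAD_AND_CHECK_LSN    /* no pin: re-read the page and compare LSNs */
} KillAction;

/* The pin state is derived from the scan position, so no flag is passed in. */
static bool
scan_pos_is_pinned(const ScanPos *pos)
{
    return pos->buf != INVALID_BUFFER;
}

/* Decide how the kill-items step should get at the page. */
static KillAction
kill_items_action(const ScanPos *pos)
{
    return scan_pos_is_pinned(pos) ? LOCK_PINNED_PAGE : REREAD_AND_CHECK_LSN;
}
```

With this shape the caller no longer needs to track the pin state separately, which is why the havePin parameter could be dropped.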
>
> 2.
> + /*
> + * We remember prev and next block number along with current block
> + * number so that if fetching the tup- les using cursor we know
> + * the page from where to startwith. This is for the case where we
> + * have re- ached the end of bucket chain without finding any
> + * matching tuples.
>
> The spellings of "tuples" and "reached" contain an unwanted symbol. Put a
> space in "startwith" or just use "begin".
Corrected.
>
> 3.
> + if (!BlockNumberIsValid((opaque)->hasho_nextblkno))
> + {
> + so->currPos.prevPage = (opaque)->hasho_prevblkno;
> + so->currPos.nextPage = InvalidBlockNumber;
> + }
>
> There is no need to use parentheses around opaque. I mean there is no
> problem with that, but it is redundant and makes code less readable.
> Also, there is similar usage at other places in the code, please
> change all the other places as well.
Removed parentheses around opaque.
> I think you can save the value of
> prevblkno in a local variable and use it in else part.
Okay, done that way.
>
> 4.
> + if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup) &&
> + _hash_checkqual(scan, itup))
> + {
> + /* tuple is qualified, so remember it */
> + _hash_saveitem(so, itemIndex, offnum, itup);
> + itemIndex++;
> + }
> + else
> +
> + /*
> + * No more matching tuples exist in this page. so, exit while
> + * loop.
> + */
> + break;
>
> It is better to have braces for the else part. It makes the code look
> better. The same type of code exists a few lines down as well; change
> that too.
Added braces in the else part.
>
> 5.
> + /*
> + * We save the LSN of the page as we read it, so that we know whether it
> + * safe to apply LP_DEAD hints to the page later.
> + */
>
> "whether it safe"/"whether it is safe"
Corrected.
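
For anyone reading the patch for the first time, here is a minimal standalone sketch of the page-at-a-time idea, under simplifying assumptions: it is not the patch code, it does a linear pass instead of a binary search, and all names (PagePos, MAX_ITEMS, load_matches) are hypothetical. It shows how matches from one page are copied into a local array, filled front-to-back for a forward scan and back-to-front for a backward scan, so that the array always stays in page order.

```c
#include <assert.h>

#define MAX_ITEMS 8     /* hypothetical stand-in for MaxIndexTuplesPerPage */

typedef struct PagePos
{
    int first_item;         /* first valid slot in items[] */
    int last_item;          /* last valid slot in items[] */
    int item_index;         /* slot to hand back to the caller first */
    int items[MAX_ITEMS];   /* saved page offsets, always in page order */
} PagePos;

/*
 * Copy the offset of every entry in page[] equal to key into pos and
 * return the number of matches.  After this, the page itself is no
 * longer needed to return the saved items one by one.
 */
static int
load_matches(PagePos *pos, const int *page, int npage, int key, int forward)
{
    int n = 0;

    assert(npage <= MAX_ITEMS);

    if (forward)
    {
        /* fill items[] in ascending order, starting at slot 0 */
        for (int i = 0; i < npage; i++)
            if (page[i] == key)
                pos->items[n++] = i;
        pos->first_item = 0;
        pos->last_item = n - 1;
        pos->item_index = 0;
    }
    else
    {
        /* fill items[] downwards, starting at the last slot */
        int slot = MAX_ITEMS;

        for (int i = npage - 1; i >= 0; i--)
            if (page[i] == key)
            {
                pos->items[--slot] = i;
                n++;
            }
        pos->first_item = slot;
        pos->last_item = MAX_ITEMS - 1;
        pos->item_index = MAX_ITEMS - 1;
    }
    return n;
}
```

An empty result shows up as first_item > last_item in both directions, which matches the check at the end of _hash_readpage() in the patch.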
--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 59e9a5f5afc31a3d14ae39bf5ae0cf21ee42f624 Mon Sep 17 00:00:00 2001
From: ashu <[email protected]>
Date: Mon, 7 Aug 2017 16:06:52 +0530
Subject: [PATCH] Rewrite hash index scan to work page at a time.
Patch by Ashutosh Sharma <[email protected]>
---
src/backend/access/hash/README | 25 +-
src/backend/access/hash/hash.c | 153 +++---------
src/backend/access/hash/hashpage.c | 10 +-
src/backend/access/hash/hashsearch.c | 446 +++++++++++++++++++++++++++++++----
src/backend/access/hash/hashutil.c | 71 +++++-
src/include/access/hash.h | 55 ++++-
6 files changed, 570 insertions(+), 190 deletions(-)
diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index c8a0ec7..eef7d66 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -259,10 +259,11 @@ The reader algorithm is:
-- then, per read request:
reacquire content lock on current page
step to next page if necessary (no chaining of content locks, but keep
- the pin on the primary bucket throughout the scan; we also maintain
- a pin on the page currently being scanned)
- get tuple
- release content lock
+ the pin on the primary bucket throughout the scan)
+ save all the matching tuples from current index page into an items array
+ release pin and content lock (but if it is primary bucket page retain
+ its pin till the end of scan)
+ get tuple from an item array
-- at scan shutdown:
release all pins still held
@@ -270,15 +271,13 @@ Holding the buffer pin on the primary bucket page for the whole scan prevents
the reader's current-tuple pointer from being invalidated by splits or
compactions. (Of course, other buckets can still be split or compacted.)
-To keep concurrency reasonably good, we require readers to cope with
-concurrent insertions, which means that they have to be able to re-find
-their current scan position after re-acquiring the buffer content lock on
-page. Since deletion is not possible while a reader holds the pin on bucket,
-and we assume that heap tuple TIDs are unique, this can be implemented by
-searching for the same heap tuple TID previously returned. Insertion does
-not move index entries across pages, so the previously-returned index entry
-should always be on the same page, at the same or higher offset number,
-as it was before.
+To minimize lock/unlock traffic, a hash index scan always searches the entire
+hash page to identify all the matching items at once, copying their heap tuple
+IDs into backend-local storage. The heap tuple IDs are then processed while not
+holding any page lock within the index, thereby allowing concurrent insertions
+to happen on the same index page without requiring the reader to re-find its
+current scan position. We do continue to hold a pin on the bucket page, to
+protect against concurrent deletions and bucket splits.
To allow for scans during a bucket split, if at the start of the scan, the
bucket is marked as bucket-being-populated, it scan all the tuples in that
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index d89c192..08bfd6a 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -268,66 +268,22 @@ bool
hashgettuple(IndexScanDesc scan, ScanDirection dir)
{
HashScanOpaque so = (HashScanOpaque) scan->opaque;
- Relation rel = scan->indexRelation;
- Buffer buf;
- Page page;
- OffsetNumber offnum;
- ItemPointer current;
bool res;
+ HashScanPosItem *currItem;
/* Hash indexes are always lossy since we store only the hash code */
scan->xs_recheck = true;
/*
- * We hold pin but not lock on current buffer while outside the hash AM.
- * Reacquire the read lock here.
- */
- if (BufferIsValid(so->hashso_curbuf))
- LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
-
- /*
* If we've already initialized this scan, we can just advance it in the
* appropriate direction. If we haven't done so yet, we call a routine to
* get the first item in the scan.
*/
- current = &(so->hashso_curpos);
- if (ItemPointerIsValid(current))
+ if (!HashScanPosIsValid(so->currPos))
+ res = _hash_first(scan, dir);
+ else
{
/*
- * An insertion into the current index page could have happened while
- * we didn't have read lock on it. Re-find our position by looking
- * for the TID we previously returned. (Because we hold a pin on the
- * primary bucket page, no deletions or splits could have occurred;
- * therefore we can expect that the TID still exists in the current
- * index page, at an offset >= where we were.)
- */
- OffsetNumber maxoffnum;
-
- buf = so->hashso_curbuf;
- Assert(BufferIsValid(buf));
- page = BufferGetPage(buf);
-
- /*
- * We don't need test for old snapshot here as the current buffer is
- * pinned, so vacuum can't clean the page.
- */
- maxoffnum = PageGetMaxOffsetNumber(page);
- for (offnum = ItemPointerGetOffsetNumber(current);
- offnum <= maxoffnum;
- offnum = OffsetNumberNext(offnum))
- {
- IndexTuple itup;
-
- itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
- if (ItemPointerEquals(&(so->hashso_heappos), &(itup->t_tid)))
- break;
- }
- if (offnum > maxoffnum)
- elog(ERROR, "failed to re-find scan position within index \"%s\"",
- RelationGetRelationName(rel));
- ItemPointerSetOffsetNumber(current, offnum);
-
- /*
* Check to see if we should kill the previously-fetched tuple.
*/
if (scan->kill_prior_tuple)
@@ -341,16 +297,11 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir)
* entries.
*/
if (so->killedItems == NULL)
- so->killedItems = palloc(MaxIndexTuplesPerPage *
- sizeof(HashScanPosItem));
+ so->killedItems = (int *)
+ palloc(MaxIndexTuplesPerPage * sizeof(int));
if (so->numKilled < MaxIndexTuplesPerPage)
- {
- so->killedItems[so->numKilled].heapTid = so->hashso_heappos;
- so->killedItems[so->numKilled].indexOffset =
- ItemPointerGetOffsetNumber(&(so->hashso_curpos));
- so->numKilled++;
- }
+ so->killedItems[so->numKilled++] = so->currPos.itemIndex;
}
/*
@@ -358,30 +309,10 @@ hashgettuple(IndexScanDesc scan, ScanDirection dir)
*/
res = _hash_next(scan, dir);
}
- else
- res = _hash_first(scan, dir);
-
- /*
- * Skip killed tuples if asked to.
- */
- if (scan->ignore_killed_tuples)
- {
- while (res)
- {
- offnum = ItemPointerGetOffsetNumber(current);
- page = BufferGetPage(so->hashso_curbuf);
- if (!ItemIdIsDead(PageGetItemId(page, offnum)))
- break;
- res = _hash_next(scan, dir);
- }
- }
-
- /* Release read lock on current buffer, but keep it pinned */
- if (BufferIsValid(so->hashso_curbuf))
- LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK);
/* Return current heap TID on success */
- scan->xs_ctup.t_self = so->hashso_heappos;
+ currItem = &so->currPos.items[so->currPos.itemIndex];
+ scan->xs_ctup.t_self = currItem->heapTid;
return res;
}
@@ -396,35 +327,21 @@ hashgetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
HashScanOpaque so = (HashScanOpaque) scan->opaque;
bool res;
int64 ntids = 0;
+ HashScanPosItem *currItem;
res = _hash_first(scan, ForwardScanDirection);
while (res)
{
- bool add_tuple;
+ currItem = &so->currPos.items[so->currPos.itemIndex];
/*
- * Skip killed tuples if asked to.
+ * _hash_first() or _hash_next() never returns dead tuples. Therefore,
+ * we can always add the tuples into TIDBitmap without checking if a
+ * tuple is dead or not.
*/
- if (scan->ignore_killed_tuples)
- {
- Page page;
- OffsetNumber offnum;
-
- offnum = ItemPointerGetOffsetNumber(&(so->hashso_curpos));
- page = BufferGetPage(so->hashso_curbuf);
- add_tuple = !ItemIdIsDead(PageGetItemId(page, offnum));
- }
- else
- add_tuple = true;
-
- /* Save tuple ID, and continue scanning */
- if (add_tuple)
- {
- /* Note we mark the tuple ID as requiring recheck */
- tbm_add_tuples(tbm, &(so->hashso_heappos), 1, true);
- ntids++;
- }
+ tbm_add_tuples(tbm, &(currItem->heapTid), 1, true);
+ ntids++;
res = _hash_next(scan, ForwardScanDirection);
}
@@ -448,12 +365,9 @@ hashbeginscan(Relation rel, int nkeys, int norderbys)
scan = RelationGetIndexScan(rel, nkeys, norderbys);
so = (HashScanOpaque) palloc(sizeof(HashScanOpaqueData));
- so->hashso_curbuf = InvalidBuffer;
+ HashScanPosInvalidate(so->currPos);
so->hashso_bucket_buf = InvalidBuffer;
so->hashso_split_bucket_buf = InvalidBuffer;
- /* set position invalid (this will cause _hash_first call) */
- ItemPointerSetInvalid(&(so->hashso_curpos));
- ItemPointerSetInvalid(&(so->hashso_heappos));
so->hashso_buc_populated = false;
so->hashso_buc_split = false;
@@ -476,22 +390,16 @@ hashrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
HashScanOpaque so = (HashScanOpaque) scan->opaque;
Relation rel = scan->indexRelation;
- /*
- * Before leaving current page, deal with any killed items. Also, ensure
- * that we acquire lock on current page before calling _hash_kill_items.
- */
- if (so->numKilled > 0)
+ if (HashScanPosIsValid(so->currPos))
{
- LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
- _hash_kill_items(scan);
- LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK);
+ /* Before leaving current page, deal with any killed items */
+ if (so->numKilled > 0)
+ _hash_kill_items(scan);
+ _hash_dropscanbuf(rel, so);
}
- _hash_dropscanbuf(rel, so);
-
/* set position invalid (this will cause _hash_first call) */
- ItemPointerSetInvalid(&(so->hashso_curpos));
- ItemPointerSetInvalid(&(so->hashso_heappos));
+ HashScanPosInvalidate(so->currPos);
/* Update scan key, if a new one is given */
if (scankey && scan->numberOfKeys > 0)
@@ -514,19 +422,14 @@ hashendscan(IndexScanDesc scan)
HashScanOpaque so = (HashScanOpaque) scan->opaque;
Relation rel = scan->indexRelation;
- /*
- * Before leaving current page, deal with any killed items. Also, ensure
- * that we acquire lock on current page before calling _hash_kill_items.
- */
- if (so->numKilled > 0)
+ if (HashScanPosIsValid(so->currPos))
{
- LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
- _hash_kill_items(scan);
- LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK);
+ /* Before leaving current page, deal with any killed items */
+ if (so->numKilled > 0)
+ _hash_kill_items(scan);
+ _hash_dropscanbuf(rel, so);
}
- _hash_dropscanbuf(rel, so);
-
if (so->killedItems != NULL)
pfree(so->killedItems);
pfree(so);
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 08eaf1d..7a02e94 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -298,20 +298,20 @@ _hash_dropscanbuf(Relation rel, HashScanOpaque so)
{
/* release pin we hold on primary bucket page */
if (BufferIsValid(so->hashso_bucket_buf) &&
- so->hashso_bucket_buf != so->hashso_curbuf)
+ so->hashso_bucket_buf != so->currPos.buf)
_hash_dropbuf(rel, so->hashso_bucket_buf);
so->hashso_bucket_buf = InvalidBuffer;
/* release pin we hold on primary bucket page of bucket being split */
if (BufferIsValid(so->hashso_split_bucket_buf) &&
- so->hashso_split_bucket_buf != so->hashso_curbuf)
+ so->hashso_split_bucket_buf != so->currPos.buf)
_hash_dropbuf(rel, so->hashso_split_bucket_buf);
so->hashso_split_bucket_buf = InvalidBuffer;
/* release any pin we still hold */
- if (BufferIsValid(so->hashso_curbuf))
- _hash_dropbuf(rel, so->hashso_curbuf);
- so->hashso_curbuf = InvalidBuffer;
+ if (BufferIsValid(so->currPos.buf))
+ _hash_dropbuf(rel, so->currPos.buf);
+ so->currPos.buf = InvalidBuffer;
/* reset split scan */
so->hashso_buc_populated = false;
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 3e461ad..f4408ab 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -20,44 +20,108 @@
#include "pgstat.h"
#include "utils/rel.h"
+static bool _hash_readpage(IndexScanDesc scan, Buffer *bufP,
+ ScanDirection dir);
+static int _hash_load_qualified_items(IndexScanDesc scan, Page page,
+ OffsetNumber offnum, ScanDirection dir);
+static inline void _hash_saveitem(HashScanOpaque so, int itemIndex,
+ OffsetNumber offnum, IndexTuple itup);
+static void _hash_readnext(IndexScanDesc scan, Buffer *bufp,
+ Page *pagep, HashPageOpaque *opaquep);
/*
* _hash_next() -- Get the next item in a scan.
*
- * On entry, we have a valid hashso_curpos in the scan, and a
- * pin and read lock on the page that contains that item.
- * We find the next item in the scan, if any.
- * On success exit, we have the page containing the next item
- * pinned and locked.
+ * On entry, so->currPos describes the current page, which may
+ * be pinned but not locked, and so->currPos.itemIndex identifies
+ * which item was previously returned.
+ *
+ * On successful exit, scan->xs_ctup.t_self is set to the TID
+ * of the next heap tuple, and if requested, scan->xs_itup
+ * points to a copy of the index tuple. so->currPos is updated
+ * as needed.
+ *
+ * On failure exit (no more tuples), we return FALSE with no
+ * pins or locks held.
*/
bool
_hash_next(IndexScanDesc scan, ScanDirection dir)
{
Relation rel = scan->indexRelation;
HashScanOpaque so = (HashScanOpaque) scan->opaque;
+ HashScanPosItem *currItem;
+ BlockNumber blkno;
Buffer buf;
- Page page;
- OffsetNumber offnum;
- ItemPointer current;
- IndexTuple itup;
-
- /* we still have the buffer pinned and read-locked */
- buf = so->hashso_curbuf;
- Assert(BufferIsValid(buf));
+ bool end_of_scan = false;
/*
- * step to next valid tuple.
+ * Advance to next tuple on current page; or if there's no more, try to
+ * read data from next or prev page based on the scan direction. Before
+ * moving to the next or prev page make sure that we deal with all the
+ * killed items.
*/
- if (!_hash_step(scan, &buf, dir))
+ if (ScanDirectionIsForward(dir))
+ {
+ if (++so->currPos.itemIndex > so->currPos.lastItem)
+ {
+ if (so->numKilled > 0)
+ _hash_kill_items(scan);
+
+ blkno = so->currPos.nextPage;
+ if (BlockNumberIsValid(blkno))
+ {
+ buf = _hash_getbuf(rel, blkno, HASH_READ, LH_OVERFLOW_PAGE);
+ so->currPos.buf = buf;
+ TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(buf));
+ if (!_hash_readpage(scan, &buf, dir))
+ end_of_scan = true;
+ }
+ else
+ end_of_scan = true;
+ }
+ }
+ else
+ {
+ if (--so->currPos.itemIndex < so->currPos.firstItem)
+ {
+ if (so->numKilled > 0)
+ _hash_kill_items(scan);
+
+ blkno = so->currPos.prevPage;
+ if (BlockNumberIsValid(blkno))
+ {
+ buf = _hash_getbuf(rel, blkno, HASH_READ,
+ LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
+ so->currPos.buf = buf;
+ TestForOldSnapshot(scan->xs_snapshot, rel, BufferGetPage(buf));
+
+ /*
+ * We always maintain the pin on bucket page for whole scan
+ * operation, so releasing the additional pin we have acquired
+ * here.
+ */
+ if (buf == so->hashso_bucket_buf ||
+ buf == so->hashso_split_bucket_buf)
+ _hash_dropbuf(rel, buf);
+
+ if (!_hash_readpage(scan, &buf, dir))
+ end_of_scan = true;
+ }
+ else
+ end_of_scan = true;
+ }
+ }
+
+ if (end_of_scan)
+ {
+ _hash_dropscanbuf(rel, so);
+ HashScanPosInvalidate(so->currPos);
return false;
+ }
- /* if we're here, _hash_step found a valid tuple */
- current = &(so->hashso_curpos);
- offnum = ItemPointerGetOffsetNumber(current);
- _hash_checkpage(rel, buf, LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
- page = BufferGetPage(buf);
- itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
- so->hashso_heappos = itup->t_tid;
+ /* OK, itemIndex says what to return */
+ currItem = &so->currPos.items[so->currPos.itemIndex];
+ scan->xs_ctup.t_self = currItem->heapTid;
return true;
}
@@ -212,11 +276,15 @@ _hash_readprev(IndexScanDesc scan,
/*
* _hash_first() -- Find the first item in a scan.
*
- * Find the first item in the index that
- * satisfies the qualification associated with the scan descriptor. On
- * success, the page containing the current index tuple is read locked
- * and pinned, and the scan's opaque data entry is updated to
- * include the buffer.
+ * We find the first item (or, if backward scan, the last item) in
+ * the index that satisfies the qualification associated with the
+ * scan descriptor. On success, the page containing the current
+ * index tuple is read locked and pinned, and data about the
+ * matching tuple(s) on the page has been loaded into so->currPos,
+ * scan->xs_ctup.t_self is set to the heap TID of the current tuple.
+ *
+ * If there are no matching items in the index, we return FALSE,
+ * with no pins or locks held.
*/
bool
_hash_first(IndexScanDesc scan, ScanDirection dir)
@@ -229,15 +297,9 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
Buffer buf;
Page page;
HashPageOpaque opaque;
- IndexTuple itup;
- ItemPointer current;
- OffsetNumber offnum;
pgstat_count_index_scan(rel);
- current = &(so->hashso_curpos);
- ItemPointerSetInvalid(current);
-
/*
* We do not support hash scans with no index qualification, because we
* would have to read the whole index rather than just one bucket. That
@@ -356,17 +418,15 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
_hash_readnext(scan, &buf, &page, &opaque);
}
- /* Now find the first tuple satisfying the qualification */
- if (!_hash_step(scan, &buf, dir))
- return false;
+ /* remember which buffer we have pinned, if any */
+ Assert(BufferIsInvalid(so->currPos.buf));
+ so->currPos.buf = buf;
- /* if we're here, _hash_step found a valid tuple */
- offnum = ItemPointerGetOffsetNumber(current);
- _hash_checkpage(rel, buf, LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
- page = BufferGetPage(buf);
- itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
- so->hashso_heappos = itup->t_tid;
+ /* Now find all the tuples satisfying the qualification from a page */
+ if (!_hash_readpage(scan, &buf, dir))
+ return false;
+ /* if we're here, _hash_readpage found valid tuples */
return true;
}
@@ -575,3 +635,305 @@ _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
ItemPointerSet(current, blkno, offnum);
return true;
}
+
+/*
+ * _hash_readpage() -- Load data from current index page into so->currPos
+ *
+ * We scan all the items in the current index page and save those that
+ * satisfy the qualification into so->currPos. If no matching items
+ * are found in the current page, we move to the next or previous page
+ * in a bucket chain as indicated by the direction.
+ *
+ * Return true if any matching items are found else return false.
+ */
+static bool
+_hash_readpage(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
+{
+ Relation rel = scan->indexRelation;
+ HashScanOpaque so = (HashScanOpaque) scan->opaque;
+ Buffer buf;
+ Page page;
+ HashPageOpaque opaque;
+ OffsetNumber offnum;
+ uint16 itemIndex;
+
+ so->currPos.currPage = BufferGetBlockNumber(so->currPos.buf);
+
+ buf = *bufP;
+ Assert(BufferIsValid(buf));
+ _hash_checkpage(rel, buf, LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
+ page = BufferGetPage(buf);
+ opaque = (HashPageOpaque) PageGetSpecialPointer(page);
+
+ /*
+ * We save the LSN of the page as we read it, so that we know whether it
+ * is safe to apply LP_DEAD hints to the page later.
+ */
+ so->currPos.lsn = PageGetLSN(page);
+
+ if (ScanDirectionIsForward(dir))
+ {
+ BlockNumber prev_blkno = InvalidBlockNumber;
+
+ /* new page, locate starting position by binary search */
+ offnum = _hash_binsearch(page, so->hashso_sk_hash);
+
+ itemIndex = _hash_load_qualified_items(scan, page, offnum, dir);
+
+ while (itemIndex == 0)
+ {
+ /*
+ * Could not find any matching tuples in the current page, move to
+ * the next page. Before leaving the current page, also deal with
+ * any killed items.
+ */
+ if (so->numKilled > 0)
+ _hash_kill_items(scan);
+
+ /*
+ * We remember prev and next block number along with current block
+ * number so that if fetching the tuples using cursor we know the
+ * page from where to begin. This is for the case where we have
+ * reached the end of bucket chain without finding any matching
+ * tuples.
+ */
+ if (!BlockNumberIsValid(opaque->hasho_nextblkno))
+ prev_blkno = opaque->hasho_prevblkno;
+
+ _hash_readnext(scan, &buf, &page, &opaque);
+ if (BufferIsValid(buf))
+ {
+ so->currPos.buf = buf;
+ so->currPos.currPage = BufferGetBlockNumber(buf);
+ so->currPos.lsn = PageGetLSN(page);
+ offnum = _hash_binsearch(page, so->hashso_sk_hash);
+ itemIndex = _hash_load_qualified_items(scan, page,
+ offnum, dir);
+ }
+ else
+ {
+ /*
+ * No more matching tuples were found; return FALSE to indicate
+ * that.
+ */
+ so->currPos.prevPage = prev_blkno;
+ so->currPos.nextPage = InvalidBlockNumber;
+ so->currPos.buf = buf;
+ return false;
+ }
+ }
+
+ if (so->currPos.buf == so->hashso_bucket_buf ||
+ so->currPos.buf == so->hashso_split_bucket_buf)
+ {
+ so->currPos.prevPage = InvalidBlockNumber;
+ so->currPos.nextPage = opaque->hasho_nextblkno;
+ LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+ }
+ else
+ {
+ so->currPos.prevPage = opaque->hasho_prevblkno;
+ so->currPos.nextPage = opaque->hasho_nextblkno;
+ _hash_relbuf(rel, so->currPos.buf);
+ so->currPos.buf = InvalidBuffer;
+ }
+
+ so->currPos.firstItem = 0;
+ so->currPos.lastItem = itemIndex - 1;
+ so->currPos.itemIndex = 0;
+ }
+ else
+ {
+ BlockNumber next_blkno = InvalidBlockNumber;
+
+ /* new page, locate starting position by binary search */
+ offnum = _hash_binsearch_last(page, so->hashso_sk_hash);
+
+ itemIndex = _hash_load_qualified_items(scan, page, offnum, dir);
+
+ while (itemIndex == MaxIndexTuplesPerPage)
+ {
+
+ /*
+ * Could not find any matching tuples in the current page, move to
+ * the prev page. Before leaving the current page, also deal with
+ * any killed items.
+ */
+ if (so->numKilled > 0)
+ _hash_kill_items(scan);
+
+ /*
+ * We remember prev and next block number along with current block
+ * number so that if fetching the tuples using cursor we know the
+ * page from where to begin. This is for the case where we have
+ * reached the bucket page without finding any matching tuples.
+ */
+ if (so->currPos.buf == so->hashso_bucket_buf ||
+ so->currPos.buf == so->hashso_split_bucket_buf)
+ next_blkno = opaque->hasho_nextblkno;
+
+ _hash_readprev(scan, &buf, &page, &opaque);
+ if (BufferIsValid(buf))
+ {
+ so->currPos.buf = buf;
+ so->currPos.currPage = BufferGetBlockNumber(buf);
+ so->currPos.lsn = PageGetLSN(page);
+ offnum = _hash_binsearch_last(page, so->hashso_sk_hash);
+ itemIndex = _hash_load_qualified_items(scan, page,
+ offnum, dir);
+ }
+ else
+ {
+ /*
+ * No more matching tuples were found; return FALSE to indicate
+ * that.
+ */
+ so->currPos.prevPage = InvalidBlockNumber;
+ so->currPos.nextPage = next_blkno;
+ so->currPos.buf = buf;
+ return false;
+ }
+ }
+
+ if (so->currPos.buf == so->hashso_bucket_buf ||
+ so->currPos.buf == so->hashso_split_bucket_buf)
+ {
+ so->currPos.prevPage = InvalidBlockNumber;
+ so->currPos.nextPage = opaque->hasho_nextblkno;
+ LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+ }
+ else
+ {
+ so->currPos.prevPage = opaque->hasho_prevblkno;
+ so->currPos.nextPage = opaque->hasho_nextblkno;
+ _hash_relbuf(rel, so->currPos.buf);
+ so->currPos.buf = InvalidBuffer;
+ }
+
+ so->currPos.firstItem = itemIndex;
+ so->currPos.lastItem = MaxIndexTuplesPerPage - 1;
+ so->currPos.itemIndex = MaxIndexTuplesPerPage - 1;
+ }
+
+ return (so->currPos.firstItem <= so->currPos.lastItem);
+}
+
+/*
+ * Load all the qualified items from the current index page
+ * into so->currPos. Helper function for _hash_readpage.
+ */
+static int
+_hash_load_qualified_items(IndexScanDesc scan, Page page,
+ OffsetNumber offnum, ScanDirection dir)
+{
+ HashScanOpaque so = (HashScanOpaque) scan->opaque;
+ IndexTuple itup;
+ int itemIndex;
+ OffsetNumber maxoff;
+
+ maxoff = PageGetMaxOffsetNumber(page);
+
+ if (ScanDirectionIsForward(dir))
+ {
+ /* load items[] in ascending order */
+ itemIndex = 0;
+
+ while (offnum <= maxoff)
+ {
+ Assert(offnum >= FirstOffsetNumber);
+ itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
+
+ /*
+ * skip the tuples that are moved by split operation for the scan
+ * that has started when split was in progress. Also, skip the
+ * tuples that are marked as dead.
+ */
+ if ((so->hashso_buc_populated && !so->hashso_buc_split &&
+ (itup->t_info & INDEX_MOVED_BY_SPLIT_MASK)) ||
+ (scan->ignore_killed_tuples &&
+ (ItemIdIsDead(PageGetItemId(page, offnum)))))
+ {
+ offnum = OffsetNumberNext(offnum); /* move forward */
+ continue;
+ }
+
+ if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup) &&
+ _hash_checkqual(scan, itup))
+ {
+ /* tuple is qualified, so remember it */
+ _hash_saveitem(so, itemIndex, offnum, itup);
+ itemIndex++;
+ }
+ else
+ {
+ /*
+ * No more matching tuples exist in this page, so exit the while
+ * loop.
+ */
+ break;
+ }
+
+ offnum = OffsetNumberNext(offnum);
+ }
+
+ Assert(itemIndex <= MaxIndexTuplesPerPage);
+ return itemIndex;
+ }
+ else
+ {
+ /* load items[] in descending order */
+ itemIndex = MaxIndexTuplesPerPage;
+
+ while (offnum >= FirstOffsetNumber)
+ {
+ Assert(offnum <= maxoff);
+ itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
+
+ /*
+ * skip the tuples that are moved by split operation for the scan
+ * that has started when split was in progress. Also, skip the
+ * tuples that are marked as dead.
+ */
+ if ((so->hashso_buc_populated && !so->hashso_buc_split &&
+ (itup->t_info & INDEX_MOVED_BY_SPLIT_MASK)) ||
+ (scan->ignore_killed_tuples &&
+ (ItemIdIsDead(PageGetItemId(page, offnum)))))
+ {
+ offnum = OffsetNumberPrev(offnum); /* move back */
+ continue;
+ }
+
+ if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup) &&
+ _hash_checkqual(scan, itup))
+ {
+ itemIndex--;
+ /* tuple is qualified, so remember it */
+ _hash_saveitem(so, itemIndex, offnum, itup);
+ }
+ else
+ {
+ /*
+ * No more matching tuples exist in this page, so exit the while
+ * loop.
+ */
+ break;
+ }
+
+ offnum = OffsetNumberPrev(offnum);
+ }
+
+ Assert(itemIndex >= 0);
+ return itemIndex;
+ }
+}
+
+/* Save an index item into so->currPos.items[itemIndex] */
+static inline void
+_hash_saveitem(HashScanOpaque so, int itemIndex,
+ OffsetNumber offnum, IndexTuple itup)
+{
+ HashScanPosItem *currItem = &so->currPos.items[itemIndex];
+
+ currItem->heapTid = itup->t_tid;
+ currItem->indexOffset = offnum;
+}
diff --git a/src/backend/access/hash/hashutil.c b/src/backend/access/hash/hashutil.c
index 9b803af..bbc4296 100644
--- a/src/backend/access/hash/hashutil.c
+++ b/src/backend/access/hash/hashutil.c
@@ -522,13 +522,28 @@ _hash_get_newbucket_from_oldbucket(Relation rel, Bucket old_bucket,
* current page and killed tuples thereon (generally, this should only be
* called if so->numKilled > 0).
*
+ * The caller does not have a lock on the page and may or may not have the
+ * page pinned in a buffer. Note that read-lock is sufficient for setting
+ * LP_DEAD status (which is only a hint).
+ *
+ * The caller must have pin on bucket buffer, but may or may not have pin
+ * on overflow buffer, as indicated by HashScanPosIsPinned(so->currPos).
+ *
* We match items by heap TID before assuming they are the right ones to
* delete.
+ *
+ * Note that we keep pin on the bucket page throughout the scan. Hence,
+ * there is no chance of VACUUM deleting any items from the page.
+ *
+ * See _bt_killitems() for more details.
*/
void
_hash_kill_items(IndexScanDesc scan)
{
HashScanOpaque so = (HashScanOpaque) scan->opaque;
+ Relation rel = scan->indexRelation;
+ BlockNumber blkno;
+ Buffer buf;
Page page;
HashPageOpaque opaque;
OffsetNumber offnum,
@@ -536,9 +551,11 @@ _hash_kill_items(IndexScanDesc scan)
int numKilled = so->numKilled;
int i;
bool killedsomething = false;
+ bool havePin = false;
Assert(so->numKilled > 0);
Assert(so->killedItems != NULL);
+ Assert(HashScanPosIsValid(so->currPos));
/*
* Always reset the scan state, so we don't look for same items on other
@@ -546,20 +563,60 @@ _hash_kill_items(IndexScanDesc scan)
*/
so->numKilled = 0;
- page = BufferGetPage(so->hashso_curbuf);
+ blkno = so->currPos.currPage;
+ if (so->hashso_bucket_buf == so->currPos.buf ||
+ HashScanPosIsPinned(so->currPos))
+ {
+ /*
+ * We already have pin on this buffer, so, all we need to do is
+ * acquire lock on it. The pin would have prevented re-use of any TID
+ * on the page, so there is no need to check the LSN.
+ */
+ havePin = true;
+ buf = so->currPos.buf;
+ LockBuffer(buf, BUFFER_LOCK_SHARE);
+ page = BufferGetPage(buf);
+ }
+ else
+ {
+ buf = _hash_getbuf(rel, blkno, HASH_READ, LH_OVERFLOW_PAGE);
+
+ /* It might not exist anymore; in which case we can't hint it. */
+ if (!BufferIsValid(buf))
+ return;
+
+ /*
+ * If the page LSN differs, the page was modified since the
+ * last read. killedItems may no longer be valid, so applying
+ * LP_DEAD hints is not safe.
+ */
+ page = BufferGetPage(buf);
+ if (PageGetLSN(page) != so->currPos.lsn)
+ {
+ _hash_relbuf(rel, buf);
+ return;
+ }
+ }
+
opaque = (HashPageOpaque) PageGetSpecialPointer(page);
maxoff = PageGetMaxOffsetNumber(page);
for (i = 0; i < numKilled; i++)
{
- offnum = so->killedItems[i].indexOffset;
+ int itemIndex = so->killedItems[i];
+ HashScanPosItem *currItem = &so->currPos.items[itemIndex];
+
+ offnum = currItem->indexOffset;
+
+ Assert(itemIndex >= so->currPos.firstItem &&
+ itemIndex <= so->currPos.lastItem);
while (offnum <= maxoff)
{
ItemId iid = PageGetItemId(page, offnum);
IndexTuple ituple = (IndexTuple) PageGetItem(page, iid);
- if (ItemPointerEquals(&ituple->t_tid, &so->killedItems[i].heapTid))
+ if (ItemPointerEquals(&ituple->t_tid, &currItem->heapTid))
{
/* found the item */
ItemIdMarkDead(iid);
@@ -578,6 +635,12 @@ _hash_kill_items(IndexScanDesc scan)
if (killedsomething)
{
opaque->hasho_flag |= LH_PAGE_HAS_DEAD_TUPLES;
- MarkBufferDirtyHint(so->hashso_curbuf, true);
+ MarkBufferDirtyHint(buf, true);
}
+
+ if (so->hashso_bucket_buf == so->currPos.buf ||
+ havePin)
+ LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
+ else
+ _hash_relbuf(rel, buf);
}
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index 72fce30..3e90b89 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -103,6 +103,53 @@ typedef struct HashScanPosItem /* what we remember about each match */
OffsetNumber indexOffset; /* index item's location within page */
} HashScanPosItem;
+typedef struct HashScanPosData
+{
+ Buffer buf; /* if valid, the buffer is pinned */
+ XLogRecPtr lsn; /* pos in the WAL stream when page was read */
+ BlockNumber currPage; /* current hash index page */
+ BlockNumber nextPage; /* next overflow page */
+ BlockNumber prevPage; /* prev overflow or bucket page */
+
+ /*
+ * The items array is always ordered in index order (ie, increasing
+ * indexoffset). When scanning backwards it is convenient to fill the
+ * array back-to-front, so we start at the last slot and fill downwards.
+ * Hence we need both a first-valid-entry and a last-valid-entry counter.
+ * itemIndex is a cursor showing which entry was last returned to caller.
+ */
+ int firstItem; /* first valid index in items[] */
+ int lastItem; /* last valid index in items[] */
+ int itemIndex; /* current index in items[] */
+
+ HashScanPosItem items[MaxIndexTuplesPerPage]; /* MUST BE LAST */
+} HashScanPosData;
+
+#define HashScanPosIsPinned(scanpos) \
+( \
+ AssertMacro(BlockNumberIsValid((scanpos).currPage) || \
+ !BufferIsValid((scanpos).buf)), \
+ BufferIsValid((scanpos).buf) \
+)
+
+#define HashScanPosIsValid(scanpos) \
+( \
+ AssertMacro(BlockNumberIsValid((scanpos).currPage) || \
+ !BufferIsValid((scanpos).buf)), \
+ BlockNumberIsValid((scanpos).currPage) \
+)
+
+#define HashScanPosInvalidate(scanpos) \
+ do { \
+ (scanpos).buf = InvalidBuffer; \
+ (scanpos).lsn = InvalidXLogRecPtr; \
+ (scanpos).currPage = InvalidBlockNumber; \
+ (scanpos).nextPage = InvalidBlockNumber; \
+ (scanpos).prevPage = InvalidBlockNumber; \
+ (scanpos).firstItem = 0; \
+ (scanpos).lastItem = 0; \
+ (scanpos).itemIndex = 0; \
+ } while (0)
/*
* HashScanOpaqueData is private state for a hash index scan.
@@ -145,8 +192,14 @@ typedef struct HashScanOpaqueData
*/
bool hashso_buc_split;
/* info about killed items if any (killedItems is NULL if never used) */
- HashScanPosItem *killedItems; /* tids and offset numbers of killed items */
+ int *killedItems; /* currPos.items indexes of killed items */
int numKilled; /* number of currently stored items */
+
+ /*
+ * Identify all the matching items on a page and save them in
+ * HashScanPosData
+ */
+ HashScanPosData currPos; /* current position data */
} HashScanOpaqueData;
typedef HashScanOpaqueData *HashScanOpaque;
--
1.8.3.1
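A note on the pin-versus-LSN decision in _hash_kill_items() above. The following is only a toy sketch of that rule, not code from the patch: ToyLSN, ToyPage, ToyScanPos and toy_can_apply_hints() are hypothetical names, and the real function also takes the buffer lock and releases the buffer afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for XLogRecPtr, the page header and currPos. */
typedef uint64_t ToyLSN;

typedef struct ToyPage
{
    ToyLSN      lsn;            /* LSN currently stamped on the page */
} ToyPage;

typedef struct ToyScanPos
{
    bool        pinned;         /* did the scan keep its pin on the page? */
    ToyLSN      lsnAtRead;      /* page LSN remembered when items were read */
} ToyScanPos;

/*
 * Decide whether LP_DEAD hints collected earlier may still be applied.
 *
 * If the pin was held the whole time, no TID on the page can have been
 * re-used, so the hints are safe without looking at the LSN. Otherwise
 * the page must be unchanged since it was read, i.e. the LSNs must match.
 */
static bool
toy_can_apply_hints(const ToyScanPos *pos, const ToyPage *page)
{
    assert(pos != NULL && page != NULL);

    if (pos->pinned)
        return true;

    return page->lsn == pos->lsnAtRead;
}
```

With the pin dropped, any change to the page LSN makes the function return false, which corresponds to the early return in the patch.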
From ef4180ffcaea44054d5b4894240be804c3970c6d Mon Sep 17 00:00:00 2001
From: ashu <[email protected]>
Date: Mon, 7 Aug 2017 16:22:19 +0530
Subject: [PATCH] Remove redundant hash function _hash_step and do some code
cleanup.
Remove redundant function _hash_step() and some of the unused members
of HashScanOpaqueData. The function _hash_step(), which was used to find
the next qualifying tuple in the index page, is no longer required
because hash index scans now work a page at a time: _hash_readpage()
reads all the qualifying tuples in a page at once.
Patch by Ashutosh Sharma <[email protected]>
---
src/backend/access/hash/hashsearch.c | 206 -----------------------------------
src/include/access/hash.h | 15 ---
2 files changed, 221 deletions(-)
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index f4408ab..58eb108 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -431,212 +431,6 @@ _hash_first(IndexScanDesc scan, ScanDirection dir)
}
/*
- * _hash_step() -- step to the next valid item in a scan in the bucket.
- *
- * If no valid record exists in the requested direction, return
- * false. Else, return true and set the hashso_curpos for the
- * scan to the right thing.
- *
- * Here we need to ensure that if the scan has started during split, then
- * skip the tuples that are moved by split while scanning bucket being
- * populated and then scan the bucket being split to cover all such
- * tuples. This is done to ensure that we don't miss tuples in the scans
- * that are started during split.
- *
- * 'bufP' points to the current buffer, which is pinned and read-locked.
- * On success exit, we have pin and read-lock on whichever page
- * contains the right item; on failure, we have released all buffers.
- */
-bool
-_hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir)
-{
- Relation rel = scan->indexRelation;
- HashScanOpaque so = (HashScanOpaque) scan->opaque;
- ItemPointer current;
- Buffer buf;
- Page page;
- HashPageOpaque opaque;
- OffsetNumber maxoff;
- OffsetNumber offnum;
- BlockNumber blkno;
- IndexTuple itup;
-
- current = &(so->hashso_curpos);
-
- buf = *bufP;
- _hash_checkpage(rel, buf, LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
- page = BufferGetPage(buf);
- opaque = (HashPageOpaque) PageGetSpecialPointer(page);
-
- /*
- * If _hash_step is called from _hash_first, current will not be valid, so
- * we can't dereference it. However, in that case, we presumably want to
- * start at the beginning/end of the page...
- */
- maxoff = PageGetMaxOffsetNumber(page);
- if (ItemPointerIsValid(current))
- offnum = ItemPointerGetOffsetNumber(current);
- else
- offnum = InvalidOffsetNumber;
-
- /*
- * 'offnum' now points to the last tuple we examined (if any).
- *
- * continue to step through tuples until: 1) we get to the end of the
- * bucket chain or 2) we find a valid tuple.
- */
- do
- {
- switch (dir)
- {
- case ForwardScanDirection:
- if (offnum != InvalidOffsetNumber)
- offnum = OffsetNumberNext(offnum); /* move forward */
- else
- {
- /* new page, locate starting position by binary search */
- offnum = _hash_binsearch(page, so->hashso_sk_hash);
- }
-
- for (;;)
- {
- /*
- * check if we're still in the range of items with the
- * target hash key
- */
- if (offnum <= maxoff)
- {
- Assert(offnum >= FirstOffsetNumber);
- itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
-
- /*
- * skip the tuples that are moved by split operation
- * for the scan that has started when split was in
- * progress
- */
- if (so->hashso_buc_populated && !so->hashso_buc_split &&
- (itup->t_info & INDEX_MOVED_BY_SPLIT_MASK))
- {
- offnum = OffsetNumberNext(offnum); /* move forward */
- continue;
- }
-
- if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup))
- break; /* yes, so exit for-loop */
- }
-
- /* Before leaving current page, deal with any killed items */
- if (so->numKilled > 0)
- _hash_kill_items(scan);
-
- /*
- * ran off the end of this page, try the next
- */
- _hash_readnext(scan, &buf, &page, &opaque);
- if (BufferIsValid(buf))
- {
- maxoff = PageGetMaxOffsetNumber(page);
- offnum = _hash_binsearch(page, so->hashso_sk_hash);
- }
- else
- {
- itup = NULL;
- break; /* exit for-loop */
- }
- }
- break;
-
- case BackwardScanDirection:
- if (offnum != InvalidOffsetNumber)
- offnum = OffsetNumberPrev(offnum); /* move back */
- else
- {
- /* new page, locate starting position by binary search */
- offnum = _hash_binsearch_last(page, so->hashso_sk_hash);
- }
-
- for (;;)
- {
- /*
- * check if we're still in the range of items with the
- * target hash key
- */
- if (offnum >= FirstOffsetNumber)
- {
- Assert(offnum <= maxoff);
- itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
-
- /*
- * skip the tuples that are moved by split operation
- * for the scan that has started when split was in
- * progress
- */
- if (so->hashso_buc_populated && !so->hashso_buc_split &&
- (itup->t_info & INDEX_MOVED_BY_SPLIT_MASK))
- {
- offnum = OffsetNumberPrev(offnum); /* move back */
- continue;
- }
-
- if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup))
- break; /* yes, so exit for-loop */
- }
-
- /* Before leaving current page, deal with any killed items */
- if (so->numKilled > 0)
- _hash_kill_items(scan);
-
- /*
- * ran off the end of this page, try the next
- */
- _hash_readprev(scan, &buf, &page, &opaque);
- if (BufferIsValid(buf))
- {
- TestForOldSnapshot(scan->xs_snapshot, rel, page);
- maxoff = PageGetMaxOffsetNumber(page);
- offnum = _hash_binsearch_last(page, so->hashso_sk_hash);
- }
- else
- {
- itup = NULL;
- break; /* exit for-loop */
- }
- }
- break;
-
- default:
- /* NoMovementScanDirection */
- /* this should not be reached */
- itup = NULL;
- break;
- }
-
- if (itup == NULL)
- {
- /*
- * We ran off the end of the bucket without finding a match.
- * Release the pin on bucket buffers. Normally, such pins are
- * released at end of scan, however scrolling cursors can
- * reacquire the bucket lock and pin in the same scan multiple
- * times.
- */
- *bufP = so->hashso_curbuf = InvalidBuffer;
- ItemPointerSetInvalid(current);
- _hash_dropscanbuf(rel, so);
- return false;
- }
-
- /* check the tuple quals, loop around if not met */
- } while (!_hash_checkqual(scan, itup));
-
- /* if we made it to here, we've found a valid tuple */
- blkno = BufferGetBlockNumber(buf);
- *bufP = so->hashso_curbuf = buf;
- ItemPointerSet(current, blkno, offnum);
- return true;
-}
-
-/*
* _hash_readpage() -- Load data from current index page into so->currPos
*
* We scan all the items in the current index page and save them into
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index 3e90b89..19fb147 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -159,14 +159,6 @@ typedef struct HashScanOpaqueData
/* Hash value of the scan key, ie, the hash key we seek */
uint32 hashso_sk_hash;
- /*
- * We also want to remember which buffer we're currently examining in the
- * scan. We keep the buffer pinned (but not locked) across hashgettuple
- * calls, in order to avoid doing a ReadBuffer() for every tuple in the
- * index.
- */
- Buffer hashso_curbuf;
-
/* remember the buffer associated with primary bucket */
Buffer hashso_bucket_buf;
@@ -177,12 +169,6 @@ typedef struct HashScanOpaqueData
*/
Buffer hashso_split_bucket_buf;
- /* Current position of the scan, as an index TID */
- ItemPointerData hashso_curpos;
-
- /* Current position of the scan, as a heap TID */
- ItemPointerData hashso_heappos;
-
/* Whether scan starts on bucket being populated due to split */
bool hashso_buc_populated;
@@ -432,7 +418,6 @@ extern void _hash_finish_split(Relation rel, Buffer metabuf, Buffer obuf,
/* hashsearch.c */
extern bool _hash_next(IndexScanDesc scan, ScanDirection dir);
extern bool _hash_first(IndexScanDesc scan, ScanDirection dir);
-extern bool _hash_step(IndexScanDesc scan, Buffer *bufP, ScanDirection dir);
/* hashsort.c */
typedef struct HSpool HSpool; /* opaque struct in hashsort.c */
--
1.8.3.1
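To illustrate the page-at-a-time idea that makes _hash_step() unnecessary, here is a toy sketch of how an items[] array with firstItem/lastItem can be filled in either scan direction while staying in increasing offset order. It is a simplified model under my own assumptions, not the code of _hash_readpage(): MAX_ITEMS, ToyScanPos and the toy_load_* functions are hypothetical, and the "page" is just a sorted array of matching offsets.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ITEMS 8             /* stand-in for MaxIndexTuplesPerPage */

typedef struct ToyScanPos
{
    int         firstItem;      /* first valid index in items[] */
    int         lastItem;       /* last valid index in items[] */
    int         itemIndex;      /* cursor into items[] */
    int         items[MAX_ITEMS];   /* matching offsets, ascending */
} ToyScanPos;

/* Forward scan: walk offsets upwards and fill items[] from slot 0. */
static int
toy_load_forward(ToyScanPos *pos, const int *offsets, int n)
{
    int         itemIndex = 0;

    assert(pos != NULL);
    for (int i = 0; i < n && itemIndex < MAX_ITEMS; i++)
        pos->items[itemIndex++] = offsets[i];

    pos->firstItem = 0;
    pos->lastItem = itemIndex - 1;
    pos->itemIndex = 0;         /* start returning from the front */
    return itemIndex;
}

/* Backward scan: walk offsets downwards and fill items[] from the end. */
static int
toy_load_backward(ToyScanPos *pos, const int *offsets, int n)
{
    int         itemIndex = MAX_ITEMS;

    assert(pos != NULL);
    for (int i = n - 1; i >= 0 && itemIndex > 0; i--)
        pos->items[--itemIndex] = offsets[i];

    pos->firstItem = itemIndex;
    pos->lastItem = MAX_ITEMS - 1;
    pos->itemIndex = MAX_ITEMS - 1; /* start returning from the back */
    return MAX_ITEMS - itemIndex;
}
```

In both directions the valid entries occupy firstItem..lastItem in ascending order, so a caller can step the cursor up or down without touching the page again.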
From e09ccbce2aa3388db37ec72e3c02e9593bbed4f9 Mon Sep 17 00:00:00 2001
From: ashu <[email protected]>
Date: Sun, 30 Jul 2017 12:37:24 +0530
Subject: [PATCH] Improve locking strategy during VACUUM in Hash Index v4
Patch by Ashutosh Sharma <[email protected]>
---
src/backend/access/hash/README | 2 +-
src/backend/access/hash/hash.c | 21 ++++++++++-----------
src/backend/access/hash/hashovfl.c | 4 +---
3 files changed, 12 insertions(+), 15 deletions(-)
diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index eef7d66..34a84ce 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -396,8 +396,8 @@ The fourth operation is garbage collection (bulk deletion):
mark the target page dirty
write WAL for deleting tuples from target page
if this is the last bucket page, break out of loop
- pin and x-lock next page
release prior lock and pin (except keep pin on primary bucket page)
+ pin and x-lock next page
if the page we have locked is not the primary bucket page:
release lock and take exclusive lock on primary bucket page
if there are no other pins on the primary bucket page:
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 2b858f0..3d68af5 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -663,11 +663,9 @@ hashvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
* that the next valid TID will be greater than or equal to the current
* valid TID. There can't be any concurrent scans in progress when we first
* enter this function because of the cleanup lock we hold on the primary
- * bucket page, but as soon as we release that lock, there might be. We
- * handle that by conspiring to prevent those scans from passing our cleanup
- * scan. To do that, we lock the next page in the bucket chain before
- * releasing the lock on the previous page. (This type of lock chaining is
- * not ideal, so we might want to look for a better solution at some point.)
+ * bucket page, but as soon as we release that lock, there might be.
+ * However, we need not worry about that, because hash index scans work
+ * in page-at-a-time mode.
*
* We need to retain a pin on the primary bucket to ensure that no concurrent
* split can start.
@@ -836,19 +834,20 @@ hashbucketcleanup(Relation rel, Bucket cur_bucket, Buffer bucket_buf,
if (!BlockNumberIsValid(blkno))
break;
- next_buf = _hash_getbuf_with_strategy(rel, blkno, HASH_WRITE,
- LH_OVERFLOW_PAGE,
- bstrategy);
-
/*
- * release the lock on previous page after acquiring the lock on next
- * page
+ * As hash index scans work in page-at-a-time mode, vacuum can
+ * release the lock on the previous page before acquiring the lock on
+ * the next page.
*/
if (retain_pin)
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
else
_hash_relbuf(rel, buf);
+ next_buf = _hash_getbuf_with_strategy(rel, blkno, HASH_WRITE,
+ LH_OVERFLOW_PAGE,
+ bstrategy);
+
buf = next_buf;
}
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index c206e70..3a7011d 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -790,9 +790,7 @@ _hash_initbitmapbuffer(Buffer buf, uint16 bmsize, bool initpage)
* Caller must acquire cleanup lock on the primary page of the target
* bucket to exclude any scans that are in progress, which could easily
* be confused into returning the same tuple more than once or some tuples
- * not at all by the rearrangement we are performing here. To prevent
- * any concurrent scan to cross the squeeze scan we use lock chaining
- * similar to hasbucketcleanup. Refer comments atop hashbucketcleanup.
+ * not at all by the rearrangement we are performing here.
*
* We need to retain a pin on the primary bucket to ensure that no concurrent
* split can start.
--
1.8.3.1
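For the locking change in hashbucketcleanup(), the difference between the old lock chaining and the new order can be seen in a toy model that only counts how many page locks are held at once while walking a bucket chain. This is a sketch under my own assumptions, not PostgreSQL code: ToyLockStats and the toy_* functions are hypothetical, and pins are ignored entirely.

```c
#include <assert.h>

typedef struct ToyLockStats
{
    int         held;           /* locks held right now */
    int         maxHeld;        /* high-water mark */
} ToyLockStats;

static void
toy_lock(ToyLockStats *s)
{
    s->held++;
    if (s->held > s->maxHeld)
        s->maxHeld = s->held;
}

static void
toy_unlock(ToyLockStats *s)
{
    assert(s->held > 0);
    s->held--;
}

/* Old order: lock the next page, then release the previous one. */
static int
toy_walk_chained(int npages)
{
    ToyLockStats s = {0, 0};

    if (npages <= 0)
        return 0;
    toy_lock(&s);               /* primary bucket page */
    for (int i = 1; i < npages; i++)
    {
        toy_lock(&s);           /* next page in the chain */
        toy_unlock(&s);         /* previous page */
    }
    toy_unlock(&s);
    return s.maxHeld;
}

/* New order: release the previous page, then lock the next one. */
static int
toy_walk_unchained(int npages)
{
    ToyLockStats s = {0, 0};

    if (npages <= 0)
        return 0;
    toy_lock(&s);               /* primary bucket page */
    for (int i = 1; i < npages; i++)
    {
        toy_unlock(&s);         /* previous page */
        toy_lock(&s);           /* next page in the chain */
    }
    toy_unlock(&s);
    return s.maxHeld;
}
```

The chained walk holds two locks at each step of the chain, the unchained walk never more than one. The model says nothing about correctness; that rests on scans working in page-at-a-time mode, as the patch comment explains.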