Re: [HACKERS] WIP: Access method extendability

Petr Jelinek Tue, 29 Mar 2016 09:48:14 -0700

On 29/03/16 18:25, Alvaro Herrera wrote:

>+ /*-------------------------------------------------------------------------
>+  * API for construction of generic xlog records
>+  *
>+  * This API allows user to construct generic xlog records which describe
>+  * difference between pages in a generic way.  This is useful for
>+  * extensions which provide custom access methods because they can't
>+  * register their own WAL redo routines.
>+  *
>+  * Each record must be constructed by following these steps:
>+  * 1) GenericXLogStart(relation) - start construction of a generic xlog
>+  *          record for the given relation.
>+  * 2) GenericXLogRegister(buffer, isNew) - register one or more buffers
>+  *          for the record.  This function returns a copy of the page
>+  *          image where modifications can be performed.  The second
>+  *          argument indicates if the block is new (i.e. a full page image
>+  *          should be taken).
>+  * 3) Apply modification of page images obtained in the previous step.
>+  * 4) GenericXLogFinish() - finish construction of generic xlog record.
>+  *
>+  * The xlog record construction can be canceled at any step by calling
>+  * GenericXLogAbort().  All changes made to page images copies will be
>+  * discarded.
>+  *
>+  * Please note the following points when constructing generic xlog records.
>+  * - No direct modifications of page images are allowed!  All modifications
>+  *         must be done in the copies returned by GenericXLogRegister().  In
>+  *         other words, the code which makes generic xlog records must never
>+  *         call BufferGetPage().
>+  * - Registrations of buffers (step 2) and modifications of page images
>+  *         (step 3) can be mixed in any sequence.  The only restriction is
>+  *         that you can only modify a page image after registration of the
>+  *         corresponding buffer.
>+  * - After registration, the buffer also can be unregistered by calling
>+  *         GenericXLogUnregister(buffer).  In this case the changes made in
>+  *         that particular page image copy will be discarded.
>+  * - Generic xlog assumes that pages are using standard layout, i.e., all
>+  *         data between pd_lower and pd_upper will be discarded.
>+  * - Maximum number of buffers simultaneously registered for a generic xlog
>+  *         record is MAX_GENERIC_XLOG_PAGES.  An error will be thrown if
>+  *         this limit is exceeded.
>+  * - Since you modify copies of page images, GenericXLogStart() doesn't
>+  *         start a critical section.  Thus, you can do memory allocation,
>+  *         error throwing etc between GenericXLogStart() and
>+  *         GenericXLogFinish().  The actual critical section is present
>+  *         inside GenericXLogFinish().
>+  * - GenericXLogFinish() takes care of marking buffers dirty and setting
>+  *         their LSNs.  You don't need to do this explicitly.
>+  * - For unlogged relations, everything works the same except there is no
>+  *         WAL record produced.  Thus, you typically don't need to do any
>+  *         explicit checks for unlogged relations.
>+  * - If registered buffer isn't new, generic xlog record contains delta
>+  *         between old and new page images.  This delta is produced by per
>+  *         byte comparison.  The current delta mechanism is not effective
>+  *         for data shifts inside the page and may be improved in the
>+  *         future.
>+  * - Generic xlog redo function will acquire exclusive locks on buffers
>+  *         in the same order they were registered.  After redo of all
>+  *         changes, the locks will be released in the same order.
>+  *
>+  *
>+  * Internally, delta between pages consists of set of fragments.  Each
>+  * fragment represents changes made in given region of page.  A fragment is
>+  * described as follows:
>+  *
>+  * - offset of page region (OffsetNumber)
>+  * - length of page region (OffsetNumber)
>+  * - data - the data to place into described region ('length' number of
>+  *   bytes)
>+  *
>+  * Unchanged regions of page are not represented in the delta.  As a result,
>+  * the delta can be more compact than full page image.  But if an unchanged
>+  * region is smaller than the fragment header (offset and length), starting
>+  * a new fragment after it would make the delta larger, not smaller.  For
>+  * this reason we break into fragments only if the unchanged region is bigger
>+  * than MATCH_THRESHOLD.
>+  *
>+  * The worst case for delta size is when we didn't find any unchanged region
>+  * in the page. Then size of delta would be size of page plus size of
>+  * fragment header.
>+  */
>+ #define FRAGMENT_HEADER_SIZE      (2 * sizeof(OffsetNumber))
>+ #define MATCH_THRESHOLD                   FRAGMENT_HEADER_SIZE
>+ #define MAX_DELTA_SIZE                    (BLCKSZ + FRAGMENT_HEADER_SIZE)
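
The start/register/modify/finish protocol described in the comment can be pictured with a small standalone model. This is only a sketch under loud assumptions: every name beginning with toy_ or Toy is hypothetical, the state is passed explicitly, and nothing here touches PostgreSQL buffers, WAL, locks, or critical sections. It only illustrates that changes go to private copies and reach the live page at finish time, or never if the record is aborted.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 64	/* stand-in for BLCKSZ */
#define TOY_MAX_PAGES 4		/* stand-in for MAX_GENERIC_XLOG_PAGES */

/* Hypothetical state object, not the structure used by the patch. */
typedef struct ToyXLogState
{
	unsigned char *buffers[TOY_MAX_PAGES];	/* the "live" pages */
	unsigned char images[TOY_MAX_PAGES][TOY_PAGE_SIZE];	/* private copies */
	int			nregistered;
} ToyXLogState;

/* Step 1: begin a record with no pages registered. */
static void
toy_start(ToyXLogState *state)
{
	state->nregistered = 0;
}

/*
 * Step 2: register a live page and hand back a private copy to modify.
 * Returns NULL when the limit is hit (the comment says the real API
 * throws an error instead).
 */
static unsigned char *
toy_register(ToyXLogState *state, unsigned char *buffer)
{
	int			slot;

	if (state->nregistered >= TOY_MAX_PAGES)
		return NULL;
	slot = state->nregistered++;
	state->buffers[slot] = buffer;
	memcpy(state->images[slot], buffer, TOY_PAGE_SIZE);
	return state->images[slot];
}

/* Step 4: publish every private copy back to its live page. */
static void
toy_finish(ToyXLogState *state)
{
	for (int i = 0; i < state->nregistered; i++)
		memcpy(state->buffers[i], state->images[i], TOY_PAGE_SIZE);
	state->nregistered = 0;
}

/* Cancel: forget the copies; the live pages were never touched. */
static void
toy_abort(ToyXLogState *state)
{
	state->nregistered = 0;
}

/* Returns 1 if abort discards changes and finish applies them. */
int
toy_demo(void)
{
	unsigned char live[TOY_PAGE_SIZE] = {0};
	ToyXLogState state;
	unsigned char *image;

	toy_start(&state);
	image = toy_register(&state, live);
	image[10] = 42;				/* step 3: modify the copy only */
	toy_abort(&state);
	if (live[10] != 0)
		return 0;

	toy_start(&state);
	image = toy_register(&state, live);
	image[10] = 42;
	if (live[10] != 0)			/* still invisible before finish */
		return 0;
	toy_finish(&state);
	return live[10] == 42;
}
```

Because the live page is untouched until toy_finish(), the code between start and finish is free to fail or bail out, which mirrors the point in the comment that the critical section lives inside GenericXLogFinish() only.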

I incorporated your changes and made some additional refinements on top of them.

Attached is a delta against v12, which should cause fewer issues when merging for Teodor.


--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/src/backend/access/transam/generic_xlog.c b/src/backend/access/transam/generic_xlog.c
index 7ca03bf..eab40a2 100644
--- a/src/backend/access/transam/generic_xlog.c
+++ b/src/backend/access/transam/generic_xlog.c
@@ -19,78 +19,77 @@
 #include "utils/memutils.h"
 
 /*-------------------------------------------------------------------------
 * API for construction of generic xlog records
  *
- * This API allows user to construct generic xlog records which are
- * describing difference between pages in general way.  Thus it's useful
- * for extension which provides custom access methods because they couldn't
- * register their own WAL redo routines.
+ * This API allows user to construct generic xlog records which describe
+ * difference between pages in a generic way.  This is useful for extensions
+ * which provide custom access methods because they can't register their own
+ * WAL redo routines.
  *
- * Generic xlog record should be constructed in following steps.
- * 1) GenericXLogStart(relation) - start construction of generic xlog
- *       record for given relation.
+ * Each record must be constructed by following these steps:
+ * 1) GenericXLogStart(relation) - start construction of a generic xlog
+ *       record for the given relation.
  * 2) GenericXLogRegister(buffer, isNew) - register one or more buffers
- *       for generic xlog record.  This function return a copy of page image
- *       where modifications should be performed.  The second argument
- *       indicates that block is new and full image should be taken.
- * 3) Do modification of page images obtained in previous step.
+ *       for generic xlog record.  This function returns a copy of the page image
+ *       where modifications can be performed.  The second argument indicates
+ *       if block is new (i.e. a full page image should be taken).
+ * 3) Apply modification of page images obtained in the previous step.
  * 4) GenericXLogFinish() - finish construction of generic xlog record.
  *
- * Please, note following points while constructing generic xlog records.
+ * The xlog record construction can be canceled at any step by calling
+ * GenericXLogAbort().  All changes made to page images copies will be
+ * discarded.
+ *
+ * Please note the following points when constructing generic xlog records.
  * - No direct modifications of page images are allowed! All modifications
- *      should be done in copies returned by GenericXLogRegister().  Literally
- *      code which makes generic xlog records should never call
- *      BufferGetPage() function.
- * - On any step generic xlog record construction could be canceled by
- *      calling GenericXLogAbort().  All changes made in page images copies
- *      would be discarded.
+ *      must be done in copies returned by GenericXLogRegister().  In other words
+ *      code which makes generic xlog records must never call BufferGetPage().
  * - Registrations of buffers (step 2) and modifications of page images
- *      (step 3) could be mixed in any sequence.  The only restriction is that
- *      you can modify page image only after registration of corresponding
+ *      (step 3) can be mixed in any sequence.  The only restriction is that
+ *      you can only modify page image after registration of corresponding
  *      buffer.
- * - After registration buffer also can be unregistered by calling
- *      GenericXLogUnregister(buffer).  In this case changes made in particular
- *      page image copy will be discarded.
+ * - After registration, the buffer can also be unregistered by calling
+ *      GenericXLogUnregister(buffer).  In this case the changes made in
+ *      that particular page image copy will be discarded.
  * - Generic xlog assumes that pages are using standard layout.  I.e. all
  *      information between pd_lower and pd_upper will be discarded.
- * - Maximum number of buffers simultaneously registered for generic xlog
- *      is MAX_GENERIC_XLOG_PAGES.  Error would be thrown if this limit
+ * - Maximum number of buffers simultaneously registered for a generic xlog
+ *      is MAX_GENERIC_XLOG_PAGES.  Error will be thrown if this limit is
  *      exceeded.
  * - Since you modify copies of page images, GenericXLogStart() doesn't
  *      start a critical section.  Thus, you can do memory allocation, error
  *      throwing etc between GenericXLogStart() and GenericXLogFinish().
- *      Actual critical section present inside GenericXLogFinish().
- * - GenericXLogFinish() takes care about marking buffers dirty and setting
+ *      The actual critical section is present inside GenericXLogFinish().
+ * - GenericXLogFinish() takes care of marking buffers dirty and setting
  *      their LSNs.  You don't need to do this explicitly.
- * - For unlogged relations, everything work the same expect there is no
+ * - For unlogged relations, everything works the same except there is no
  *      WAL record produced.  Thus, you typically don't need to do any explicit
  *      checks for unlogged relations.
  * - If registered buffer isn't new, generic xlog record contains delta
- *      between old and new page images.  This delta is produced by per byte
- *      comparison.  Current delta mechanist is not effective for data shift
- *      inside the page.  However, it could be improved in further versions.
+ *      between old and new page images.  This delta is produced using per byte
+ *      comparison.  The current delta mechanism is not effective for data shifts
+ *      inside the page and may be improved in the future.
  * - Generic xlog redo function will acquire exclusive locks to buffers
- *      in the same order they were registered.  After redo of all changes
- *      locks would be released in the same order.  That could makes sense for
- *      concurrency.
+ *      in the same order as they were registered.  After redo of all changes,
+ *      locks will be released in the same order.
  *
- * Internally delta between pages consists of set of fragments.  Each fragment
- * represents changes made in given region of page.  Fragment is described
- * as following.
+ * Internally, delta between pages consists of set of fragments.  Each
+ * fragment represents changes made in a given region of a page.  A fragment
+ * is described as follows:
  *
  * - offset of page region (OffsetNumber)
  * - length of page region (OffsetNumber)
  * - data - the data to place into described region ('length' number of bytes)
  *
- * Unchanged regions of page are uncovered by these fragments.  This is why
- * delta could be more compact than full page image.  But if unchanged region
- * of page is less than fragment header (offset and length) then it would
- * increase size of delta instead of decreasing.  Thus, we break fragment only
- * for unchanged regions greater than MATCH_THRESHOLD.
+ * Unchanged regions of page are not represented in the delta.  As a result,
+ * delta can be more compact than the full page image.  But if the unchanged
+ * region of the page is smaller than the fragment header (offset and length),
+ * splitting there would make the delta larger, not smaller.  For this reason
+ * we break fragment only if the unchanged region is bigger than MATCH_THRESHOLD.
  *
  * The worst case for delta size is when we didn't find any unchanged region
- * in the page. Then size of delta would be size of page plus size of fragment
- * header.
+ * in the page. The size of delta will be size of page plus size of fragment
+ * header in that case.
  */
 #define FRAGMENT_HEADER_SIZE   (2 * sizeof(OffsetNumber))
 #define MATCH_THRESHOLD                        FRAGMENT_HEADER_SIZE
@@ -168,8 +167,8 @@ writeDelta(PageData *pageData)
                bool    match;
 
                /*
-                * Check if bytes in old and new page images matches.  We don't rely
-                * data in unallocated area between pd_lower and pd_upper.  Thus we
+                * Check if bytes in old and new page images match.  We don't care
+                * about data in unallocated area between pd_lower and pd_upper.  We
                 * assume unallocated area to expand with unmatched bytes.  Bytes
                 * inside unallocated area are assumed to always match.
                 */
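
The fragment format and the MATCH_THRESHOLD rule from the header comment can be tried out in isolation. What follows is a minimal sketch, not the writeDelta() from the patch: the toy_ names, the 64-byte page, and the use of uint16_t in place of OffsetNumber are all assumptions made for illustration, and the special handling of the area between pd_lower and pd_upper is left out. The strict "bigger than" comparison follows the wording of the comment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t ToyOffset;		/* stand-in for OffsetNumber */

#define TOY_PAGE_SIZE		64	/* stand-in for BLCKSZ */
#define TOY_HEADER_SIZE		(2 * sizeof(ToyOffset))
#define TOY_MATCH_THRESHOLD	TOY_HEADER_SIZE
#define TOY_MAX_DELTA_SIZE	(TOY_PAGE_SIZE + TOY_HEADER_SIZE)

/* Append one fragment (offset, length, data) and return the new delta size. */
static size_t
toy_emit_fragment(unsigned char *delta, size_t used,
				  const unsigned char *newpage, size_t start, size_t end)
{
	ToyOffset	offset = (ToyOffset) start;
	ToyOffset	length = (ToyOffset) (end - start);

	memcpy(delta + used, &offset, sizeof(offset));
	memcpy(delta + used + sizeof(offset), &length, sizeof(length));
	memcpy(delta + used + TOY_HEADER_SIZE, newpage + start, length);
	return used + TOY_HEADER_SIZE + length;
}

/* Build a delta by per-byte comparison; returns its size in bytes. */
size_t
toy_compute_delta(const unsigned char *oldpage, const unsigned char *newpage,
				  size_t size, unsigned char *delta)
{
	size_t		used = 0;
	size_t		i = 0;

	while (i < size)
	{
		size_t		start;
		size_t		end;

		if (oldpage[i] == newpage[i])
		{
			i++;				/* unchanged bytes are not stored */
			continue;
		}

		start = i;
		end = i + 1;			/* one past the last differing byte */
		i = end;
		while (i < size)
		{
			if (oldpage[i] != newpage[i])
				end = ++i;		/* extend the fragment over the gap */
			else if (i - end + 1 > TOY_MATCH_THRESHOLD)
				break;			/* gap is long enough to pay for a header */
			else
				i++;
		}
		used = toy_emit_fragment(delta, used, newpage, start, end);
	}
	return used;
}

/* Replay a delta on top of the old page image. */
void
toy_apply_delta(unsigned char *page, const unsigned char *delta,
				size_t delta_size)
{
	size_t		pos = 0;

	while (pos < delta_size)
	{
		ToyOffset	offset;
		ToyOffset	length;

		memcpy(&offset, delta + pos, sizeof(offset));
		memcpy(&length, delta + pos + sizeof(offset), sizeof(length));
		memcpy(page + offset, delta + pos + TOY_HEADER_SIZE, length);
		pos += TOY_HEADER_SIZE + length;
	}
}
```

With a 4-byte header, two changed bytes separated by two unchanged ones end up in a single 4-byte fragment (8 bytes of delta), whereas splitting them would cost 10. A page in which every byte differs yields one fragment of TOY_PAGE_SIZE plus one header, which is the worst case the comment describes.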

