st 24. 4. 2024 v 14:50 odesílatel Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> napsal:
> On Wed, Apr 3, 2024 at 1:10 AM Jeff Davis <pg...@j-davis.com> wrote: > > > > Here's where I think this API should go: > > > > 1. Have table_modify_begin/end and table_modify_buffer_insert, like > > those that are implemented in your patch. > > I added table_modify_begin, table_modify_buffer_insert, > table_modify_buffer_flush and table_modify_end. Table Access Method (AM) > authors now can define their own buffering strategy and flushing decisions > based on their tuple storage kinds and various other AM specific factors. I > also added a default implementation that falls back to single inserts when > no implementation is provided for these AM by AM authors. See the attached > v19-0001 patch. > > > 2. Add some kind of flush callback that will be called either while the > > tuples are being flushed or after the tuples are flushed (but before > > they are freed by the AM). (Aside: do we need to call it while the > > tuples are being flushed to get the right visibility semantics for > > after-row triggers?) > > I added a flush callback named TableModifyBufferFlushCallback; when > provided by callers invoked after tuples are flushed to disk from the > buffers but before the AM frees them up. Index insertions and AFTER ROW > INSERT triggers can be executed in this callback. See the v19-0001 patch > for how AM invokes the flush callback, and see either v19-0003 or v19-0004 > or v19-0005 for how a caller can supply the callback and required context > to execute index insertions and AR triggers. > > > 3. Add table_modify_buffer_{update|delete} APIs. > > > > 9. Use these new methods for DELETE, UPDATE, and MERGE. MERGE can use > > the buffer_insert/update/delete APIs; we don't need a separate merge > > method. This probably requires that the AM maintain 3 separate buffers > > to distinguish different kinds of changes at flush time (obviously > > these can be initialized lazily to avoid overhead when not being used). > > I haven't thought about these things yet. I can only focus on them after > seeing how the attached patches go from here. > > > 4. Some kind of API tweaks to help manage memory when modifying > > pertitioned tables, so that the buffering doesn't get out of control. > > Perhaps just reporting memory usage and allowing the caller to force > > flushes would be enough. > > Heap implementation for thes new Table AMs uses a separate memory context > for all of the operations. Please have a look and let me know if we need > anything more. > > > 5. Use these new methods for CREATE/REFRESH MATERIALIZED VIEW. This is > > fairly straightforward, I believe, and handled by your patch. Indexes > > are (re)built afterward, and no triggers are possible. > > > > 6. Use these new methods for CREATE TABLE ... AS. This is fairly > > straightforward, I believe, and handled by your patch. No indexes or > > triggers are possible. > > I used multi inserts for all of these including TABLE REWRITE commands > such as ALTER TABLE. See the attached v19-0002 patch. Check the testing > section below for benefits. > > FWIW, following are some of the TABLE REWRITE commands that can get > benefitted: > > ALTER TABLE tbl ALTER c1 TYPE bigint; > ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY; > ALTER MATERIALIZED VIEW heapmv SET ACCESS METHOD heap2; > ALTER TABLE itest3 ALTER COLUMN a TYPE int; > ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3); > ALTER TABLE has_volatile ADD col4 int DEFAULT (random() * 10000)::int; > and so on. > > > 7. Use these new methods for COPY. We have to be careful to avoid > > regressions for the heap method, because it's already managing its own > > buffers. If the AM manages the buffering, then it may require > > additional copying of slots, which could be a disadvantage. To solve > > this, we may need some minor API tweaks to avoid copying when the > > caller guarantees that the memory will not be freed to early, or > > perhaps expose the AM's memory context to copyfrom.c. Another thing to > > consider is that the buffering in copyfrom.c is also used for FDWs, so > > that buffering code path needs to be preserved in copyfrom.c even if > > not used for AMs. > > I modified the COPY FROM code to use the new Table AMs, and performed some > tests which show no signs of regression. Check the testing section below > for more details. See the attached v19-0005 patch. With this, > table_multi_insert can be deprecated. > > > 8. Use these new methods for INSERT INTO ... SELECT. One potential > > challenge here is that execution nodes are not always run to > > completion, so we need to be sure that the flush isn't forgotten in > > that case. > > I did that in v19-0003. I did place the table_modify_end call in multiple > places including ExecEndModifyTable. I didn't find any issues with it. > Please have a look and let me know if we need the end call in more places. > Check the testing section below for benefits. > > > 10. Use these new methods for logical apply. > > I used multi inserts for Logical Replication apply. in v19-0004. Check the > testing section below for benefits. > > FWIW, open-source pglogical does have multi insert support, check code > around > https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_apply_heap.c#L960 > . > > > 11. Deprecate the multi_insert API. > > I did remove both table_multi_insert and table_finish_bulk_insert in > v19-0006. Perhaps, removing them isn't a great idea, but adding a > deprecation WARNING/ERROR until some more PG releases might be worth > looking at. > > > Thoughts on this plan? Does your patch make sense in v17 as a stepping > > stone, or should we try to make all of these API changes together in > > v18? > > If the design, code and benefits that these new Table AMs bring to the > table look good, I hope to see it for PG 18. > > > Also, a sample AM code would be a huge benefit here. Writing a real AM > > is hard, but perhaps we can at least have an example one to demonstrate > > how to use these APIs? > > The attached patches already have implemented these new Table AMs for > Heap. I don't think we need a separate implementation to demonstrate. If > others feel so, I'm open to thoughts here. > > Having said above, I'd like to reiterate the motivation behind the new > Table AMs for multi and single inserts. > > 1. A scan-like API with state being carried across is thought to be better > as suggested by Andres Freund - > https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3f...@alap3.anarazel.de > . > 2. Allowing a Table AM to optimize operations across multiple inserts, > define its own buffering strategy and take its own flushing decisions based > on their tuple storage kinds and various other AM specific factors. > 3. Improve performance of various SQL commands with multi inserts for Heap > AM. > > The attached v19 patches might need some more detailed comments, some > documentation and some specific tests ensuring the multi inserts for Heap > are kicked-in for various commands. I'm open to thoughts here. > > I did some testing to see how various commands benefit with multi inserts > using these new Table AM for heap. It's not only the improvement in > performance these commands see, but also the amount of WAL that gets > generated reduces greatly. After all, multi inserts optimize the insertions > by writing less WAL. IOW, writing WAL record per page if multiple rows fit > into a single data page as opposed to WAL record per row. > > Test case 1: 100 million rows, 2 columns (int and float) > > Command | HEAD (sec) | PATCHED (sec) | Faster by % > | Faster by X > ------------------------------ | ---------- | ------------- | ----------- > | ----------- > CREATE TABLE AS | 121 | 77 | 36.3 > | 1.57 > CREATE MATERIALIZED VIEW | 101 | 49 | 51.4 > | 2.06 > REFRESH MATERIALIZED VIEW | 113 | 54 | 52.2 > | 2.09 > ALTER TABLE (TABLE REWRITE) | 124 | 81 | 34.6 > | 1.53 > COPY FROM | 71 | 72 | 0 > | 1 > INSERT INTO ... SELECT | 117 | 62 | 47 > | 1.88 > LOGICAL REPLICATION APPLY | 393 | 306 | 22.1 > | 1.28 > > Command | HEAD (WAL in GB) | PATCHED (WAL in GB) | > Reduced by % | Reduced by X > ------------------------------ | ---------------- | ------------------- | > ------------ | ----------- > CREATE TABLE AS | 6.8 | 2.4 | > 64.7 | 2.83 > CREATE MATERIALIZED VIEW | 7.2 | 2.3 | > 68 | 3.13 > REFRESH MATERIALIZED VIEW | 10 | 5.1 | > 49 | 1.96 > ALTER TABLE (TABLE REWRITE) | 8 | 3.2 | > 60 | 2.5 > COPY FROM | 2.9 | 3 | > 0 | 1 > INSERT INTO ... SELECT | 8 | 3 | > 62.5 | 2.66 > LOGICAL REPLICATION APPLY | 7.5 | 2.3 | > 69.3 | 3.26 > > Test case 2: 1 billion rows, 1 column (int) > > Command | HEAD (sec) | PATCHED (sec) | Faster by % > | Faster by X > ------------------------------ | ---------- | ------------- | ----------- > | ----------- > CREATE TABLE AS | 794 | 386 | 51.38 > | 2.05 > CREATE MATERIALIZED VIEW | 1006 | 563 | 44.03 > | 1.78 > REFRESH MATERIALIZED VIEW | 977 | 603 | 38.28 > | 1.62 > ALTER TABLE (TABLE REWRITE) | 1189 | 714 | 39.94 > | 1.66 > COPY FROM | 321 | 330 | -0.02 > | 0.97 > INSERT INTO ... SELECT | 1084 | 586 | 45.94 > | 1.84 > LOGICAL REPLICATION APPLY | 3530 | 2982 | 15.52 > | 1.18 > > Command | HEAD (WAL in GB) | PATCHED (WAL in GB) | > Reduced by % | Reduced by X > ------------------------------ | ---------------- | ------------------- | > ------------ | ----------- > CREATE TABLE AS | 60 | 12 | > 80 | 5 > CREATE MATERIALIZED VIEW | 60 | 12 | > 80 | 5 > REFRESH MATERIALIZED VIEW | 60 | 12 | > 80 | 5 > ALTER TABLE (TABLE REWRITE) | 123 | 31 | > 60 | 2.5 > COPY FROM | 12 | 12 | > 0 | 1 > INSERT INTO ... SELECT | 120 | 24 | > 80 | 5 > LOGICAL REPLICATION APPLY | 61 | 12 | > 80.32 | 5 > looks pretty impressive! Pavel > > Test setup: > ./configure --prefix=$PWD/pg17/ --enable-tap-tests CFLAGS="-ggdb3 -O2" > > install.log && make -j 8 install > install.log 2>&1 & > > wal_level=logical > max_wal_size = 256GB > checkpoint_timeout = 1h > > Test system is EC2 instance of type c5.4xlarge: > Architecture: x86_64 > CPU op-mode(s): 32-bit, 64-bit > Address sizes: 46 bits physical, 48 bits virtual > Byte Order: Little Endian > CPU(s): 16 > On-line CPU(s) list: 0-15 > Vendor ID: GenuineIntel > Model name: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz > CPU family: 6 > Model: 85 > Thread(s) per core: 2 > Core(s) per socket: 8 > Socket(s): 1 > Stepping: 7 > BogoMIPS: 5999.99 > Caches (sum of all): > L1d: 256 KiB (8 instances) > L1i: 256 KiB (8 instances) > L2: 8 MiB (8 instances) > L3: 35.8 MiB (1 instance) > NUMA: > NUMA node(s): 1 > NUMA node0 CPU(s): 0-15 > RAM: > MemTotal: 32036536 kB > > -- > Bharath Rupireddy > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com >