t consistently, but often enough for it to seem like a real
problem.
--
Peter Geoghegan
, but that doesn't seem great to me either.
Simply because this wording also doesn't reference IN() conditions in
index quals.
--
Peter Geoghegan
my own Postgres 17 B-Tree work internally using some kind of
headline grabbing measure like "6x faster". That just seems silly to
me. I can contrive a case where it's faster by an arbitrarily large
amount. Much like how a selective index scan can be arbitrarily faster
than a sequential scan. Again, a qualitative improvement.
--
Peter Geoghegan
dy quite unnatural.
--
Peter Geoghegan
I didn't
examine it much closer than that, though.
--
Peter Geoghegan
debatable -- it's
always better to be more thorough, when practical. But it's certainly
something that needs to be assessed based on the merits.
--
Peter Geoghegan
On Fri, May 17, 2024 at 1:18 PM Pavel Stehule wrote:
> pá 17. 5. 2024 v 18:02 odesílatel Peter Geoghegan napsal:
>> You've shown an inconsistency between the primary and standby with
>> respect to the heap tuple infomask bits related to freezing. It looks
>> like a F
int bits mask the problem quite well,
even on 16).
--
Peter Geoghegan
On Fri, May 17, 2024 at 9:09 AM Peter Eisentraut wrote:
> How would the development process look like if we
> just had one commitfest per dev cycle that runs from July 1st to March 31st?
Exactly the same?
--
Peter Geoghegan
is true that most individual
users won't care about (say) TIDStore. But it's probably also true
that most individual users don't care about the release notes, or at
most skim the major items.
--
Peter Geoghegan
bly not very much at all. I
assume that that's what you're actually interested in doing here (you
didn't say). If it is, you'll need to update the function's contract
-- just removing the assertion isn't enough.
--
Peter Geoghegan
On Sat, May 11, 2024 at 4:12 PM Peter Geoghegan wrote:
> The dependency is fairly simple. In the presence of multiple arrays on
> the same column, which must be contradictory/redundant, but cannot be
> simplified solely due to lack of suitable cross-type support, we have
> mul
on't go through this skip array preprocessing
path).
Obviously this is rather up in the air right now. As I said, I think
that we could directly fix this case quite easily, if we had to. And
I'm sympathetic; this is pretty horrible if you happen to run into it.
--
Peter Geoghegan
On Sat, May 11, 2024 at 4:12 PM Peter Geoghegan wrote:
> Row comparisons are kind of a special case, both during preprocessing
> and during the scan itself. I find it natural to blame this problem on
> the fact that preprocessing makes exactly zero effort to detect
> contradictor
patch already deals with this in passing. So
you could say that I was already working on this. But I'm not sure
that I would actually say so myself; what I'm doing is tied to far
more complicated work.
I haven't attempted to write the kind of targeted fix that you're
thinking of. It might still be worth writing such a fix now. I
certainly have no objections.
--
Peter Geoghegan
t only comes up in scenarios involving
incomplete opfamilies, which is quite niche (obviously it's not a
factor in your test case, for example). But even if you assume that
incomplete opfamilies are common, it still doesn't seem like this
detail matters.
--
Peter Geoghegan
o be annoying at
best, and confusing at worst. I speak as somebody that uses
disable_cost a lot.
I certainly wouldn't ask anybody to make it a priority for that reason
alone -- it's not *that* bad. I've given my opinion on this because
it's already under discussion.
--
Peter Geoghegan
ix for this
> doesn't get back-patched, it would be nice to see it in a major
> version.
I find that very easy to believe.
--
Peter Geoghegan
On Fri, Apr 26, 2024 at 7:54 AM Jonathan S. Katz wrote:
> The Core Team would like to extend our congratulations to Melanie
> Plageman and Richard Guo, who have accepted invitations to become our
> newest PostgreSQL committers.
Congratulations to both!
--
Peter Geoghegan
On Mon, Apr 22, 2024 at 11:13 AM Peter Geoghegan wrote:
> I'm pretty sure that I could fix this by simply removing the
> assertion. But I need to think about it a bit more before I push a
> fix.
>
> The test case you've provided proves that _bt_preprocess_keys's
> new no-op pa
keys call
per btrescan (independent of the presence of array keys) still seems
sound.
--
Peter Geoghegan
is wrong. It is testing behavior that's much older than
commit 5bf748b86b, though. We can just get rid of it, since all of the
information that we'll actually apply when preprocessing scan keys
comes from the operator class.
Pushed a fix removing the assertion just now. Thanks for the report.
--
Peter Geoghegan
The problem is that _bt_fix_scankey_strategy shouldn't have been doing
anything with already-eliminated array scan keys in the first place
(whether or not they're on a DESC index column). I just pushed a fix
along those lines.
Thanks for the report!
--
Peter Geoghegan
n failure within _bt_first -- the
_bt_verify_arrays_bt_first assertion would catch the violation of the
invariant (the_bt_advance_array_keys postcondition invariant/_bt_first
precondition invariant). So we kinda do have some test coverage for
this function already.
--
Peter Geoghegan
On Sun, Apr 14, 2024 at 9:01 PM David Rowley wrote:
> On Mon, 15 Apr 2024 at 11:47, Peter Geoghegan wrote:
> > Technically we don't promise that WAL records won't change in minor
> > versions. In fact, the docs specifically state that the format of any
> > WAL record might c
l that uses its queryid values to accumulate query costs.
While external tools can't understand the provenance of old queryid
values, pg_stat_statements can't either.
--
Peter Geoghegan
version
upgrade), then pg_stat_statements throws away the accumulated query
stats without being asked to.
--
Peter Geoghegan
e
consideration here. Say a week or two, to work through some of the
more complicated issues -- and to take a breather. I just don't see
any upside to rushing through this process, given where we are now.
--
Peter Geoghegan
urse it also
wouldn't be a bad idea to have a BF animal for that, especially
because we already have BF animals that test things far more niche
than this.
--
Peter Geoghegan
that the planner should always prefer index quals over expression
evaluation, on general principle, even when there's no reason to think
it'll work out. At worst the executor has essentially the same
physical access patterns as the expression evaluation case. On the
other hand, providing nbtree with that context might end up being a
great deal faster.
--
Peter Geoghegan
ortance of these sorts of cases. Most of these
cases will only have 2 or 3 constants, just because that's what's most
common in general.
--
Peter Geoghegan
On Sun, Apr 7, 2024 at 9:57 PM Peter Geoghegan wrote:
> On Sun, Apr 7, 2024 at 9:50 PM Tom Lane wrote:
> > those first two Asserts are redundant with the "if" as well.
>
> I'll get rid of those other two assertions as well, then.
Done that way.
--
Peter Geoghegan
>scan_key);
> Assert(!(cur->sk_flags & SK_SEARCHARRAY));
> }
>
> those first two Asserts are redundant with the "if" as well.
I'll get rid of those other two assertions as well, then.
--
Peter Geoghegan
ss_array_keys
* failed to eliminate redundant arrays through array merging.
* _bt_compare_scankey_args just returns false when it sees
* this; it won't even try to examine either array.
*/
Do you think it needs more work?
--
Peter Geoghegan
icious. It's possible that both of those scan keys actually did
have arrays, but _bt_compare_scankey_args just treats that as a case
of being unable to prove which scan key was redundant/contradictory
due to a lack of suitable cross-type support -- so the assertion won't
be reached.
Would Coverity stop complaining if I just removed the assertion? I
could just do that, I suppose, but that seems backwards to me.
--
Peter Geoghegan
rays left at this point, so "so->numArrayKeys == 0".
FWIW I missed this because the tests only cover cases with one SOAP
inequality, which will always return early from _bt_preprocess_keys
(by taking its generic single scan key fast path). Your test case has
2 scan keys, avoiding the fast path, allowing us to reach
_bt_preprocess_array_keys_final().
--
Peter Geoghegan
On Wed, Apr 3, 2024 at 1:04 PM Melanie Plageman
wrote:
> Thanks! And thanks for updating the commitfest entry!
Nice work!
--
Peter Geoghegan
s, but even there it provides
structure that makes things much easier to describe unambiguously.)
--
Peter Geoghegan
ath -- "disobeying" pagefrz.freeze_required creates the
risk that relfrozenxid/relminmxid will be advanced to unsafe values at
the end of the VACUUM. IMV you should stick with that approach now,
even if it is currently safe to do it the other way around.
--
Peter Geoghegan
optimization (at least an implementation along the lines of your
pseudo code). If that was what you intended, then it's not obvious to
me why it is relevant. What, if anything, does it have to do with
making the new checkunique visibility checks happen lazily?
--
Peter Geoghegan
e amount
of pruning performed opportunistically is sometimes much higher than
generally assumed, so having a way of measuring that seems like it
might lead to valuable insights.
--
Peter Geoghegan
pensive other array key infrastructure, and
> only if we're outside the minimum or maximum bounds of the
> non-required scankeys should we trigger advance_array_keys (unless
> scan direction changed).
I've thought about optimizing non-required arrays along those lines.
But it doesn't really seem to be necessary at all.
If we were going to do it, then it ought to be done in a way that
works independent of the trigger condition for array advancement (that
is, it'd work for non-required arrays that have a required scan key
trigger, too).
--
Peter Geoghegan
ent the last proposals could be a real game
> changer for 2 of our biggest databases. We hope that Postgres 17 will contain
> those improvements.
Current plan is to commit this patch in the next couple of weeks,
ahead of Postgres 17 feature freeze.
--
Peter Geoghegan
t failed" errors -- but that's just a symptom. It
just so happens that the hardening added to places like
_bt_swap_posting() and _bt_binsrch_insert() is much more likely to
visibly break than anything else, at least in practice.
--
Peter Geoghegan
flag to indicate whether or not a cleanup lock is needed.
Thanks for confirming.
I realize that this is fairly obvious, but thought it better to ask.
--
Peter Geoghegan
ACUUM record type? Will you preserve
that aspect of the existing design?
--
Peter Geoghegan
On Fri, Mar 8, 2024 at 2:14 PM Peter Geoghegan wrote:
> What benchmarking have you done here?
I think that the memcmp() test is subtly wrong:
> + if (PointerIsValid(rightsep))
> + {
> + /*
> +* Note: we're not in the rightmost page (see bran
rch() (obviously not doable for
non-_bt_first callers, which need to call _bt_binsrch_insert instead).
This whole approach will have been made easier by the refactoring I
did late last year, in commit c9c0589fda.
--
Peter Geoghegan
On Fri, Mar 8, 2024 at 11:00 AM Peter Geoghegan wrote:
> Seems like it might be possible to simplify/consolidate the VM-setting
> code that's now located at the end of lazy_scan_prune. Perhaps the two
> distinct blocks that call visibilitymap_set() could be combined into
> one.
ated at the end of lazy_scan_prune. Perhaps the two
distinct blocks that call visibilitymap_set() could be combined into
one.
--
Peter Geoghegan
t have the
VM's all-frozen bit set for a given block (not the all-visible bit
set) -- which is always wrong. There was good reason to be concerned
about that possibility when 980ae17310 went in.
--
Peter Geoghegan
eprocess_array_keys, but it's kinda the other way around now.
We could probably even get rid of this remaining limited form of
arrayKeyData, but that doesn't seem like it would add much.
--
Peter Geoghegan
On Wed, Mar 6, 2024 at 4:46 PM Matthias van de Meent
wrote:
> On Wed, 6 Mar 2024 at 01:50, Peter Geoghegan wrote:
> > I think that there is supposed to be a closing parenthesis here? So
> > "... (such as those described in ") might
> > perform...".
>
&
lly want to do here is to balance costs and benefits.
That's just what's required. The fact that those costs and benefits
span multiple levels of abstractions makes it a bit awkward, but
doesn't (and can't) change the basic shape of the problem.
--
Peter Geoghegan
more or
less worked as a reference of agreed upon best practices. Can we do
that part first, rather than starting out with a blanket assumption
that everything that happened before now must have been perfect?
--
Peter Geoghegan
ng makes are
false positives. At least in some deeper sense.
--
Peter Geoghegan
ifdef USE_ASSERT_CHECKING
case). It's also just really hard to understand what's going on here.
If I was going to do this kind of thing, I'd use two completely
separate loops, that were obviously completely separate (maybe even
two functions). I'd then memcmp() each array at the end.
--
Peter Geoghegan
move on to the next leaf page
It's possible that we'll need a variety of different strategies.
nbtree already has two such strategies in _bt_killitems(), in a way.
Though its "Modified while not pinned means hinting is not safe" path
(LSN doesn't match canary value path) seems pretty naive. The
prefetching stuff might present us with a good opportunity to replace
that with something fundamentally better.
--
Peter Geoghegan
ork
> for all indexes automatically - but given the current discussion about
> kill_prior_tuple, locking etc. I'm not sure that's really feasible.
>
> The index AM clearly needs to have more control over this.
Cool. I think that that makes the layering question a lot clearer, then.
--
Peter Geoghegan
On Thu, Feb 15, 2024 at 9:36 AM Tomas Vondra
wrote:
> On 2/15/24 00:06, Peter Geoghegan wrote:
> > I suppose that it might be much more important than I imagine it is
> > right now, but it'd be nice to have something a bit more concrete to
> > go on.
> >
>
> This
On Wed, Feb 14, 2024 at 7:28 PM Andres Freund wrote:
> On 2024-02-13 14:54:14 -0500, Peter Geoghegan wrote:
> > This property of index scans is fundamental to how index scans work.
> > Pinning an index page as an interlock against concurrently TID
> > recycling by VACUUM
led analysis. Plus you have problems with things like unlogged
indexes not having an LSN to use as a canary condition, which makes it
a bit messy (it's already kind of weird that we treat unlogged indexes
differently here IMV).
--
Peter Geoghegan
different layers of abstraction, because that's just fundamentally
what's required by the constraints we're operating under. Can't really
prove it, though.
--
Peter Geoghegan
LSNs (like using or not using an
unlogged index) could cause bugs in this area to "accidentally fail to
fail". Since the nbtree index AM has its own optimizations here, which
probably has a tendency to mask problems/bugs. (I sometimes use
unlogged indexes for some of my nbtree related test cases, just to
reduce certain kinds of variability, including variability in this
area.)
[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/README;h=52e646c7f759a5d9cfdc32b86f6aff8460891e12;hb=3e8235ba4f9cc3375b061fb5d3f3575434539b5f#l443
--
Peter Geoghegan
otally
unsurprising that breaking kill_prior_tuple in some way could be
missed. Andres wrote the MVCC test in question precisely because
certain aspects of kill_prior_tuple were broken for months without
anybody noticing.
[1] https://www.postgresql.org/docs/devel/index-locking.html
--
Peter Geoghegan
On Fri, Jan 26, 2024 at 11:44 AM Robert Haas wrote:
> Thanks to you for the patches, and to Peter for participating in the
> discussion which, IMHO, was very helpful in clarifying things.
Glad I could help.
--
Peter Geoghegan
el(). So from that point on we should record the
free space in every scanned heap page in the "first heap pass" --
including pages that have LP_DEAD stubs that aren't going to be made
LP_UNUSED in the ongoing VACUUM.
--
Peter Geoghegan
On Mon, Jan 22, 2024 at 4:13 PM Matthias van de Meent
wrote:
> On Fri, 19 Jan 2024 at 23:42, Peter Geoghegan wrote:
> And this is where I disagree with your (percieved) implicit argument
> that this should be and always stay this way.
I never said that, and didn't intend to imply
are (which is
what we call _bt_checkkeys on HEAD, basically).
Note that the _bt_merge_arrays code that you've highlighted isn't
iterating through so->keyData[] -- it is iterating through the
function caller's elements array, which actually come from
so->arrayKeys[].
Like every other Postgres contributor, I do my best to follow the
conventions established by existing code. Sometimes that leads to
pretty awkward results, where CamelCase and underscore styles are
closely mixed together, because it works out to be the most consistent
way of doing it overall.
--
Peter Geoghegan
On Thu, Jan 18, 2024 at 11:46 AM Robert Haas wrote:
> On Thu, Jan 18, 2024 at 11:17 AM Peter Geoghegan wrote:
> > True. But the way that PageGetHeapFreeSpace() returns 0 for a page
> > with 291 LP_DEAD stubs is a much older behavior. When that happens it
> > is literally t
conceding that having a high degree of precision about available free
space isn't actually useful (or wouldn't be useful if it was actually
possible at all). Which is something that I generally agree with. I'd
just like it to be clear that you/Melanie are in fact taking one small
step in that direction. We don't need to discuss possible later steps
beyond that first step. Not right now.
--
Peter Geoghegan
rent design is at least correct on its
own terms. And what you propose to do will probably be less correct on
those same terms, silly though they are.
--
Peter Geoghegan
On Wed, Jan 17, 2024 at 5:47 PM Melanie Plageman
wrote:
>
> On Wed, Jan 17, 2024 at 4:31 PM Peter Geoghegan wrote:
> >
> > On Wed, Jan 17, 2024 at 4:25 PM Peter Geoghegan wrote:
> > > I tend to suspect that VACUUM_FSM_EVERY_PAGES is fundamentally the
> > >
On Wed, Jan 17, 2024 at 4:25 PM Peter Geoghegan wrote:
> I tend to suspect that VACUUM_FSM_EVERY_PAGES is fundamentally the
> wrong idea. If it's such a good idea then why not apply it all the
> time? That is, why not apply it independently of whether nindexes==0
> in the current VACU
good idea then why not apply it all the
time? That is, why not apply it independently of whether nindexes==0
in the current VACUUM operation? (You know, just like with
FAILSAFE_EVERY_PAGES.)
--
Peter Geoghegan
n
> > +-- should terminate, when in reality it should jump backwards to the leaf
> > page
> > +-- that we last visited.
>
> I notice this adds a complex test case that outputs many rows. Can we
> do with less rows if we build the index after data insertion, and with
> a lower (non-default) fillfactor?
Probably not. It was actually very hard to come up with these test
cases, which tickle the implementation in just the right way to
demonstrate that the code in places like _bt_steppage() is actually
required. It took me a rather long time to just prove that much. Not
sure that we really need this. But thought I'd include it for the time
being, just so that reviewers could understand those changes.
--
Peter Geoghegan
On Fri, Jan 12, 2024 at 1:52 PM Melanie Plageman
wrote:
> On Fri, Jan 12, 2024 at 1:07 PM Peter Geoghegan wrote:
> > What is "space_freed"? Isn't that something from your uncommitted patch?
>
> Yes, I was mixing the two together.
An understandable mistake.
> I ju
;. But lazy_scan_noprune doesn't get passed a
pointer to prunestate, so clearly you'll need to detect the same
condition some other way.
--
Peter Geoghegan
the
FSM is up-to-date.
In short, we do these things with the free space map because it is a
map of free space (which isn't crash safe) -- nothing more. I happen
to agree that that general design has a lot of problems, but those
seem out of scope here.
--
Peter Geoghegan
we know that there won't be any second heap pass, and so in both cases
we always call PageGetHeapFreeSpace() in the first heap pass. It's
just that it's a bit harder to see that in the lazy_scan_prune case.
No?
--
Peter Geoghegan
me that some PRUNE record must have
taken care of all that earlier on.
--
Peter Geoghegan
o be very effective in
practice. We only need to have the right *general* idea about which
heap pages to visit -- which heap pages will yield some number of
deletable index tuples.
--
Peter Geoghegan
ts
means.
We could even fully reverse heap page line pointer bloat under this
"transaction rollback" scheme -- I bet that aborted xacts are a
disproportionate source of line pointer bloat. Barring a hard crash,
or a very large transaction, we could "undo" the physical changes to
relations before permitting the backend to retry the transaction from
scratch. This would just work as an optimization.
--
Peter Geoghegan
Will you be in Prague this week? If not this might have to wait.
On Tue, Mar 21, 2023 at 3:37 PM Peter Geoghegan wrote:
> I think that we should do something like the attached, to completely
> avoid this ambiguity. This patch adds a new XLOG_HEAP2 bit that's
> similar to XLOG_HEAP_INIT_PAGE -- XLOG_HEAP2_BYVACUUM. This allows all
> XLOG_HEAP2
place from _bt_first (or
from _bt_endpoint). The first _bt_first call is already kind of
special, in a way that is directly related to this issue. I added some
comments about that to today's commit c9c0589fda, in fact -- I think
it's an important issue in general.
--
Peter Geoghegan
tproof, of course (it's not like older
Postgres versions actually provided _bt_checkkeys with opportunities
to "change its mind" in this sense), but it's a useful starting point
IME. It helps to build intuition.
--
Peter Geoghegan
h weird incomplete opfamilies to actually break the optimization --
that was required to tickle the special case in just the right/wrong
way.
--
Peter Geoghegan
On Tue, Dec 5, 2023 at 8:20 PM Peter Geoghegan wrote:
> On Tue, Dec 5, 2023 at 8:06 PM Alexander Korotkov
> wrote:
> > Thank you for raising this issue. Preprocessing of btree scan keys is
> > normally removing the redundant scan keys. However, redundant scan
> >
t
involves both > and < strategy inequalities, since that makes the
symmetry between the two optimizations clearer.
--
Peter Geoghegan
what you're doing here isn't all that different to
the way that we rely on required equality strategy scan keys being
used to build our initial insertion scan key, that determines where
the scan is initially positioned to within _bt_first. Inequalities
aren't all that different to equalities
On Tue, Dec 5, 2023 at 4:41 PM Peter Geoghegan wrote:
> "In general, when inequality keys are present, the initial-positioning
> code only promises to position before the first possible match, not
> exactly at the first match, for a forward scan; or after the last
> match fo
during the _bt_readpage precheck).
That being said, I wouldn't rule out problems for the precheck
optimization in the presence of opfamilies like the one from my test
case. I didn't get as far as exploring that side of things, at least.
--
Peter Geoghegan
test case highlights. (There were
also issues with NULLs, but AFAICT Alexander dealt with that aspect of
the problem already.)
[1]
https://postgr.es/m/CAH2-Wz=BuxYEHxpNH0tPvo=+g1wte1pamrovu1devow1vy9...@mail.gmail.com
--
Peter Geoghegan
require_opposite_dir_repro.sql
Description: Binary data
On Mon, Dec 4, 2023 at 7:25 PM Peter Geoghegan wrote:
> 2. The optimization that has _bt_checkkeys skip non-required scan keys
> that are *only* required in the direction *opposite* the current scan
> direction -- this can work even without any precheck from
> _bt_readpa
can't mix the array stuff with the
optimization stuff. We won't actually call _bt_checkkeys() in an
assertion when it can cause side-effects.
Assuming that we ultimately conclude that the optimizations *aren't*
worth preserving in any form, it might still be worth making it
obvious that the assertions have no side-effects. But that question is
unsettled right now.
Thanks for the review!
I'll try to get the next revision out soon. It'll also have bug fixes
for mark + restore and for a similar issue seen when the scan changes
direction in just the wrong way. (In short, the array key state
machine can be confused about scan direction in certain corner cases.)
--
Peter Geoghegan
to something on the
stack that will go out of scope, in a way that leaves open the
possibility of reading random stuff from the stack once inside
XLogInsert(). AddressSanitizer's use-after-scope can detect that sort
of thing, though not reliably.
Not at all sure about how feasible any of this is...
--
Peter Geoghegan
On Tue, Nov 28, 2023 at 3:52 PM Peter Geoghegan wrote:
> While I still think that we need heuristics that apply speculative
> criteria to decide whether or not going to the next page directly
> (when we have a choice at all), that doesn't mean that the v7
> heuristics can't
On Tue, Nov 28, 2023 at 9:19 AM Peter Geoghegan wrote:
> > I'm not convinced this is a problem we have to solve. It's possible it
> > only affects cases that are implausible in practice (the script forces a
> > particular scan type, and maybe it would not be picked in practi
se cases. But it's something that
deserves further consideration.
--
Peter Geoghegan
1 - 100 of 3069 matches
Mail list logo