On Wed, Apr 10, 2024 at 5:21 PM Andres Freund <and...@anarazel.de> wrote: > > Hi, > > On 2024-04-10 16:50:44 -0400, Melanie Plageman wrote: > > This brings up a question about the prefetching. We never had to have > > this discussion for sequential scan streaming read because it didn't > > (and still doesn't) do prefetching. But, if we push the streaming read > > code down into the heap AM layer, it will be doing the prefetching. > > So, do we remove the prefetching from acquire_sample_rows() and expect > > other table AMs to implement it themselves or use the streaming read > > API? > > The prefetching added to acquire_sample_rows was quite narrowly tailored to > something heap-like - it pretty much required that block numbers to be 1:1 > with the actual physical on-disk location for the specific AM. So I think > it's pretty much required for this to be pushed down. > > Using a read stream is a few lines for something like this, so I'm not worried > about it. I guess we could have a default implementation for block based AMs, > similar what we have around table_block_parallelscan_*, but not sure it's > worth doing that, the complexity is much lower than in the > table_block_parallelscan_ case.
This makes sense. I am working on pushing streaming ANALYZE into heap AM code, and I ran into a few roadblocks. If we want ANALYZE to make the ReadStream object in heap_beginscan() (like the read stream implementation of heap sequential and TID range scans do), I don't see any way around changing the scan_begin table AM callback to take a BufferAccessStrategy at the least (and perhaps also the BlockSamplerData). read_stream_begin_relation() doesn't just save the BufferAccessStrategy in the ReadStream, it uses it to set various other things in the ReadStream object. callback_private_data (which in ANALYZE's case is the BlockSamplerData) is simply saved in the ReadStream, so it could be set later, but that doesn't sound very clean to me. As such, it seems like a cleaner alternative would be to add a table AM callback for creating a read stream object that takes the parameters of read_stream_begin_relation(). But, perhaps it is a bit late for such additions. It also opens us up to the question of whether or not sequential scan should use such a callback instead of making the read stream object in heap_beginscan(). I am happy to write a patch that does any of the above. But, I want to raise these questions, because perhaps I am simply missing an obvious alternative solution. - Melanie