Some quick comments:

On 12/03/2024 15:02, Thomas Munro wrote:
src/backend/storage/aio/streaming_read.c
src/include/storage/streaming_read.h

Standard file header comments missing.

It would be nice to have a comment at the top of streaming_read.c, explaining at a high level how the circular buffer, lookahead and all that works. Maybe even some diagrams.

For example, what is head and what is tail? Before reading the code, I assumed that 'head' was the next block range to return in pg_streaming_read_buffer_get_next(). But I think it's actually the other way round?

/*
 * Create a new streaming read object that can be used to perform the
 * equivalent of a series of ReadBuffer() calls for one fork of one relation.
 * Internally, it generates larger vectored reads where possible by looking
 * ahead.
 */
PgStreamingRead *
pg_streaming_read_buffer_alloc(int flags,
                                                           void *pgsr_private,
                                                           size_t 
per_buffer_data_size,
                                                           BufferAccessStrategy 
strategy,
                                                           
BufferManagerRelation bmr,
                                                           ForkNumber forknum,
                                                           
PgStreamingReadBufferCB next_block_cb)

I'm not a fan of the name, especially the 'alloc' part. Yeah, most of the work it does is memory allocation. But I'd suggest something like 'pg_streaming_read_begin' instead.

Do we really need the pg_ prefix in these?

Buffer
pg_streaming_read_buffer_get_next(PgStreamingRead *pgsr, void **per_buffer_data)

Maybe 'pg_streaming_read_next_buffer' or just 'pg_streaming_read_next', for a shorter name.



        /*
         * pgsr->ranges is a circular buffer.  When it is empty, head == tail.
         * When it is full, there is an empty element between head and tail.  
Head
         * can also be empty (nblocks == 0), therefore we need two extra 
elements
         * for non-occupied ranges, on top of max_pinned_buffers to allow for 
the
         * maxmimum possible number of occupied ranges of the smallest possible
         * size of one.
         */
        size = max_pinned_buffers + 2;

I didn't understand this explanation for why it's + 2.

        /*
         * Skip the initial ramp-up phase if the caller says we're going to be
         * reading the whole relation.  This way we start out doing full-sized
         * reads.
         */
        if (flags & PGSR_FLAG_FULL)
                pgsr->distance = Min(MAX_BUFFERS_PER_TRANSFER, 
pgsr->max_pinned_buffers);
        else
                pgsr->distance = 1;

Should this be "Max(MAX_BUFFERS_PER_TRANSFER, pgsr->max_pinned_buffers)"? max_pinned_buffers cannot be smaller than MAX_BUFFERS_PER_TRANSFER though, given how it's initialized earlier. So perhaps just 'pgsr->distance = pgsr->max_pinned_buffers' ?

--
Heikki Linnakangas
Neon (https://neon.tech)


Reply via email to