DBD, rows and memory pools

Steven Fosdick Fri, 13 Mar 2020 18:19:01 -0700

I have just been writing a small example program using APR-util's DBD
with Oracle as the database and have come across an interesting
puzzle.  My first attempt used just one pool and didn't attempt to
clear it but then, when I tried to introduce sensible lifetimes, I
ended up with a program including this fragment:


apr_pool_t *spool;
st = apr_pool_create(&spool, pool);
if (st == APR_SUCCESS) {
    apr_pool_t *rpool;
    st = apr_pool_create(&rpool, spool);
    if (st == APR_SUCCESS) {
        apr_dbd_results_t *res;
        st = apr_dbd_select(drv, spool, dbh, &res, "SELECT * FROM mytab", 0);
        if (st == 0) {
            int cols = apr_dbd_num_cols(drv, res);
            apr_dbd_row_t *row;
            while (!apr_dbd_get_row(drv, rpool, res, &row, -1)) {
                for (int c = 0; c < cols; c++) {
                    const char *name = apr_dbd_get_name(drv, res, c);
                    const char *value = apr_dbd_get_entry(drv, row, c);
                    printf("%s=%s\n", name, value);
                }
                apr_pool_clear(rpool);
                putchar('\n');
            }
        }
        apr_pool_clear(spool);
    }
}
apr_dbd_close(drv, dbh);

This core dumps.  For fetching the first row, by luck, the variable
'row' was NULL so apr_dbd_get_row allocated it's internal row
structure from the pool rpool, initialised it and set row to point to
it.  By the time the second row is being fetched and rpool has been
cleared apr_dbd_get_row finds the pointer to the apr_dbd_row_t it
allocated last time and tries to re-use it which works for the first
column but, after the call to apr_dbd_get_entry, this is then
overwritten and processing the 2nd column causes the core dump.

I can obviously resolve this by initialising row to NULL each time:

apr_pool_t *rpool;
st = apr_pool_create(&rpool, spool);
if (st == APR_SUCCESS) {
    apr_dbd_results_t *res;
    st = apr_dbd_select(drv, spool, dbh, &res, "SELECT * FROM hn_lookup_rt", 0);
    if (st == 0) {
        int cols = apr_dbd_num_cols(drv, res);
        apr_dbd_row_t *row = NULL;
        while (!apr_dbd_get_row(drv, rpool, res, &row, -1)) {
            for (int c = 0; c < cols; c++) {
                const char *name = apr_dbd_get_name(drv, res, c);
                const char *value = apr_dbd_get_entry(drv, row, c);
                printf("%s=%s\n", name, value);
            }
            putchar('\n');
            apr_pool_clear(rpool);
            row = NULL;
        }
    }
    apr_pool_clear(spool);
}

but clearly using the pass by reference of 'row' and using it as a
in/out parameter and thus re-using the previously allocated
apr_dbd_row_t is intended as an optimisation to avoid re-doing that
allocation and initialisation for each row.   It does however seem to
me to have the wrong lifetime.  If I was to change this example so it
was:

apr_pool_t *spool;
st = apr_pool_create(&spool, pool);
if (st == APR_SUCCESS) {
    apr_dbd_results_t *res;
    st = apr_dbd_select(drv, spool, dbh, &res, "SELECT * FROM hn_lookup_rt", 0);
    if (st == 0) {
        int cols = apr_dbd_num_cols(drv, res);
        apr_dbd_row_t *row = NULL;
        while (!apr_dbd_get_row(drv, spool, res, &row, -1)) {
            for (int c = 0; c < cols; c++) {
                const char *name = apr_dbd_get_name(drv, res, c);
                const char *value = apr_dbd_get_entry(drv, row, c);
                printf("%s=%s\n", name, value);
            }
            putchar('\n');
        }
    }
    apr_pool_clear(spool);
}

that would allow the allocated apr_dbd_row_t to be cached between rows
but this now has a different problem because apr_dbd_get_entry is
allocating the returned strings from a memory pool that now lasts for
the whole of the iteration over the result set.  That's fine for a
small table but if iterating over millions of rows this would consume
a lot of memory.

It seems to me that if there is some data to be retained between
fetching one row and the next it should really be allocated from the
memory pool given to apr_dbd_select, not the one given to
apr_dbd_get_row.  What do you think?

DBD, rows and memory pools

Reply via email to