Hi,
First time messing around with PHP internal code and mailing lists, so sorry if I'm in the wrong place or doing something wrong in general.
I recently noticed some weird performance issues while doing bulk inserts with prepared statements (single INSERT with a lot of VALUES) and using RETURNING clause to get back IDs and other columns.
So I wrote a little benchmark to insert 8000 random rows (3 columns each) into a table and spent some time tracking down why it's slow. Suprisingly it seems that INSERT itself takes 100-200ms, but fetch/fetchAll returning id and one of the columns takes 2-3 seconds.
I'm sending the benchmark script and postgres database schema used.After digging around PHP source code (pulled master branch), the problem seems to be in PDO calling param_hook with PDO_PARAM_EVT_FETCH_PRE and again PDO_PARAM_EVT_FETCH_POST for each fetched row, which causes param_hook to be executed for each row x each param twice. In my little benchmark inserting 8000 rows with 3 columns and returning 2 columns for each row that means param_hook is called 8000x3x8000x2 = 384 000 000 times! So I took a look at pgsql_stmt_param_hook in ext/pdo_pgsql/pgsql_statement.c and it doesn't seem to do anything for PDO_PARAM_EVT_FETCH_PRE or PDO_PARAM_EVT_FETCH_POST. So if my understanding is correct, it's calling a function that does nothing meaningful 384 000 000 times, and this number grows exponentially with the number of rows and columns.
I'm using postgres, but looking at ext/pdo_mysql code this seems to also be the case for mysql's drivers, didn't benchmark it though.
As a test I commented out the lines dispatching those events in ext/pdo/pdo_stmt.c:
|diff --git a/ext/pdo/pdo_stmt.c b/ext/pdo/pdo_stmt.c|| ||index 96f7574638..49703a7d68 100644|| ||--- a/ext/pdo/pdo_stmt.c|| ||+++ b/ext/pdo/pdo_stmt.c||||@@ -592,9 +592,9 @@ static int do_fetch_common(pdo_stmt_t *stmt, enum pdo_fetch_orientation ori, zen||
|| return 0;|| || }|| |||| ||- if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_PRE)) {|| ||- return 0;|| ||- }|| ||+ // if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_PRE)) {|| ||+ // return 0;|| ||+ // }|| |||| || if (!stmt->methods->fetcher(stmt, ori, offset)) {|| || return 0;||||@@ -605,9 +605,9 @@ static int do_fetch_common(pdo_stmt_t *stmt, enum pdo_fetch_orientation ori, zen||
|| return 0;|| || }|| |||| ||- if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_POST)) {|| ||- return 0;|| ||- }|| ||+ // if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_POST)) {|| ||+ // return 0;|| ||+ // }|| |||| || if (do_bind && stmt->bound_columns) {|| || /* update those bound column variables now */| | |After this change, fetching takes ~5ms and nothing seems broken, but the whole INSERT/RETURNING is now 10 times faster.
Am I understanding this right? Could this be solved by letting each pdo_* driver set some kind of a flag (bitmask?) telling PDO which hooks it wants called from dispatch_param_event? For example for pdo_pgsql that flag would be |PDO_PARAM_EVT_FREE | PDO_PARAM_EVT_NORMALIZE | PDO_PARAM_EVT_ALLOC | PDO_PARAM_EVT_EXEC_PRE|, as other case statements seem empty.
<<attachment: bench_pdo.php>>
test-schema.sql
Description: application/sql
-- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: https://www.php.net/unsub.php