Hi,

First time messing around with PHP internal code and mailing lists, so sorry if I'm in the wrong place or doing something wrong in general.


I recently noticed some weird performance issues while doing bulk inserts with prepared statements (single INSERT with a lot of VALUES) and using RETURNING clause to get back IDs and other columns.

So I wrote a little benchmark to insert 8000 random rows (3 columns each) into a table and spent some time tracking down why it's slow. Suprisingly it seems that INSERT itself takes 100-200ms, but fetch/fetchAll returning id and one of the columns takes 2-3 seconds.

I'm sending the benchmark script and postgres database schema used.


After digging around PHP source code (pulled master branch), the problem seems to be in PDO calling param_hook with PDO_PARAM_EVT_FETCH_PRE and again PDO_PARAM_EVT_FETCH_POST  for each fetched row, which causes param_hook to be executed for each row x each param twice. In my little benchmark inserting 8000 rows with 3 columns and returning 2 columns for each row that means param_hook is called 8000x3x8000x2 = 384 000 000 times! So I took a look at pgsql_stmt_param_hook in ext/pdo_pgsql/pgsql_statement.c and it doesn't seem to do anything for PDO_PARAM_EVT_FETCH_PRE or PDO_PARAM_EVT_FETCH_POST. So if my understanding is correct, it's calling a function that does nothing meaningful 384 000 000 times, and this number grows exponentially with the number of rows and columns.

I'm using postgres, but looking at ext/pdo_mysql code this seems to also be the case for mysql's drivers, didn't benchmark it though.


As a test I commented out the lines dispatching those events in ext/pdo/pdo_stmt.c:

|diff --git a/ext/pdo/pdo_stmt.c b/ext/pdo/pdo_stmt.c||
||index 96f7574638..49703a7d68 100644||
||--- a/ext/pdo/pdo_stmt.c||
||+++ b/ext/pdo/pdo_stmt.c||
||@@ -592,9 +592,9 @@ static int do_fetch_common(pdo_stmt_t *stmt, enum pdo_fetch_orientation ori, zen||
||                return 0;||
||        }||
||||
||-       if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_PRE)) {||
||-               return 0;||
||-       }||
||+       // if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_PRE)) {||
||+       //      return 0;||
||+       // }||
||||
||        if (!stmt->methods->fetcher(stmt, ori, offset)) {||
||                return 0;||
||@@ -605,9 +605,9 @@ static int do_fetch_common(pdo_stmt_t *stmt, enum pdo_fetch_orientation ori, zen||
||                return 0;||
||        }||
||||
||-       if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_POST)) {||
||-               return 0;||
||-       }||
||+       // if (!dispatch_param_event(stmt, PDO_PARAM_EVT_FETCH_POST)) {||
||+       //      return 0;||
||+       // }||
||||
||        if (do_bind && stmt->bound_columns) {||
||                /* update those bound column variables now */|

|
|

After this change, fetching takes ~5ms and nothing seems broken, but the whole INSERT/RETURNING is now 10 times faster.


Am I understanding this right? Could this be solved by letting each pdo_* driver set some kind of a flag (bitmask?) telling PDO which hooks it wants called from dispatch_param_event? For example for pdo_pgsql that flag would be |PDO_PARAM_EVT_FREE | PDO_PARAM_EVT_NORMALIZE | PDO_PARAM_EVT_ALLOC | PDO_PARAM_EVT_EXEC_PRE|, as other case statements seem empty.

<<attachment: bench_pdo.php>>

Attachment: test-schema.sql
Description: application/sql

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php

Reply via email to