Hi Georg, Thank you for addressing my questions.
Regarding the progress indicator: it looks to me like it will be very difficult to implement properly in PHP, but I might be completely misunderstanding the design. If you think a PoC is possible, could you please prepare one and maybe then we can come back to this discussion. ---------------------------- >There is a fundamental difference between what a server might do (Version) and >what it can do (Capabilities). A version string is a static label. Capability >flags, however, reflect the server’s current build-time features and runtime >configuration. > >Exposing ext_server_capabilities allows developers and frameworks to implement >reliable feature detection. Without these flags, a developer cannot safely use >functions without risking a fatal error or needing to wrap every call in a >try-catch block. This is exactly what exceptions and errors are for. The PHP user SHOULD NOT be concerned with what capabilities the server offers and which are compatible with PHP. Even if it could be potentially useful to PHP developers, we should not expose this information as part of mysqli API. Either mysqli supports the feature or it doesn't; there should be no maybe. ---------------------------- >The performance improvement is not derived from the Generator itself, but from >the reduction of network round-trips. > >- standard foreach + execute(): 1,000 rows = 1,000 "Send" packets + 1,000 >"Response" packets. >- execute_many(): The driver pulls rows from the iterable and serializes them >into a large internal network buffer. 1,000 rows = 1 large Send packet + 1 >Response packet. > >Regarding iterator_to_array($generator): This forces PHP to allocate memory >for every single row simultaneously. If you are importing 1 million rows, PHP >must allocate memory for 1 million arrays before the first byte is ever sent. >By using the Generator directly, we maintain O(1) memory usage, fetching and >serializing only one row at a time. If mysqlnd is going to serialize it all into a large internal buffer then it still needs to read all the data from the iterator before making the query? It's still going to use the same amount of memory. How is it going to maintain constant memory usage? A generator might be a neat trick for users to prepare data on the go, but it doesn't reduce the memory usage if the consumer needs to use all of the data in one go. As I understand, execute_many will send it all in one batch, so the memory footprint stays the same. ---------------------------- Re execute_many parameters: Despite your convincing arguments for better network utilization by providing the types, I still think we should not offer the possibility of specifying the types. I don't know what other PHP developers on this mailing list think about it, but for me the type feature goes against the nature of PHP. Making the parameter optional is very good choice and eases my concerns slightly, so if I am outnumbered in my opinion, I won't be upset. The number of mysqli users grows increasingly smaller. Out of this, the number of people who will use execute_many and who will need to optimize for TINYINT is unbelievably tiny. Any string easily overshadows the numerical data. Thus, this feature won't see much legitimate use. > Because the MariaDB bulk protocol requires type declarations in the packet > header before the data rows are sent, the driver cannot "autodetect" binary > widths from a Generator without reading the entire stream into memory first. > Providing the $types string acts as a contract that allows for true, > constant-memory streaming. Isn't that what it does anyway? You need to read all the data (serialize) before you make the EXECUTE command, correct? I don't understand why you can't prepare the type specification automatically while serializing the data. Regarding the control parameter: Why not make it a callback? Provide the $row as an argument and let the user modify it inside the callback, substituting values for mysqli_indicator, and returning the row to be inserted. It would offer a lot more flexibility to the user and would make the implementation simpler. This way, you don't need to implement Null or None anymore. ---------------------------- >>Please be explicit what the failure conditions may be. > >Like for all other functions you have to retrieve error messages via >mysqli_stmt::error/errno. I was asking you to list in the RFC all the possible client errors that are added as part of this implementation. For example, "Row %lu is not an array". This should be part of the RFC, in my opinion, as we may want to discuss the error conditions and messages too. ---------------------------- Result sets: > As mentioned in the beginning of the RFC, execute_many is used for UPSERT > (insert, replace,delete and update) operations. However it might contain a > result set, e.g. by having a RETURNING clause. It can also be used with SELECT. The SQL is irrelevant. >From what I can see, it is possible to request that MariaDB reply with individual result sets. Wouldn't that be better? How would that impact the performance? > >If you require the IDs of every row in a bulk insert, the MariaDB RETURNING >clause should be used: That's not what I meant. I was asking whether it could be implemented with MARIADB_CLIENT_BULK_UNIT_RESULTS instead. When users execute a SELECT with 2 data rows, I would like to see it return 2 mysqli_result objects. Same with INSERT statements, it should return a separate result for each insertion. If there are arguments against that, they should be explained in the RFC. Anyway, the RFC should clearly explain how result fetching works with all 3 methods (unbuffered, stored, and get_result) and what are the possible gotchas. ---------------------------- > For transactional engines like InnoDB, atomicity is guaranteed. If a protocol > error occurs or a constraint is hit mid-batch, the server handles the > rollback, ensuring the database remains in a consistent state. > Transaction Safety & Atomicity: In native MariaDB bulk mode, the entire batch > is sent as a single unit. In the fallback emulation, rows are executed one by > one. For non-transactional engines, a failure on row 500 would leave the > first 499 rows committed. To maintain consistency, we should recommend that > users wrap execute_many() in an explicit transaction when portability across > MySQL and MariaDB is required. I am confused. Aren't both of these statements stating the same? Why can't you wrap the fallback in an automatic transaction to make it work exactly the same as the native MariaDB solution? If execute_many implies an automatic transaction but only with transactional engines, it should be clearly stated in the RFC so that it can be later documented in PHP manual too. ---------------------------- > Limited Indicator Support: Since the standard MySQL COM_STMT_EXECUTE protocol > does not understand MariaDB-specific indicators, the fallback will only > support mysqli_indicator::Null (translated to a standard SQL NULL) and > mysqli_indicator::None. Indicators like DEFAULT and IGNORE are technically > impossible to implement in the fallback without complex SQL string > manipulation/rewriting, which would introduce unacceptable CPU overhead. And for this reason, I think that maybe we shouldn't even implement the control parameter at all. It sounds like a neat feature, but it costs performance in a function that is all about improving performance, and it is DB-version specific. The new execute_many function doesn't need the control parameter to function properly, and in my opinion, it would be better to keep it as simple as possible. But I am curious to see what others think. Regards, Kamil
