On 21.09.2017 at 23:08, Dan Ackroyd wrote: > On 21 September 2017 at 12:43, Christoph M. Becker <cmbecke...@gmx.de> wrote: > >> There are several bug reports regarding "broken" fputcsv() behavior in >> our tracker, namely, because the $escape parameter causes unexpected >> results. For instance: > > I looked at fixing some of the CSV related bugs about a year or so > ago. My conclusions were: > > i) There is no way to fix the problems that wouldn't cause horrible BC > breaks for code that is only coincidentally working currently. > > ii) Handling strings in C is much more error prone than handling them in PHP. > > I'm reasonably certain that trying to fix the current functions is the > wrong approach, and one of the following would be much better. > > Either, find a C library that has already been proven to handle CSV > parsing/generating 'correctly' and bring that into PHP core under > either new function names or namespace. > > Or, write the code in PHP (or just use > https://github.com/thephpleague/csv) and find a way to make that fast > enough for people to use. > > Touching the existing code is pretty certain to bring a lot of pain, > without resulting in a fully compliant csv parser/generator.
I agree that php_fgetcsv() has serious issues, and it might not be possible to fix it without causing severe BC breaks. php_fputcsv(), on the other hand, is less of a problem, though. Overall, the most demanding issue is that both functions try to regard the current locale, but already fail that generally, since several parameters are declared as char, which can't work for (some) multibyte encodings. For instance, it is impossible to generate proper UTF-16 encoded CSV files, or to read them. This issue continues, because several (mostly?) whitespace characters are hard-coded assuming an ASCII compatible character encoding. A minor issue are the hard-coded record terminators, which are currently LF (RFC 4180 specifies CRLF). Apparently, this isn't a real issue nowadays (besides the missing support for non ASCII compatible character encodings). Another issue concerns the escape character. Frankly, I don't even have the slightest clue how that is supposed to work, and why it even has been introduced in the first place. Maybe it has been introduced for compatibility with some application requiring it; maybe it has been introduced to support "DSV style"[1]. If the latter, at least php_fputcsv() doesn't support it (anymore). Unless it's clear what the escape character exactly is supposed to do, we *cannot* even *hope* to fix the implementation. Introducing new functions with a clearly defined behavior would be nice, but appears to me somewhat as pie in the sky (somebody would have to do the actual work!). But even if we do so, at least the actual behavior of the existing functions would have to be documented. And frankly, I don't see why it would be a problem to allow to use no escape character for fputcsv(). That certainly wouldn't be a BC break, since currently the function bails out if `escape_str_len < 1`. Of course, that wouldn't fix all issues, but it appears to make the function work as expected for ASCII compatible character encodings (for other character encodings the function appears to be broken anyway). Ad league/csv: rather impressive! However, including this functionality into ext/standard is totally over the top, in my opinion. I guess that fgetcsv() and fputcsv() are mostly used for importing from and exporting to CSV, respectively, but not as replacement for an SQL database engine. [1] <http://www.catb.org/~esr/writings/taoup/html/ch05s02.html> -- Christoph M. Becker -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: http://www.php.net/unsub.php