On 21.09.2017 at 23:08, Dan Ackroyd wrote:

> On 21 September 2017 at 12:43, Christoph M. Becker <cmbecke...@gmx.de> wrote:
>
>> There are several bug reports regarding "broken" fputcsv() behavior in
>> our tracker, namely, because the $escape parameter causes unexpected
>> results.  For instance:
> 
> I looked at fixing some of the CSV related bugs about a year or so
> ago. My conclusions were:
> 
> i) There is no way to fix the problems that wouldn't cause horrible BC
> breaks for code that is only coincidentally working currently.
> 
> ii) Handling strings in C is much more error prone than handling them in PHP.
> 
> I'm reasonably certain that trying to fix the current functions is the
> wrong approach, and one of the following would be much better.
> 
> Either, find a C library that has already been proven to handle CSV
> parsing/generating 'correctly' and bring that into PHP core under
> either new function names or namespace.
> 
> Or, write the code in PHP (or just use
> https://github.com/thephpleague/csv) and find a way to make that fast
> enough for people to use.
> 
> Touching the existing code is pretty certain to bring a lot of pain,
> without resulting in a fully compliant csv parser/generator.

I agree that php_fgetcsv() has serious issues, and it might not be
possible to fix it without causing severe BC breaks.  php_fputcsv(), on
the other hand, is less of a problem, though.

Overall, the most demanding issue is that both functions try to regard
the current locale, but already fail that generally, since several
parameters are declared as char, which can't work for (some) multibyte
encodings.  For instance, it is impossible to generate proper UTF-16
encoded CSV files, or to read them.  This issue continues, because
several (mostly?) whitespace characters are hard-coded assuming an ASCII
compatible character encoding.

A minor issue are the hard-coded record terminators, which are currently
LF (RFC 4180 specifies CRLF).  Apparently, this isn't a real issue
nowadays (besides the missing support for non ASCII compatible character
encodings).

Another issue concerns the escape character.  Frankly, I don't even have
the slightest clue how that is supposed to work, and why it even has
been introduced in the first place.  Maybe it has been introduced for
compatibility with some application requiring it; maybe it has been
introduced to support "DSV style"[1].  If the latter, at least
php_fputcsv() doesn't support it (anymore).  Unless it's clear what the
escape character exactly is supposed to do, we *cannot* even *hope* to
fix the implementation.

Introducing new functions with a clearly defined behavior would be nice,
but appears to me somewhat as pie in the sky (somebody would have to do
the actual work!).  But even if we do so, at least the actual behavior
of the existing functions would have to be documented.

And frankly, I don't see why it would be a problem to allow to use no
escape character for fputcsv().  That certainly wouldn't be a BC break,
since currently the function bails out if `escape_str_len < 1`.  Of
course, that wouldn't fix all issues, but it appears to make the
function work as expected for ASCII compatible character encodings (for
other character encodings the function appears to be broken anyway).

Ad league/csv: rather impressive!  However, including this functionality
into ext/standard is totally over the top, in my opinion.  I guess that
fgetcsv() and fputcsv() are mostly used for importing from and exporting
to CSV, respectively, but not as replacement for an SQL database engine.

[1] <http://www.catb.org/~esr/writings/taoup/html/ch05s02.html>

-- 
Christoph M. Becker

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to