On 04.02.20 15:42, Simon Slavin wrote:
On 4 Feb 2020, at 12:18pm, Robert M. Münch <robert.mue...@saphirion.com> wrote:

- sep=';': field separator character (different from default ',')
If you provide this facility, please don't add it to anything called 'csv' 
since the 'c' stands for 'comma'.

For those playing along at home, CSV files that use semicolons are the result of a 
bug in Excel.  Windows has a setting for a 'list separator'.  The two most 
common values are ',' and ';'.  The CSV export filter in Excel takes its 
separator from this field rather than always using a comma, because it was 
written by someone who wasn't aware of, didn't understand, or was intentionally 
trying to disrupt the standard.  Decades after being told about the bug, 
Microsoft hasn't fixed it.

There are a couple of other errors in Excel's CSV filters, including how strings 
are quoted and how a blank value differs from a zero-length string.  The best 
way I've seen to handle this was to add a new filter to your software, similar 
to 'csv', called something like 'exceltext' which did things the Excel way.
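The idea of a separately named filter has a close analogue in Python's standard `csv` module, used here purely as an illustration; it is not how the SQLite shell works. The sketch registers a hypothetical `exceltext` dialect that differs from the built-in comma-based `excel` dialect only in using `;` as the delimiter, so the plain comma format stays untouched. It does not attempt Excel's quoting or blank-versus-empty-string quirks, and the sample data is made up.

```python
import csv
import io

# Hypothetical dialect named after the suggestion above. It differs from
# Python's built-in comma-based 'excel' dialect only in its delimiter.
csv.register_dialect("exceltext", delimiter=";")

def read_rows(text: str, fmt: str) -> list[list[str]]:
    """Parse text using the named dialect."""
    return list(csv.reader(io.StringIO(text), dialect=fmt))

# Made-up sample, as Excel might export it with ';' as the list separator.
data = "id;label\r\n1;a,b\r\n"

print(read_rows(data, "exceltext"))  # [['id', 'label'], ['1', 'a,b']]
print(read_rows(data, "excel"))      # [['id;label'], ['1;a', 'b']]
```

Reading the semicolon file with the comma dialect does not fail loudly; it silently produces wrong fields, which is why keeping the two formats under different names is safer than changing what 'csv' means.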

Believe it or not, there is no binding standard for the CSV format. The closest anyone has come is RFC 4180, and even that was published as an Informational RFC rather than a standard, as the RFC itself acknowledges.

According to RFC 4180, section 2:
  "While there are various specifications and implementations for the
   CSV format (for ex. [4], [5], [6] and [7]), there is no formal
   specification in existence, which allows for a wide variety of
   interpretations of CSV files."

https://tools.ietf.org/html/rfc4180#section-2

In section 3, under "Interoperability considerations":
  "Due to lack of a single specification, there are considerable
   differences among implementations.  Implementors should "be
   conservative in what you do, be liberal in what you accept from
   others" (RFC 793 [8]) when processing CSV files."

https://tools.ietf.org/html/rfc4180#section-3
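One way to apply that advice is sketched below with Python's `csv` module standing in for whatever importer is actually used. On input, `csv.Sniffer` guesses between `,` and `;`; on output, the writer always uses commas, quotes only fields that need it, and ends each record with CRLF as RFC 4180 describes. The function names and sample data are hypothetical, and the sniffer is a heuristic that can guess wrong on small or unusual samples.

```python
import csv
import io

def read_liberally(text: str) -> list[list[str]]:
    # Liberal on input: let csv.Sniffer decide between ',' and ';'.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;")
    return list(csv.reader(io.StringIO(text), dialect))

def write_conservatively(rows: list[list[str]]) -> str:
    # Conservative on output: always commas, CRLF record endings,
    # double quotes only around fields that need them.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=",", quotechar='"',
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    writer.writerows(rows)
    return buf.getvalue()

comma_input = "id,name\r\n1,Alice\r\n2,Bob\r\n"
semicolon_input = "id;name\r\n1;Alice\r\n2;Bob\r\n"

rows_a = read_liberally(comma_input)
rows_b = read_liberally(semicolon_input)
print(rows_a == rows_b)                    # True
print(repr(write_conservatively(rows_b)))  # 'id,name\r\n1,Alice\r\n2,Bob\r\n'
```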

That being said, the problem with enforcing the comma as the sole delimiter is that over half of the non-English-speaking world (possibly much more) uses the comma as the decimal separator. The work-around, of course, would be to enclose all fields in double quotes. But, as we know, the 800-pound gorilla in the room (Excel) doesn't necessarily do that...
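The quoting work-around can be sketched with Python's `csv` module (the sample values are made up). With a comma delimiter, a value written with a decimal comma, such as `0,25`, survives only if its field is quoted: `QUOTE_MINIMAL` quotes just the fields that contain the delimiter, while `QUOTE_ALL` quotes every field as suggested above. Either way a conforming reader recovers the original values, whereas naively splitting on commas does not.

```python
import csv
import io

rows = [["item", "weight_kg"], ["apple", "0,25"], ["melon", "1,8"]]

def to_csv(rows: list[list[str]], quoting: int) -> str:
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting).writerows(rows)  # comma, CRLF by default
    return buf.getvalue()

minimal = to_csv(rows, csv.QUOTE_MINIMAL)  # quotes only "0,25" and "1,8"
quoted_all = to_csv(rows, csv.QUOTE_ALL)   # quotes every field

# A conforming CSV reader gives back the decimal-comma values intact.
assert list(csv.reader(io.StringIO(minimal))) == rows
assert list(csv.reader(io.StringIO(quoted_all))) == rows

# Splitting on commas without honoring quotes breaks them apart.
naive = [line.split(",") for line in minimal.splitlines()]
print(naive[1])  # ['apple', '"0', '25"']
```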

I agree that this would be a very good option to have. In the meantime, check out libcsv on GitHub:
https://github.com/rgamble/libcsv

It adheres as closely as possible to what standards there are, and you can choose your own delimiter and quote character if you like. Of course, you have to do some programming to use it, but it's really easy to work with. It is also very fast, since it does just one thing and does it very well.

HTH,
Bob Hairgrove

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users