At 18:38 01/08/2015, you wrote:

>Nobody mentions it because it is as irrelevant as bemoaning the fact 
>that CSV cannot store lawn-chairs or Java objects. It wasn't intended 
>to do so.

Exactly. All I mean is that with only a few additional strict rules it 
can be turned into a basic-type-aware vehicle able to reliably transfer 
data between many applications, including SQLite. A spreadsheet 
internally distinguishes, for every cell, between a string, a number 
and an empty cell, so the text output format should clearly do the 
same, unambiguously IMHO.
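
To make that concrete, here is a minimal Python sketch of one possible 
convention (my illustration of the idea, not an exact statement of the 
rules I use): strings always quoted, numbers bare, empty cells written 
as nothing at all:

  # Sketch of a type-aware writer for one row.  The convention shown here is
  # only an illustration: quoted = string, bare = number, empty = empty cell.
  def encode_field(value):
      if value is None:                    # empty cell -> nothing at all
          return ""
      if isinstance(value, (int, float)):  # number -> bare token, never quoted
          return repr(value)
      # string -> always quoted, embedded quotes doubled (RFC 4180 style)
      return '"' + str(value).replace('"', '""') + '"'

  def encode_row(values):
      return ",".join(encode_field(v) for v in values)

  print(encode_row(["ACME-42", 12345.4890E-13, None, 7]))
  # quoted string, bare real, empty cell, bare integer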

>   Neither, for that matter, does it store Integers or Reals as you go 
> on to mention - It is completely typeless (moreso than SQLite).

You know very well that SQLite is far from being typeless. What you put 
in you get out through the kaleidoscope of affinities. Its storage 
classes are perfectly defined and obey precise rules.
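
For anyone who doubts it, a two-minute test (Python's sqlite3 module, 
in-memory database) shows the affinity rules at work:

  # SQLite type affinity in action: values are coerced by well-defined rules,
  # not stored as untyped text.
  import sqlite3

  con = sqlite3.connect(":memory:")
  con.execute("CREATE TABLE t (n INTEGER, x REAL, s TEXT)")
  # numeric-looking text is converted by column affinity; the TEXT column
  # converts the number the other way round
  con.execute("INSERT INTO t VALUES ('123', '4.5', 789)")
  print(con.execute("SELECT typeof(n), typeof(x), typeof(s) FROM t").fetchone())
  # -> ('integer', 'real', 'text')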

>  It stores only one single thing: Strings. It has only one single 
> guide: How to correctly add /the string/ to a row and column and how 
> to read it back. How you interpret those strung-together characters 
> is up to the composer/decomposer (as Simon mentioned) - the CSV 
> standard has no feelings about it.

True, but the issue with most variants of CSV files floating around is 
just that: it's up to the reader to decide whether 12345.4890E-13 is a 
float or a string (for instance the reference of some item by a given 
supplier). By forcing string delimiters around every text field, you 
gain at least SQLite-level type resilience in and out. If you follow 
the RFC to the letter and only use string delimiters when a field 
actually contains the delimiter, you lose the ability to determine 
basic types unambiguously.
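
To illustrate with that very value (a sketch of one possible 
reader-side rule, not a full parser; decode_field is a helper written 
just for this example):

  # Reader-side classification under an "always quote strings" convention.
  def decode_field(raw):
      raw = raw.strip()
      if raw == "":                                   # empty -> NULL
          return ("NULL", None)
      if raw.startswith('"') and raw.endswith('"'):   # quoted -> TEXT, always
          return ("TEXT", raw[1:-1].replace('""', '"'))
      try:                                            # bare -> INTEGER or REAL
          return ("INTEGER", int(raw))
      except ValueError:
          return ("REAL", float(raw))  # a bare non-number is a format error

  print(decode_field('12345.4890E-13'))    # classified as REAL
  print(decode_field('"12345.4890E-13"'))  # kept as TEXT: a supplier reference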

By adopting a set of simple rules, you can reliably import/export data 
blindly into/from SQL (and elsewhere as well) without losing basic type 
information, even at the individual field level. Of course here I mean 
all SQLite datatypes, which is what we're talking about on this list.
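
Something like this is all the "blind" import amounts to (table layout 
and sample rows are invented for the example, and the naive split 
assumes no embedded commas):

  # Blind import sketch: each field keeps its storage class on the way into
  # SQLite.  Columns are left undeclared so no affinity conversion interferes.
  import sqlite3

  def decode(tok):                 # same classification as the sketch above
      tok = tok.strip()
      if tok == "":
          return None                                  # NULL
      if tok.startswith('"') and tok.endswith('"'):
          return tok[1:-1].replace('""', '"')          # TEXT
      try:
          return int(tok)                              # INTEGER
      except ValueError:
          return float(tok)                            # REAL

  rows = ['"ACME-42",12345.4890E-13,,7', '"plain text",3.14,42,""']
  con = sqlite3.connect(":memory:")
  con.execute("CREATE TABLE demo (a, b, c, d)")
  for line in rows:
      con.execute("INSERT INTO demo VALUES (?,?,?,?)",
                  [decode(t) for t in line.split(",")])  # naive split
  for r in con.execute("SELECT typeof(a), typeof(b), typeof(c), typeof(d) FROM demo"):
      print(r)   # storage classes survive the round trip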

>For extra fun - How must a value that is both in and not in quotes be 
>interpreted? i.e. if I have this csv line, what values must the parser 
>end up with?:
>
>1, "2", "3" 4, 5 "6", 7
              ^-------------------- ??? (fixed font required)
An error there: this isn't valid CSV under any variant I know of. At 
the limit, the grammar could be extended (or bastardized?) to treat the 
4 as a comment, but I never had any incentive to handle that kind of 
cosmetic variation. Even in that case, the "6" should also be seen as a 
comment, but this goes far beyond what I (and most people) need and 
routinely use.
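
For what it's worth, even a lenient off-the-shelf parser cannot make up 
its mind about that line. Python's csv module is used below purely as 
an example; other readers will disagree yet again, which is exactly the 
problem:

  # What an existing parser makes of the disputed line.
  import csv, io

  line = '1, "2", "3" 4, 5 "6", 7'
  for skip in (False, True):
      print(skip, next(csv.reader(io.StringIO(line), skipinitialspace=skip)))
  # the two runs do not even agree on which quotes are delimiters and which are data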


>i.e. You've made your own file specification using the CSV standard as 
>a kick-off point.

Exactly: I don't pretend to have invented warm water; I just settled on 
an easy-to-implement variant which fixes most of the issues that come 
from CSV being type-blind in the first place.

Additionally, I enforce UTF-8 encoding without a BOM and allow 
insignificant Unicode horizontal whitespace before and after field 
separators, as well as at the beginning and end of each line.
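
Roughly like this (the set of horizontal whitespace code points below 
is a guess for the example, not my definitive list):

  # Pre-parsing normalization: require UTF-8 without a BOM, ignore horizontal
  # whitespace around separators.  The whitespace set is an assumption.
  HORIZONTAL_WS = "\t \u00a0\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u202f\u205f\u3000"

  def normalize_line(raw_bytes):
      text = raw_bytes.decode("utf-8")          # must be valid UTF-8 ...
      if text.startswith("\ufeff"):             # ... and must not carry a BOM
          raise ValueError("UTF-8 BOM found; the format forbids it")
      fields = text.rstrip("\r\n").split(",")   # naive split: no embedded commas
      return [f.strip(HORIZONTAL_WS) for f in fields]

  print(normalize_line('  "abc" ,\t42 , \u00a0 3.5\n'.encode("utf-8")))
  # -> ['"abc"', '42', '3.5']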

Obviously, if one needs a format offering more complete semantics, then 
JSON or XML are there to use, albeit at a significantly higher cost.
