I don't think that patch to apply affinities is a good idea since it will
usually be mostly useless and will negatively impact performance since one can
and should assume that the actual author of the VTable knows what they are
doing, currently SQLite3 does not enforce declared VTable column affinities and
it should stay that way. That said, although the definitions of the column
affinities are stored in the internal schema, it is not trivial to access them
from a non-builtin extension.
Instead I have once again changed the vsv.c VTable plugin so that it does some
more useful things.
The below parameters work exactly as they did in the csv.c extension with
regards to setting the functionality with the single exception that now if you
specify all of header, schema and columns, the header row will still be
skipped. In csv.c if you specified both schema and columns then the header row
was not skipped even if you specified it existed.
filename=FILENAME is the csv filename in quotes.
data=DATA provides the data inline.
header=BOOL will "parse" the first row to get the column names unless a
SCHEMA is specified.
schema=STRING allows the specification of the schema to use
columns=N number of columns in VTable
New parameters:
skip=N
will skip the specified number of rows at the start of the CSV file, exclusive
of the header row (ie, skip data rows). If the header, via the header option,
is specified to exist then it will always be skipped.
fsep=SEP and rsep=SEP
allow to specify the particular separators to be used. The defaults are
fsep=',' and rsep='\n' so the default is the same as the original. The
separator can be any single character, or it can be a supported escape code (\t
\v \n \f for horizontal-tab, vertical-tab, newline and form-feed), or it can be
a hexadecimal coded byte as in '\x1f'. Multibyte separators cannot be
specified.
affinity=AFF
where the affinity can be NONE, BLOB, TEXT, INTEGER, REAL, NUMERIC with the
default being NONE. NONE does nothing different from the existing csv.c VTable
(unless you also specify validatetext). However, if you specify an affinity it
applies to *every* returned value and makes the following changes:
- BLOB returns all data as BLOB
- TEXT returns all data as TEXT
- INTEGER returns anything that looks like an integer as an INTEGER otherwise
as TEXT
- REAL returns anything that looks like a number as a REAL (double) otherwise
as TEXT
- NUMERIC returns anything that looks like a number as either an INTEGER or
REAL otherwise as TEXT
- if the field "looks like an integer" then an INTEGER is returned
- if the field "looks like a number" then a REAL is returned unless it can
be converted to an integer
"looks like an integer" effectively means passing the following regex "^
*[+-]?\d+ *$"
"looks like a number" effectively means passing the following regex "^
*[+-]?(\d+\.?\d*|\d*\.?\d+)([eE][+-]?\d+)? *$"
Processing is limited by the runtime C library so the detection is not quite so
good as the affinities built into the SQLite3 core. This means that something
that looks like an integer may fail to load as an integer because it is too big
(918273745985736365575984857636253857564363 looks like an integer and the
SQLite3 core will recognize this and load it as a double. The C library is not
so smart and will probably simply return MAX_INT).
Processing is also limited by the compiler. Some compilers (wave at Microsoft)
do not treat long double as being more precise than a regular double (ie, 10 or
16 bytes rather than 8) and treat that simply as a plain old double (some
compilers, like Intel, need a compilation option to enable long doubles being
actually longer than a regular double). It is detected if this is the case
(sizeof(long dounble)==sizeof(double)) so that only "integers" that can be
entirely contained in the mantissa are converted to integer, and others will
remain as double.
validatetext=BOOL
will cause TEXT fields to be validated as containing a valid UTF-8 coding
sequence (no content check is performed, only the validity of the encoding is
checked). If this is turned on together with any type of affinity (ie, other
than none) then improperly encoded UTF8 text will be returned as a BLOB. If no
affinity is in effect (ie, affinity=none) then an error will be thrown rather
then letting a text field contain invalid data. This means that fields
containing embedded nulls (rather than encoded 0 bytes) will not be able to be
stored as TEXT and will either be BLOB type or throw an invalid encoding error.
nulls=BOOL
when enabled empty fields will be returned as NULL rather than empty strings.
An empty field means a field with no content (separators are side-by-each).
Specifically empty strings are not changed. That is, the middle column in this
a,,b is a null column but this is not a,"",b
--
The fact that there's a Highway to Hell but only a