On Tue, Apr 21, 2015 at 9:09 PM, Ashish Sharma <ashishonline at gmail.com> wrote:
> Hi
>
> Many times I store JSON data in sqlite. It will be useful if sqlite came
> with functions which understand JSON. Presto has a nice set
> https://prestodb.io/docs/current/functions/json.html

I wrote a little tool that more or less allows the user to ingest JSON into an SQLite database, for use on tiny platforms (think a VM running inside a low-power home router). I was considering open-sourcing it. Below is the man page for the tool. If there is enough interest, I'll do the requisite work to remove dependencies on other bits of my infrastructure and publish it somewhere. Let me know.

Eric

json2sqlite(1)              Aterlo Networks, Inc.              json2sqlite(1)

NAME
       json2sqlite - convert JSON to SQLite database rows

SYNOPSIS
       json2sqlite [OPTION]...

DESCRIPTION
       Read arbitrary JSON from a file (or standard input) and emit SQLite
       rows. The form the rows take is dictated by the options. The entire
       run of the program is wrapped in a single SQLite transaction that is
       only committed if no critical errors are encountered.

       This program's original intent was to be run on small, low-power
       devices. It can be given a strict memory cap with the -M flag.

   Input selection
       -i, --input-file=FILE
              Read JSON from FILE. If FILE is '-', then read from standard
              input. If no value is passed, then '-' is assumed. If you want
              to read from a file whose literal name is '-', you can qualify
              the path to the file, e.g. './-'.

       -P, --input-path=JSON_PATH
              Specify the path of the object you wish to import. The
              root-level JSON element has the path '.'. If the root-level
              element is an object, then JSON_PATH may contain a hierarchy
              of key names at which the input array can be found. For
              example, if we have the JSON {"a": [ <data> ]}, then the array
              containing <data> can be imported by saying '--input-path .a'.
              The leading dot may be omitted. (The leading-dot syntax is
              just provided to make addressing the root-level object look a
              little nicer on the command line.) If left unspecified, then
              '.' (i.e. the whole document) is assumed.

       -T, --input-type=[array|object]
              If 'array' is given, then import the JSON array found at the
              JSON_PATH specified by --input-path. When importing an array,
              it is assumed that each element of the array is itself a JSON
              object. These objects may either be a simple name->scalar
              mapping, or they may have a more complex nested form. The
              objects are imported as either one row per object or one row
              per name/value pair, depending on the other arguments to this
              program.

              Otherwise, if 'object' is given, then import the single JSON
              object whose path is the JSON_PATH specified by --input-path.
              The object is imported just as if it were the unique element
              of a containing array and that array had been specified for
              import via --input-type=array.

              If left unspecified, then 'array' is assumed.

   Output selection
       -o, --output-database=DATABASE_FILENAME
              Emit rows to the database file DATABASE_FILENAME. This
              argument is mandatory.

       -t, --output-table=TABLE_NAME
              Emit rows to the given TABLE_NAME. This argument is mandatory.

   Initialization
       -s, --schema-file=SCHEMA_FILE
              Run the SQL statements in SCHEMA_FILE before doing anything
              else.

       -D, --delete-first
              Run 'DELETE FROM <output-table>;' before ingesting any
              records (but still within the process's global transaction,
              so that subsequent failures will not lead to data loss).

   Transforming input to output
       Note that the input selection mechanism chosen above with
       --input-type and --input-path yields a list of objects (where the
       list has length 0, 1, or more). This is called the 'initial result
       set' below. The following options instruct json2sqlite how to
       convert each of those objects to some list of table rows. There are
       two overall structures that each object can have (nested or flat),
       and two overall approaches for converting those structures to SQLite
       rows (either one row per name-value pair or one row per object).
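As a rough illustration of the dotted-path lookup that --input-path performs, here is a minimal Python sketch (json2sqlite itself is not written in Python; `resolve_path` is a hypothetical helper name, and a missing key here corresponds to the tool's exit status 14):

```python
import json

def resolve_path(document, json_path):
    """Walk a dotted JSON path such as '.a.b' (or 'a.b') down a parsed
    document. '.' addresses the root-level element itself."""
    node = document
    for key in json_path.lstrip(".").split("."):
        if key == "":
            continue  # empty component: the path was '.' (the root)
        node = node[key]  # a missing key here mirrors exit status 14
    return node

doc = json.loads('{"demographics": {"persons": [{"name": "Alice"}]}}')
print(resolve_path(doc, ".demographics.persons"))  # [{'name': 'Alice'}]
```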
   Input structure specification
       You must indicate to json2sqlite the structure of the objects in the
       initial result set. If you don't specify one, then --flat is
       assumed.

       -F, --flat
              Each object in the initial result set is presumed to contain
              a simple listing of name-value pairs, where each value is a
              scalar. For example, if the overall JSON is:

                  { "foo": "bar", "zippy": 123 }

              then it is appropriate to pass --flat.

       -N, --nested
              Each object in the initial result set is presumed to map a
              name to either an object or an array of objects. Each of
              these subobjects is itself presumed to be a simple listing of
              name-value pairs. For example, if the overall JSON is:

                  {
                      "foo":   { "mean": 106.5, "stddev": 8.5 },
                      "zippy": { "mean": 108.1, "stddev": 2.5 }
                  }

              and you wish to ingest this whole data set, then --nested is
              appropriate. Here is another example of a nested structure
              that you can import:

                  {
                      "foo": [
                          { "weight": 107, "unit": "grams", "timestamp": 1422812035 },
                          { "weight": 117, "unit": "grams", "timestamp": 1422725653 }
                      ],
                      "bar": [
                          { "weight": 57, "unit": "grams", "timestamp": 1422812035 },
                          { "weight": 83, "unit": "grams", "timestamp": 1422725653 }
                      ]
                  }

              Note that passing --nested essentially expands the initial
              result set to contain a potentially larger number of objects.
              When --nested is passed, the object key becomes available in
              the output structure specification, addressable as
              '_PARENT_KEY_'.

   Output structure specification
       Now you must choose whether to emit one SQLite row per object in the
       result set, or one SQLite row per name/value pair in the objects of
       the result set. If you don't specify one of the two following
       options, then '--row-per=object' is assumed.

       -r, --row-per=object
              Each object in the result set is converted to one SQLite row.
              The keys in the result set object correspond to column names
              in the SQLite output rows, and the values in the result set
              correspond to the values in the SQLite output rows.
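To make the transformation concrete, the --nested expansion and the two --row-per modes can be sketched in Python. This is an illustration only: the helper names are hypothetical, and json2sqlite does not actually work this way internally.

```python
def expand_nested(obj):
    """--nested: each value is a sub-object or an array of sub-objects;
    the key a sub-object was found under is attached as '_PARENT_KEY_'."""
    result_set = []
    for parent_key, value in obj.items():
        for sub in (value if isinstance(value, list) else [value]):
            row = dict(sub)
            row["_PARENT_KEY_"] = parent_key
            result_set.append(row)
    return result_set

def rows_per_object(result_set, column_map):
    """--row-per=object: one SQLite row per result-set object.
    column_map maps JSON key names (or '_PARENT_KEY_') to SQLite column
    names; a missing key yields None, i.e. SQL NULL."""
    for obj in result_set:
        yield {si: obj.get(ji) for ji, si in column_map.items()}

def rows_per_key(result_set, column_map):
    """--row-per=key: one SQLite row per name/value pair. Ji may be the
    sentinels '_KEY_', '_VALUE_', '_PARENT_KEY_', or a plain key name."""
    for obj in result_set:
        for key, value in obj.items():
            if key == "_PARENT_KEY_":
                continue
            yield {si: (key if ji == "_KEY_" else
                        value if ji == "_VALUE_" else
                        obj.get(ji))
                   for ji, si in column_map.items()}

nested = {"foo":   {"mean": 106.5, "stddev": 8.5},
          "zippy": {"mean": 108.1, "stddev": 2.5}}
rs = expand_nested(nested)
print(list(rows_per_object(rs, {"_PARENT_KEY_": "name", "mean": "mean"})))
# [{'name': 'foo', 'mean': 106.5}, {'name': 'zippy', 'mean': 108.1}]
```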
       -r, --row-per=key
              Each name/value pair in each object in the result set is
              converted to one SQLite row. The key and the value both
              become data in the SQLite table, and the SQLite column names
              must be specified separately. The key is addressable in the
              output map specification (see below) as '_KEY_', and the
              value is addressable as '_VALUE_'.

       Finally, you must specify how to translate between the 'column'
       names in the JSON result set and the SQLite column names.

       -m, --column-map='J1 S1 J2 S2 ... Jn Sn'
              This argument is mandatory.

              When --row-per is 'object', map JSON key name Ji in the
              result set to SQLite column Si. If '--nested' was passed,
              then Ji can also have the sentinel value '_PARENT_KEY_', in
              which case column Si is populated with the key under which
              the present object was found in its parent.

              When --row-per is 'key', then Ji may be one of:

                  _PARENT_KEY_
                  _KEY_
                  _VALUE_

              and Si is the column name to which to map Ji. Ji may also be
              the name of a key, in which case the column Si is set to the
              value for that key within the object currently being
              traversed.

       If json2sqlite encounters an object in the result set that does not
       have one of the keys specified in the column map, then it will
       attempt to set the corresponding SQLite column to NULL. (Note well
       that this is identical to the behavior in the case where the object
       contains the key and the key is mapped to the JSON null value.)
       This will fail if the SQLite schema does not allow NULL in that
       column. At the moment, json2sqlite considers this a permanent
       failure in the input data set, rolls back the whole transaction,
       and exits with an unhappy status.

   Miscellaneous
       -M, --memory-cap=BYTES
              Do not consume more than BYTES bytes of heap. If more space
              is required to hold the whole document and/or write to the
              database, then json2sqlite emits an error message and exits
              with a special status described below. The default is 10
              megabytes (10000000). Use 0 to specify no limit.
              At the moment, 75% of the memory cap is apportioned to
              storing the JSON document (the whole of which is read into
              RAM prior to any other processing), and 25% is apportioned to
              the SQLite page cache.

       --trace
              Emit a trace (to stderr) of the SQLite statements that
              json2sqlite executes.

       -h, --help
              Print help and quit.

       -V, --version
              Print version information and quit.

EXIT STATUS
       json2sqlite exits with one of the following statuses:

       0      OK. All records were successfully ingested with no warnings
              or errors.

       1      Unhandled exception. This indicates a bug in json2sqlite.

       10     Input error. The JSON document was malformed and could not be
              parsed.

       11     Input error. The JSON did not follow the format specified by
              the command-line arguments.

       12     Input error. The JSON document could not be completely read
              because the input data source was truncated, the socket was
              closed, or so on.

       13     Input error. The input file could not be opened for reading.

       14     Input error. Could not find the JSON_PATH specified by
              --input-path.

       18     Out of memory. The JSON document could not be completely read
              because json2sqlite ran out of memory.

       25     SQLite error. SQLite complained about something, so the
              transaction could not be completed.

       26     Failed to read the SQLite schema file.

       100    Bad command-line arguments.

EXAMPLES
   Example 1 - a single, flat object
       Suppose you have a JSON document like this:

           {
               "first_name": "Alice",
               "last_name": "Doe",
               "age": 37,
               "height": 180,
               "height_units": "cm"
           }

       and you would like to ingest it as one row into a database with the
       following schema:

           CREATE TABLE person(first_name, last_name, age, height, height_units);

       Then issue the following command:

           cat json | json2sqlite --input-type object \
                                  --output-database my.db \
                                  --output-table person \
                                  --column-map '
                                      first_name   first_name
                                      last_name    last_name
                                      age          age
                                      height       height
                                      height_units height_units
                                  '

   Example 2 - an array of flat objects
       Suppose the document in Example 1 is now an array with many person
       objects in it.
       And this time, the names in your input document are different from
       the names in your SQLite database. Finally, the object is in some
       nested location in a larger document:

           {
               "demographics": {
                   "persons": [
                       {
                           "name": "Alice",
                           "last_name": "Doe",
                           "age_years": 37,
                           "height": 180,
                           "height_units": "cm"
                       },
                       {
                           "name": "Bob",
                           "last_name": "Johnson",
                           "age_years": 24,
                           "height": 172,
                           "height_units": "cm"
                       }
                   ]
               }
           }

       and you would like to ingest this as one row per person into the
       same database:

           CREATE TABLE person(first_name, last_name, age, height, height_units);

       Then issue the following command:

           cat json | json2sqlite --input-type array \
                                  --input-path .demographics.persons \
                                  --output-database my.db \
                                  --output-table person \
                                  --column-map '
                                      name         first_name
                                      last_name    last_name
                                      age_years    age
                                      height       height
                                      height_units height_units
                                  '

   Example 3 - arbitrary name-value pairs in a flat JSON object
       Suppose you have a set of name-value pairs arriving as a flat JSON
       document:

           {
               "screen_resolution_horiz": 640,
               "screen_resolution_vert": 480,
               "cpu_mhz": 1434,
               "vendor": "Dell, Inc.",
               "product_name": "Latitude",
               "product_id": "E6510",
               "price": 500,
               "price_units": "USD"
           }

       and you wish to store this information in a table that is a simple,
       opaque name-value pair set:

           CREATE TABLE nvpairs(name TEXT PRIMARY KEY, value NOT NULL);

       Then issue the following command:

           cat json | json2sqlite --output-database my.db \
                                  --output-table nvpairs \
                                  --input-type object \
                                  --row-per key \
                                  --column-map '
                                      _KEY_   name
                                      _VALUE_ value
                                  '

       This will yield the following rows:

           screen_resolution_horiz,640
           screen_resolution_vert,480
           cpu_mhz,1434
           vendor,"Dell, Inc."
           product_name,Latitude
           product_id,E6510
           price,500
           price_units,USD

   Example 4 - arbitrary name-value pairs with a parent key
       Building on Example 3, consider a similar JSON document, but with
       many computer models represented, each one with its own model ID as
       a primary key:

           {
               "1034023432": {
                   "screen_res_horiz": 640,
                   "screen_res_vert": 480,
                   "cpu_mhz": 1434,
                   "vendor": "Dell, Inc.",
                   "product_name": "Latitude",
                   "product_id": "E6510",
                   "price": 500,
                   "price_units": "USD"
               },
               "4589734534": {
                   "screen_res_horiz": 1024,
                   "screen_res_vert": 768,
                   "cpu_mhz": 5000,
                   "vendor": "Dell, Inc.",
                   "product_name": "Inspiron",
                   "product_id": "5600",
                   "price": 750,
                   "price_units": "USD"
               },
               etc
           }

       and this time your schema maps a (model ID, attribute) pair to its
       value:

           CREATE TABLE nvpairs(
               model_id TEXT NOT NULL,
               name     TEXT NOT NULL,
               value    NOT NULL,
               PRIMARY KEY(model_id, name)
           );

       Here is how you'd ingest that data set:

           cat json | json2sqlite --output-database my.db \
                                  --output-table nvpairs \
                                  --input-type object \
                                  --nested \
                                  --row-per key \
                                  --column-map '
                                      _PARENT_KEY_ model_id
                                      _KEY_        name
                                      _VALUE_      value
                                  '

       This will emit the following records:

           INSERT INTO nvpairs VALUES('1034023432','screen_res_horiz',640);
           INSERT INTO nvpairs VALUES('1034023432','screen_res_vert',480);
           INSERT INTO nvpairs VALUES('1034023432','cpu_mhz',1434);
           INSERT INTO nvpairs VALUES('1034023432','vendor','Dell, Inc.');
           INSERT INTO nvpairs VALUES('1034023432','product_name','Latitude');
           INSERT INTO nvpairs VALUES('1034023432','product_id','E6510');
           INSERT INTO nvpairs VALUES('1034023432','price',500);
           INSERT INTO nvpairs VALUES('1034023432','price_units','USD');
           INSERT INTO nvpairs VALUES('4589734534','screen_res_horiz',1024);
           INSERT INTO nvpairs VALUES('4589734534','screen_res_vert',768);
           INSERT INTO nvpairs VALUES('4589734534','cpu_mhz',5000);
           INSERT INTO nvpairs VALUES('4589734534','vendor','Dell, Inc.');
           INSERT INTO nvpairs VALUES('4589734534','product_name','Inspiron');
           INSERT INTO nvpairs
           VALUES('4589734534','product_id','5600');
           INSERT INTO nvpairs VALUES('4589734534','price',750);
           INSERT INTO nvpairs VALUES('4589734534','price_units','USD');

BUGS
       Please submit bug reports to json2sqlite at aterlo.com.

       The memory cap is probably not enforced particularly well for
       certain kinds of documents. For example, you could probably get
       json2sqlite to consume perhaps 1.5 times its memory cap by providing
       an object with a single enormous key as input. For the moment, we
       are not going to sweat this.

       The design obviously limits json2sqlite's capabilities to certain
       kinds of JSON documents, and the authors welcome clean ways of
       handling a wider variety of document forms.

ACKNOWLEDGEMENTS
       json2sqlite is based on the lightweight JSON parser 'jsmn', for
       which we acknowledge and thank zserge; and of course SQLite, for
       which the world will forever be indebted to D. Richard Hipp.

COPYRIGHT
       json2sqlite is Copyright (c) Aterlo Networks, Inc. All rights
       reserved.

Aterlo Networks, Inc.             1 Feb 2015                  json2sqlite(1)
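For anyone who wants to experiment with the Example 4 mapping before the tool is published, the equivalent transformation can be approximated end to end with Python's standard json and sqlite3 modules. This is a sketch only, not json2sqlite itself, and it uses a trimmed-down version of the Example 4 document:

```python
import json
import sqlite3

doc = json.loads("""
{
  "1034023432": {"cpu_mhz": 1434, "product_name": "Latitude"},
  "4589734534": {"cpu_mhz": 5000, "product_name": "Inspiron"}
}
""")

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE nvpairs(
    model_id TEXT NOT NULL, name TEXT NOT NULL, value NOT NULL,
    PRIMARY KEY(model_id, name))""")

# --nested --row-per key: one row per (model ID, attribute) pair
with con:  # single transaction, committed only if no error occurs
    for model_id, attrs in doc.items():
        for name, value in attrs.items():
            con.execute("INSERT INTO nvpairs VALUES(?,?,?)",
                        (model_id, name, value))

for row in con.execute("SELECT * FROM nvpairs ORDER BY model_id, name"):
    print(row)
```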