Package: libsqlite3-0
Version: 3.40.1-2+deb12u1
Severity: normal
Tags: upstream
X-Debbugs-Cc: [email protected]

Apparently it is impossible to import data from a JSON tree that
contains an array of records, each record of which contains another
array. That’s the consequence of the bug. The bug is essentially that
tables resulting from a clause formed as “json_each((select …),'$')”
are incomplete because only the first row of input is processed by
json_each.

For a concrete example, take this JSON:

 $ curl -s 
'https://lemmy-federate.com/api/instance.find?input=%7B%22search%22%3A%22%22%2C%22skip%22%3A0%2C%22take%22%3A6%2C%22enabledOnly%22%3Afalse%7D'
 > lemmy-federate_nodes.json

This is how we might expect the innermost arrays to be imported:

===8<----------------------------------------
#!/bin/bash

sqlite3 -echo demo.db <<'EOF'
create table if not exists relationTbl (
  id            integer primary key,
  community_id  integer,
  created       text,
  updated       text,
  status        text,
  error_reason  text,
  attempt_count text,
  hostname      text,
  host_id       text);

insert or replace into relationTbl select
  json_extract(value,'$.id'),
  json_extract(value,'$.communityId'),
  json_extract(value,'$.createdAt'),
  json_extract(value,'$.updatedAt'),
  json_extract(value,'$.status'),
  json_extract(value,'$.errorReason'),
  json_extract(value,'$.attemptCount'),
  json_extract(value,'$.instance.host'),
  json_extract(value,'$.instance.id')
from json_each((select json_extract(value,'$') from 
json_tree(readfile('lemmy-federate_communities.json'),'$.result.data.communities')
where json_tree.type = 'array' and fullkey like '%follows'));

EOF
===8<----------------------------------------

The sample code executes and superficially appears to function, but
running the following query reveals that only the first array is
imported:

===8<----------------------------------------
  $ sqlite3 demo.db "select distinct community_id from relationTbl"
===8<----------------------------------------

Six rows are expected from that query, but there is only one. This is
somewhat dangerous because no error is produced and *some* data is
imported. So the failure is not readily detectable, which can lead to
data loss because someone might then delete their JSON source after
importing.

Other experiments reveal that json_tree has the same defect of only
processing the first item of input when the input is another
query. Other experiments also reveal an extraction defect when a JSON
array is extracted from a column (this will be filed in a separate
bug report).

Bug 2:

It’s worth noting as well that the possibility of feeding a (select …)
query into json_each and json_tree is undocumented. Perhaps that’s
intentional. I don’t know if SQlite has a requirements spec and if
json_tree and json_each are included. One document (perhaps old) said
these functions are “external” and not part of sqlite. Yet they are
included in my stock version of sqlite and the most recent version of
the docs (3.50.3) cover their existence -- just not the possibility of
a query as input which I intuitively discovered by experimentation.

Workaround:

There is no apparent workaround within the confines of the sqlite library. But 
it’s possible to loop on the
parallel arrays outside of sqlite and feed them back into sqlite, as follows:

===8<----------------------------------------
while read json_array
do
    sqlite3 demo.db <<EOF
insert or replace into LFrelationTbl select
  json_extract(value,'$.id'),
  json_extract(value,'$.communityId'),
  json_extract(value,'$.createdAt'),
  json_extract(value,'$.updatedAt'),
  json_extract(value,'$.status'),
  json_extract(value,'$.errorReason'),
  json_extract(value,'$.attemptCount'),
  json_extract(value,'$.instance.host'),
  json_extract(value,'$.instance.id')
from json_each('$json_array')
EOF
done <<< $(sqlite3 'file:///tmp?mode=memory' "select 
json_extract(value,'$.follows')
                                              from 
json_each(readfile('lemmy-federate_communities.json'),'$.result.data.communities');")
===8<----------------------------------------

-- System Information:
Debian Release: 12.11
  APT prefers stable-updates
  APT policy: (990, 'stable-updates'), (990, 'stable-security'), (990, 
'stable'), (500, 'oldstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.10.0-28-amd64 (SMP w/2 CPU threads)
Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages libsqlite3-0 depends on:
ii  libc6  2.36-9+deb12u10

libsqlite3-0 recommends no packages.

libsqlite3-0 suggests no packages.

-- no debconf information

Reply via email to