Package: libsqlite3-0 Version: 3.40.1-2+deb12u1 Severity: normal Tags: upstream X-Debbugs-Cc: [email protected]
Apparently it is impossible to import data from a JSON tree that contains an array of records, each record of which contains another array. That’s the consequence of the bug. The bug is essentially that tables resulting from a clause formed as “json_each((select …),'$')” are incomplete because only the first row of input is processed by json_each. For a concrete example, take this JSON: $ curl -s 'https://lemmy-federate.com/api/instance.find?input=%7B%22search%22%3A%22%22%2C%22skip%22%3A0%2C%22take%22%3A6%2C%22enabledOnly%22%3Afalse%7D' > lemmy-federate_nodes.json This is how we might expect the innermost arrays to be imported: ===8<---------------------------------------- #!/bin/bash sqlite3 -echo demo.db <<'EOF' create table if not exists relationTbl ( id integer primary key, community_id integer, created text, updated text, status text, error_reason text, attempt_count text, hostname text, host_id text); insert or replace into relationTbl select json_extract(value,'$.id'), json_extract(value,'$.communityId'), json_extract(value,'$.createdAt'), json_extract(value,'$.updatedAt'), json_extract(value,'$.status'), json_extract(value,'$.errorReason'), json_extract(value,'$.attemptCount'), json_extract(value,'$.instance.host'), json_extract(value,'$.instance.id') from json_each((select json_extract(value,'$') from json_tree(readfile('lemmy-federate_communities.json'),'$.result.data.communities') where json_tree.type = 'array' and fullkey like '%follows')); EOF ===8<---------------------------------------- The sample code executes and superficially appears to function, but running the following query reveals that only the first array is imported: ===8<---------------------------------------- $ sqlite3 demo.db "select distinct community_id from relationTbl" ===8<---------------------------------------- Six rows are expected from that query, but there is only one. This is somewhat dangerous because no error is produced and *some* data is imported. So the failure is not readily detectable, which can lead to data loss because someone might then delete their JSON source after importing. Other experiments reveal that json_tree has the same defect of only processing the first item of input when the input is another query. Other experiments also reveal an extraction defect when a JSON array is extracted from a column (this will be filed in a separate bug report). Bug 2: It’s worth noting as well that the possibility of feeding a (select …) query into json_each and json_tree is undocumented. Perhaps that’s intentional. I don’t know if SQlite has a requirements spec and if json_tree and json_each are included. One document (perhaps old) said these functions are “external” and not part of sqlite. Yet they are included in my stock version of sqlite and the most recent version of the docs (3.50.3) cover their existence -- just not the possibility of a query as input which I intuitively discovered by experimentation. Workaround: There is no apparent workaround within the confines of the sqlite library. But it’s possible to loop on the parallel arrays outside of sqlite and feed them back into sqlite, as follows: ===8<---------------------------------------- while read json_array do sqlite3 demo.db <<EOF insert or replace into LFrelationTbl select json_extract(value,'$.id'), json_extract(value,'$.communityId'), json_extract(value,'$.createdAt'), json_extract(value,'$.updatedAt'), json_extract(value,'$.status'), json_extract(value,'$.errorReason'), json_extract(value,'$.attemptCount'), json_extract(value,'$.instance.host'), json_extract(value,'$.instance.id') from json_each('$json_array') EOF done <<< $(sqlite3 'file:///tmp?mode=memory' "select json_extract(value,'$.follows') from json_each(readfile('lemmy-federate_communities.json'),'$.result.data.communities');") ===8<---------------------------------------- -- System Information: Debian Release: 12.11 APT prefers stable-updates APT policy: (990, 'stable-updates'), (990, 'stable-security'), (990, 'stable'), (500, 'oldstable') Architecture: amd64 (x86_64) Foreign Architectures: i386 Kernel: Linux 5.10.0-28-amd64 (SMP w/2 CPU threads) Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set Shell: /bin/sh linked to /usr/bin/dash Init: systemd (via /run/systemd/system) LSM: AppArmor: enabled Versions of packages libsqlite3-0 depends on: ii libc6 2.36-9+deb12u10 libsqlite3-0 recommends no packages. libsqlite3-0 suggests no packages. -- no debconf information

