On Mon, Jul 13, 2020 at 7:38 PM John Cowan <[email protected]> wrote:

> Just cherry-picking a few points...
>
> On Mon, Jul 13, 2020 at 5:40 PM Linas Vepstas <[email protected]> wrote:
>
>> Compare to, for example SQL -- it blows the doors off syntax-case in
>> usability and power.
>
> Well, no; syntax-case allows arbitrary Scheme, so it is Turing-complete.
> SQL is not, unless the implementation of CTEs allows arbitrary nesting.
> SQL is also extremely rigid, deficient, and un-orthogonal compared to a
> true relational algebra implementation like Tutorial D.
>
> See also Linda, in which you broadcast arbitrary tuples (could be trees,
> too) into Lindaspace and then anyone can query the space with pattern
> matching, returning the first matching tuple with or without atomically
> removing it.

Yes, SQL is deficient, which is why graph query languages exist, and why
the atomspace got created.

To keep things concrete, here's a bio-grid/reactome/chebi data annotation
package: https://github.com/MOZI-AI/annotation-scheme -- it's currently
being used for covid research.

Typical datasets contain roughly 10 million s-expressions, e.g. a million
of these from biogrid:

    (Evaluation (Predicate "interacts_with")
        (List (Gene "FLNC") (Gene "MAP2K4")))
    (Evaluation (Predicate "has_entrez_id")
        (List (Gene "MAP2K4") (Concept "entrez:6416")))

and several million of these from chebi:

    (Member (Molecule "ChEBI:16977") (Concept "SMP0000055"))
    (Evaluation (Predicate "has_name")
        (List (Molecule "ChEBI:16977")
            (Concept "(2S)-2-aminopropanoic acid")))

etc. Basically, they are small, very low-complexity patterns; it's just
that there are a lot of them.

Two heavy-hitter queries include what I call "the triangle": given gene A,
find genes B and C such that A interacts with B interacts with C interacts
with A. (They've intentionally confused upregulation with downregulation
for some reason I don't understand.) Another is what I call the "pentagon":
genes A and B interact, they express proteins P and Q, which are in the
same reactome R.

The triangle queries currently take maybe an hour(?) on a five-year-old
compute node; the pentagon queries take maybe 6 hours(?) (I've forgotten.)

So, as a point of practical application: can I load 10 million relations
into Tutorial D or into Linda, and run the triangle/pentagon pattern
matches?

(I don't see how to use either syntax-case or srfi-200 to perform these
queries. Or rather, I haven't thought it worth the time to figure out how
to do this, as they don't seem appropriate for this problem.)

(I admit I've never heard of Tutorial D or Linda before; will look.)

--linas
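
P.S. To make the shape of the "triangle" query concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the pattern, not how the atomspace or annotation-scheme evaluates it. Assumptions: each interacts_with relation is reduced to a plain pair of gene names, the relation is treated as symmetric, the names build_index and triangles are hypothetical, and GENE_X / GENE_Y are made-up placeholders rather than real biogrid entries.

```python
from collections import defaultdict

# Toy stand-ins for
#   (Evaluation (Predicate "interacts_with") (List (Gene x) (Gene y)))
# Only the FLNC/MAP2K4 pair comes from the example above; the
# GENE_X / GENE_Y rows are invented placeholders.
interactions = [
    ("FLNC", "MAP2K4"),
    ("MAP2K4", "GENE_X"),
    ("GENE_X", "FLNC"),
    ("GENE_X", "GENE_Y"),
]


def build_index(pairs):
    """Map each gene to the set of genes it interacts with.

    Assumption: the interaction is symmetric, so both directions
    are recorded.
    """
    neighbors = defaultdict(set)
    for x, y in pairs:
        neighbors[x].add(y)
        neighbors[y].add(x)
    return neighbors


def triangles(neighbors, a):
    """Return the unordered {B, C} pairs that close a triangle with a."""
    found = set()
    for b in neighbors.get(a, ()):
        if b == a:
            continue  # ignore self-interactions
        for c in neighbors.get(b, ()):
            if c in (a, b):
                continue  # need three distinct genes
            if a in neighbors.get(c, ()):
                found.add(frozenset((b, c)))
    return found


index = build_index(interactions)
for pair in triangles(index, "FLNC"):
    print("FLNC", *sorted(pair))
```

The index makes each membership check a set lookup, so the cost is driven by the number of neighbor-of-neighbor pairs for the starting gene rather than by the total number of relations. The pentagon would need two more relation types (gene expresses protein, protein is in reactome) joined in the same way.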
