Re: SRFI development in the age of git

Linas Vepstas Tue, 14 Jul 2020 16:57:50 -0700

Just skimmed Tutorial D. The knee-jerk reaction is to be completely
dismissive -- it appears to be absolutely stone-age in its conception. I'm
not sure why you are pointing at it -- computer science has come a very
long way since that era. If there's any gold in there, it's not evident.

Linda is more interesting but seems super-minimalistic, a kind of mash-up
of a string language plus variable unification. The point here is that a
"tuple" is a "string of symbols" and, although strings are sufficient for
computation (e.g. simply-typed lambda calculus is a "string language"), one
gains a lot of expressive power and simplicity by abstracting first to
terms (i.e. to term algebras & model theory) and then abstracting to
graphs.
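
To make "string language plus variable unification" concrete, here is a
minimal in-memory sketch of Linda-style matching. It is an illustration
only, not any real Linda implementation: the names Var, match and
TupleSpace are hypothetical, and unlike real Linda the rd/in operations
here return None instead of blocking when nothing matches.

```python
class Var:
    """A named pattern variable such as ?X (hypothetical helper)."""

    def __init__(self, name):
        self.name = name


def match(pattern, tup, bindings=None):
    """Unify a flat pattern with a tuple; return the bindings or None."""
    bindings = dict(bindings or {})
    if len(pattern) != len(tup):
        return None
    for p, t in zip(pattern, tup):
        if isinstance(p, Var):
            # A variable seen twice must bind to the same value both times.
            if p.name in bindings and bindings[p.name] != t:
                return None
            bindings[p.name] = t
        elif p != t:
            return None
    return bindings


class TupleSpace:
    """A toy tuple space: out() publishes, rd() reads, in_() removes."""

    def __init__(self):
        self.tuples = []

    def out(self, tup):
        self.tuples.append(tuple(tup))

    def rd(self, pattern):
        # Return the first matching tuple and its bindings, without removal.
        for t in self.tuples:
            b = match(pattern, t)
            if b is not None:
                return t, b
        return None

    def in_(self, pattern):
        # Like rd(), but also removes the matched tuple from the space.
        found = self.rd(pattern)
        if found is not None:
            self.tuples.remove(found[0])
        return found


space = TupleSpace()
space.out(("upregulates", "A", "B"))
space.out(("downregulates", "B", "C"))
hit = space.rd(("upregulates", "A", Var("X")))
```

Every query has to name its variables and positions explicitly, which is
the bookkeeping complained about above.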

The point is that it's simply easier to think in terms of graphs (e.g. the
biology example of gene A upregulating gene B downregulating gene C) than
it is to think in terms of tuples (e.g. there exists a tuple (R, G, G')
where R is a predicate from the set {"upregulates", "downregulates"} and
G, G' are of type "gene", and we want to perform a string substitution
(R,A,?X).(R',?X,C) over an explicitly-named variable ?X which is
constrained to vary over all members having type "gene"). (This, BTW, is
what SPARQL, Gremlin, and many of the nastier graph query languages force
you to do.) Thinking in terms of tuples is hard work. Being forced to
explicitly think about variable unification, which Linda seems to force
you to do, is also hard work. Graphs, with user-defined types and
multi-edges, are just ... easier. (cue children's sing-song "the knee-bone
is connected to the thigh-bone...")
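
A rough sketch of the contrast, using the gene A / B / C example. The
data, the function names and the relation labels "up" and "down" are all
made up for illustration; this is not how SPARQL, Gremlin or the
atomspace actually evaluate queries, just the two mental models side by
side.

```python
from collections import defaultdict

# Toy facts as (relation, source gene, target gene) tuples.
facts = [("up", "A", "B"), ("down", "B", "C"), ("up", "A", "D")]
genes = {"A", "B", "C", "D"}


def chain_by_tuples(facts, genes, start, end):
    """Solve (R, start, ?X).(R', ?X, end) with an explicit join on ?X."""
    results = []
    for r1, g1, x in facts:
        # ?X must be bound by the first tuple and must have type "gene".
        if g1 != start or x not in genes:
            continue
        for r2, x2, g2 in facts:
            # The second tuple must reuse the same binding for ?X.
            if x2 == x and g2 == end:
                results.append((r1, x, r2))
    return results


# The same facts as a graph: each gene points at its labelled out-edges.
graph = defaultdict(list)
for rel, src, dst in facts:
    graph[src].append((rel, dst))


def chain_by_graph(graph, start, end):
    """Walk two edges out of start and keep the paths that land on end."""
    return [
        (r1, mid, r2)
        for r1, mid in graph.get(start, [])
        for r2, dst in graph.get(mid, [])
        if dst == end
    ]


tuple_answer = chain_by_tuples(facts, genes, "A", "C")
graph_answer = chain_by_graph(graph, "A", "C")
```

Both return the same answer; the difference is that the graph version
reads as "follow an edge, then follow another", with no named variable to
keep track of.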

That said, one encounters a whole new zoo of interesting issues, but enough
for now.

--linas

On Tue, Jul 14, 2020 at 6:12 PM John Cowan <[email protected]> wrote:

> Tutorial D is a language.  The implementations that exist are basically
> for reference and educational use, and don't focus on performance.
>
> See <
> https://en.wikipedia.org/wiki/Linda_(coordination_language)#Implementations>
> for implementations of Linda embedded in various languages.  JavaSpaces is
> probably the best known and has a definite niche.
>
>
> On Tue, Jul 14, 2020 at 6:54 PM Linas Vepstas <[email protected]>
> wrote:
>
>>
>>
>> On Mon, Jul 13, 2020 at 7:38 PM John Cowan <[email protected]> wrote:
>>
>>>
>>> Just cherry-picking a few points...
>>>
>>> On Mon, Jul 13, 2020 at 5:40 PM Linas Vepstas <[email protected]>
>>> wrote:
>>>
>>>> Compare to, for example SQL -- it blows the doors off syntax-case in
>>>> usability and power.
>>>>
>>>
>>> Well, no; syntax-case allows arbitrary Scheme, so it is
>>> Turing-complete.  SQL is not, unless the implementation of CTEs allows
>>> arbitrary nesting.  SQL is also extremely rigid, deficient, and
>>> un-orthogonal compared to a true relational algebra implementation like
>>> Tutorial D.
>>>
>>> See also Linda, in which you broadcast arbitrary tuples (could be trees,
>>> too) into Lindaspace and then anyone can query the space with pattern
>>> matching, returning the first matching tuple with or without atomically
>>> removing it.
>>>
>>
>> Yes, SQL is deficient, which is why graph query languages exist, and why
>> the atomspace got created. To keep things concrete, here's a
>> bio-grid/reactome/chebi data annotation package:
>> https://github.com/MOZI-AI/annotation-scheme -- it's currently being used
>> for covid research.
>>
>> Typical datasets contain approximately 10 million s-expressions, e.g.
>> a million BioGRID entries like these:
>> (Evaluation (Predicate "interacts_with")
>>    (List (Gene "FLNC") (Gene "MAP2K4")))
>> (Evaluation (Predicate "has_entrez_id")
>>    (List (Gene "MAP2K4") (Concept "entrez:6416")))
>>
>> and several million ChEBI entries like these:
>> (Member (Molecule "ChEBI:16977") (Concept "SMP0000055"))
>> (Evaluation (Predicate "has_name")
>>    (List (Molecule "ChEBI:16977") (Concept "(2S)-2-aminopropanoic acid")))
>> etc.
>>
>> Basically, they are small, very low-complexity patterns, just that
>> there's a lot of them.
>>
>> Two heavy-hitter queries include what I call "the triangle": given gene
>> A, find genes B and C such that A interacts with B interacts with C
>> interacts with A. (They've intentionally confused upregulation with
>> downregulation for some reason I don't understand.) Another is what I
>> call the "pentagon": genes A and B interact, they express proteins P and
>> Q, which are in the same reactome R.
>>
>> The triangle queries currently take maybe an hour(?) on a five-year-old
>> compute node; the pentagon queries take maybe 6 hours(?) (I've forgotten.)
>> So, as a point of practical application: can I load 10 million relations
>> into Tutorial D or into Linda, and run the triangle/pentagon pattern
>> matches? (I don't see how to use either syntax-case or how to use srfi-200
>> to perform these queries. Or rather, I haven't thought it worthy to devote
>> time to figure out how to do this, as they don't seem appropriate for this
>> problem.)
>>
>> (I admit I've never heard of Tutorial D or Linda before, will look.)
>>
>> --linas
>>
>>
>>

-- 
Verbogeny is one of the pleasurettes of a creatific thinkerizer.
        --Peter da Silva
