Re: Querying a tuple database with minikanren

Amirouche Boubekki Fri, 23 Dec 2016 03:03:07 -0800

I will give a talk at fosdem about my work on wiredtiger 
<https://fosdem.org/2017/schedule/event/guilewiredtiger/> and talk a little
about this microkanren integration. If you wish to discuss stuff related 
to mini/micro kanren at FOSDEM make sure to drop by Guile dev room :)


Best!

Amirouche aka. amz3

On Sunday, November 29, 2015 at 3:27:55 PM UTC+1, Amirouche Boubekki wrote:
>
> Héllo,
>
>
> I am interested in building database both for the purpose of exploring
> datasets like conceptnet or wikibase and building webapps.
>
> You might be wondering how this relates to minikanren. Well, there's
> two sides:
>
> * First, you might want to store logic facts inside database persisted
>   to disk because your dataset doesn't fit into memory
>
> * Second, you might want to take advantage of the declarative nature
>   of minikanren programs
>
> For this exercise I've chosen to work on tuple database so called
> (uid, attribute, identifier) aka. uav database, because it's the
> database that is the most flexible and can span various usecases from
> relational, graph or hierarchical problems to key/value or document
> based kind of problems. And it can be loosely typed. It's similar in
> principle to datomic EAV except that it doesn't have an audit trail
> ie. it is not immutable.
>
> The backend storage is handled by wiredtiger [0], it's a ordered
> key/value store, a hashmap similar to leveldb or Oracle BerkeleyDB.
>
> To picture how it looks like imagine a giant list of (u a v) tuples
> like the following, that stores messages from a blogging application:
>
> (define database '(("uid1" type post)
>                    ("uid1" blog/post "uav db is a tuple db")
>                    ("uid1" blog/create-at 0)
>
>                    ("uid2" type tag)
>                    ("uid2" tag/name "scheme")
>
>                    ("uid3" type taglink)
>                    ("uid3" taglink/tag "uid2")
>                    ("uid3" taglink/post "uid1")))
>
> In wiredtiger, `uid` and `attribute` compose the *key*, and the
> `value` unique *value* column of the entry.
>
> There is only three primitives that can be run against such database:
>
> * `(uav-ref uid attribute)` retrieve the value associated with
>   `(UID . ATTRIBUTE)` pair.
>
> * `(uav-ref* uid)` retrieve all the `(attribute . value)` pairs
>   associated with `UID` aka. retrieve the document with `UID` as
>   unique identifier.
>
> * `(uav-index-ref attribute value)` retrieve all `uid` for which a
>   `(uid ATTRIBUTE VALUE)` tuple exists in the database. This is kind
>   of a reverse lookup. I don't call it as such because it's not the
>   only "reverse" lookup that can be made.
>
> Next, I translate `uav-ref` and`uav-index-ref` procedures into
> minikanren procedures that makes it possible to solve logic problems
> backed by a database and then explain the higher level interface
> that use a pattern matching-like interface to query the tuples.
>
> To simplify let's use on single procedure, `(queryo uid attribute
> value)` that calls the correct underlying `uav-*` procedure and does
> the correct *substitutions* mangling with `disj` and `conj`. To
> query all blog posts that have the a given tag, one can use the
> following minikanren query:
>
> (define (query:posts-with-tags name)
>   (run* (uid?)
>     (fresh (tag?? taglink??)
>       (queryo tag?? 'tagname name)
>       (queryo taglink?? 'tag/link tag??)
>       (queryo taglink?? 'tag/post uid?))))
>
> The above query translates in plain english line by line as:
>
> * Given a tag with uid `tag??` and `tag/name` equal to the value `name`
> * Find a tuple with `taglink??` uid where `'tag/link` is associated with 
> `tag??`
> * Such as there is also a tuple with `taglink??` uid where `'tag/post`
>   is associated with `uid?`
>
> And return `uid?`.
>
> `queryo` is implemented in terms of `uav-refo` and `uav-index-refo`:
>
> (define queryo
>   (lambda (u a v)
>     (lambda (s/c)
>       (let ((u (walk u (car s/c)))
>             (v (walk v (car s/c))))
>         (cond
>          ((var? u) ((uav-index-refo u a v) s/c))
>          ((var? v) ((uav-refo u a v) s/c))
>          ((throw 'uav "unsupported query type")))))))
>
> This doesn't support at least, the case where both `u` and `v` are
> values. What it does is lookup the value associated with the mk
> variables `u` and `v` and return themself if their are no mk variables
> thanks to `walk`. It's similar to how `==` works through
> `unify`.
>
> `queryo` is really a syntax helper to make it possible to use a single
> procedure which has the look of a UAV tuple.
>
> The difference with `==` is that `==` is the end of the story,
> there is no further processing. In `queryo` down the flow calls
> to `disj` and `conj` happen.
>
> Let's start simple, `(uav-refo u a v?)` expects only `v?` to be a
> variable and is defined in terms of `==`:
>
> (define uav-refo
>   (lambda (uid attribute value?)
>     (== value? (uav-ref uid attribute))))
>
> It says that `value?` must be equal to the output of `uav-ref`.
>
> This is the first step into making minikanren work with the database,
> but this is not enough. `uav-refo` is useful to check *constraints*
> where usually in a database query one need to "create" values and then
> check constraint against.
>
> To create values we use `uav-index-refo`, remember `uav-index-ref` is
> the kind of reverse lookup that allows to retrieve uids that we can later
> constraint using `uav-refo`.
>
> `(uav-index-ref attribute value)` will return several uids for the
> same *constraint* so this is a hint that a `disj` is happening. The `disj`
> procedure defined in microkanren [1] takes only two arguments and `disj+`
> is a macro. So I defined the following `disj*`:
>
> (define (disj* . gs)
>   (if (null? gs) (lambda (s/c) (list))
>       (disj (car gs) (apply disj* (cdr gs)))))
>
> Now we can define `uav-index-refo`:
>
> (define findo
>   (lambda (uid? attribute value)
>     (let ((uids (uav-index-ref attribute value)))
>       (apply disj* (map (lambda (uid) (== uid uid?)) uids)))))
>
> This implement that every result of `uav-index-ref` creates its own
> constraint space with a different uid.
>
> Now everything is in place to do queries using minikaren, it also
> works with recursive queries without having to add anything else.
>
> To sum up, I translated the database primitives to minikanren.
>
> On top of this I defined a macro that allows to have tuple pattern 
> matching.
>
> The pattern matching interface is less powerful that the raw
> minikanren interface in the sens that it doesn't allow so far to
> create recursive queries but allows to express SQL queries in a terse
> and nice syntax.
>
> For instance, to do the same query as above, one can use the following 
> query:
>
> (define (query:posts-with-tag* name)
>  (query* uid? where ((tag?? 'tag/name name)
>                      (taglink?? 'tag/link tag??)
>                      (taglink?? 'tag/post uid?))))
>
> The where clause declare the pattern that should be looked for. The
> symbols that ends with a single question mark e.g. `tag?` are return
> variables. symbols that ends with two question marks e.g. `taglink??`
> are internal variables matched during querying.  (In the underlying
> implementation `taglink??` is a minikanren variable instantiated with
> `fresh`)
>
> Here is the macro:
>
> (define-syntax query*
>   (lambda (x)
>     (define (is-not-free-var? sym)
>       (let ((sym (symbol->string sym)))
>         (eq? (string-rindex sym #\?) (- (string-length sym) 1))))
>     (define (make-vars ctx names tuples)
>       (let* ((names (map syntax->datum names))
>              (vals (map syntax->datum tuples))
>              (syms (filter symbol? (concatenate vals)))
>              (syms (filter is-not-free-var? syms))
>              (syms (delete-duplicates syms))
>              (vars (fold (lambda (x out)
>                            (remove (cut equal? x <>) out))
>                          syms
>                          names)))
>         (datum->syntax ctx vars)))
>     
>     (syntax-case x (where)
>       ((query* names ... where ((u a v) ...))
>        (with-syntax ((vars (make-vars x #'(names ...) #'((u a v) ...))))
>          #'(reify-all (lambda (names ...)
>                         (fresh vars
>                            ((queryo *context*) u a v) ...))
>                       'names ...))))))
>
> A few notes:
>
> * Here `queryo` takes a `*context*` argument because in my code, database 
> primitives
>   all take a database context argument. I'm not sure how to deal with that 
> I think the
>   best interface for this would be something like `(query* *context* query 
> arguments)`
>   but I couldn't achieve that. Using a fluid is another solution maybe.
>
> * I also rely on a `reify-all` procedure not defined in microkanren, here 
> is it:
>
> (use-modules (srfi srfi-1))
> (use-modules (srfi srfi-26))
>
>
> (define (run goal . vars)
>   (take-all
>    (call/goal (apply goal vars))))
>
> (define-public (reify-all goal . names)
>   (let* ((vars (map var names))
>          (s/c (apply run (cons goal vars))))
>     (map (lambda (s) (map (cut reify <> s) vars)) s/c)))
>
> The above queries can be written using the database primitives like that:
>
> (define (query:posts-with-tag name)
>   (let* ((tag?? (uav-index-ref dbc 'tag/name name))
>          (taglink?? (uav-index-ref dbc 'tag/link tag??)))
>     (map (cut uav-ref dbc <> 'tag/post) taglink??)))
>
> There is small webapp that make use of this, that can be found over
> git [2]. It only requires guile and wiredtiger built from sources (develop
> branch, which is easy to build from source) [3].
>
>
> All the best!
>
>
> [0] https://github.com/wiredtiger/wiredtiger
> [1] https://github.com/jasonhemann/microKanren
> [2] https://git.framasoft.org/a-guile-mind/nanoblog.git
> [3] git clone https://github.com/wiredtiger/wiredtiger.git
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"minikanren" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/minikanren.
For more options, visit https://groups.google.com/d/optout.

Re: Querying a tuple database with minikanren

Reply via email to