Functional database

amirouche Fri, 09 Feb 2018 23:35:01 -0800

Héllo all,

# Introduction


I figured out a use case for an immutable / functional database that
works like git. I like the "streamable immutable database" name, but I
am not sure it's applicable.

This probably seems ambitious and pretentious; that said, I am certain I
can get it done. The only uncertainty is performance, but I also have
ideas for that.

The idea of building a git-like database is not new, but I now have a
better picture of it.

The question you may want to ask is: why not re-implement git in guile
and maybe use wiredtiger as the backing store? Well, that is a
legitimate question. What I am trying to achieve is something more
general than git.

Feel free to point me to relevant documentation or argue that git in
guile is the way forward.

The main use case I want to handle is the ability to experiment with
different versions of a given machine learning model / data / dataset
that might be bigger than RAM. That is, to easily and efficiently switch
from one version of the model to another without resorting to copying
all the files or the database.

That is a versioned, branchable, forkable database.

Feel free to argue that data and code are different and that data MUST
be distributed out-of-band; I will be reading with great interest.

# Description

It MUST have the following features:

- It supports ACID transactions

- It's multi-threaded

- It's an association list database (like guile-wiredtiger's
 feature-space) where keys are symbols and values are any scheme value.
 In other words, it's a document database.

- It supports git-like features, i.e. tags, branches, push, pull, revert,
 merge, log, diff and of course commits and revisions. In particular,
 it's possible to access the history of a given association.

- It's immutable in the sense that CRUD operations, instead of changing
 values in place, create new entries in the database to reflect the
 change. In terms of the wiredtiger API, there is no call to
 cursor-update. It only uses cursor-insert calls.

- 'neon checkout REV' will bring into the working space a more efficient
 representation of the data. That representation MUST be configurable.
 In other words, if the user wants to version csv, geo-temporal data,
 timeseries or whatever, it must be possible.

- It SHOULD allow mixing data with source files.

- It SHOULD also allow storing binaries efficiently.
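
To make the "immutable association list with git-like history" idea from
the list above a bit more concrete, here is a minimal in-memory sketch.
It is written in Python purely for illustration (not guile), and every
name in it (AppendOnlyStore, Commit, commit, branch, checkout, history,
head) is hypothetical, not an existing API. It only models insert-only
commits, branches, checkout and per-key history; ACID, threading,
deletion, merge and the wiredtiger backing store are all left out.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """One immutable revision: the pairs it wrote plus a link to its parent."""
    rev: int
    parent: Optional[int]
    changes: Tuple[Tuple[str, Any], ...]
    message: str


class AppendOnlyStore:
    """Hypothetical in-memory model of the idea; not a real database."""

    def __init__(self) -> None:
        self._log: List[Commit] = []  # append-only: entries are never rewritten
        self._branches: Dict[str, Optional[int]] = {"master": None}

    def head(self, branch: str) -> Optional[int]:
        return self._branches[branch]

    def commit(self, branch: str, changes: Dict[str, Any], message: str) -> int:
        # An "update" is just a new entry appended to the log (insert only).
        rev = len(self._log)
        self._log.append(
            Commit(rev, self._branches[branch], tuple(changes.items()), message)
        )
        self._branches[branch] = rev
        return rev

    def branch(self, name: str, from_branch: str) -> None:
        # Branching copies a pointer, never the data.
        self._branches[name] = self._branches[from_branch]

    def _ancestry(self, rev: Optional[int]) -> Iterator[Commit]:
        while rev is not None:
            commit = self._log[rev]
            yield commit
            rev = commit.parent

    def checkout(self, rev: Optional[int]) -> Dict[str, Any]:
        # Walk from newest to oldest; the first value seen for a key wins.
        state: Dict[str, Any] = {}
        for commit in self._ancestry(rev):
            for key, value in commit.changes:
                state.setdefault(key, value)
        return state

    def history(self, key: str, rev: Optional[int]) -> List[Tuple[int, Any]]:
        # (revision, value) pairs for one association, newest first.
        return [
            (c.rev, v)
            for c in self._ancestry(rev)
            for k, v in c.changes
            if k == key
        ]


db = AppendOnlyStore()
db.commit("master", {"model": "v1", "lr": 0.1}, "initial model")
db.branch("experiment", "master")
db.commit("experiment", {"lr": 0.01}, "lower learning rate")
db.commit("master", {"model": "v2"}, "retrain")

experiment_view = db.checkout(db.head("experiment"))
master_view = db.checkout(db.head("master"))
lr_history = db.history("lr", db.head("experiment"))
```

Switching between the two branches here never copies data: each view is
rebuilt by following parent links from the branch head. Rebuilding the
whole state on every checkout is obviously the naive part, and it is
where the performance question would have to be addressed.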

# TODO

- code the "bare database", i.e. the gist of the story, that is the
 immutable association list that takes inspiration from git.

- create benchmarks

- Index conceptnet and wikidata and demo the git-like features over the
 dictionary-based named entity recognition.
