Héllo all,

# Introduction

I have figured out a use case for an immutable / functional database that works
like git. I like the name "streamable immutable database", but I am not sure
it's applicable.

This probably seems ambitious and pretentious. That said, I am certain I can
get it done. The only uncertainty is performance, but I also have ideas
for that.

The idea of building a git-like database is not new, but now I have a better
picture of it.

The question you may want to ask is: why not re-implement git in guile and maybe
use wiredtiger as the backing store? Well, that is a legitimate question.
What I am trying to achieve is something more general than git.

Feel free to point me to relevant documentation or argue that git in guile is the
way forward.

The main use case I want to handle is the ability to experiment with different
versions of a given machine learning model / data / dataset that might be bigger
than RAM. That is, to switch easily and efficiently from one version of the model
to another without resorting to copying all the files or the database.

That is, a versioned, branchable, forkable database.
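To make "switching versions without copying" concrete, here is a minimal sketch in Python (not Guile, and not the actual design). It assumes each commit stores only the keys it changed plus a pointer to its parent, and that a branch is just a name pointing at a commit. `Commit`, `Repo` and the example keys are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True, eq=False)
class Commit:
    """One revision: a parent pointer plus only the keys changed here."""
    parent: Optional["Commit"]
    changes: Dict[str, Any]

    def get(self, key, default=None):
        # Walk back through the ancestors until some commit mentions the key.
        node = self
        while node is not None:
            if key in node.changes:
                return node.changes[key]
            node = node.parent
        return default


class Repo:
    """Branches are names pointing at commits; no data is ever copied."""

    def __init__(self):
        self.branches = {"master": Commit(parent=None, changes={})}
        self.head = "master"

    def commit(self, **changes):
        tip = self.branches[self.head]
        self.branches[self.head] = Commit(parent=tip, changes=dict(changes))

    def branch(self, name):
        # Forking is a pointer copy, whatever the size of the data.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        if name not in self.branches:
            raise KeyError(name)
        self.head = name

    def get(self, key, default=None):
        return self.branches[self.head].get(key, default)


repo = Repo()
repo.commit(weights="v1", learning_rate=0.1)
repo.branch("experiment")
repo.checkout("experiment")
repo.commit(learning_rate=0.01)          # only this key is stored again
experiment_rate = repo.get("learning_rate")
experiment_weights = repo.get("weights")  # shared with master, not copied
repo.checkout("master")
master_rate = repo.get("learning_rate")
```

Reads walk the parent chain, so lookups get slower as history grows. That is the performance question mentioned above, and a real implementation would need an index or periodic snapshots.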

Feel free to argue that data and code are different and that data MUST BE
distributed out-of-band, I will be reading with great interest.

# Description

It MUST have the following features:

- It supports ACID transactions

- It's multi-threaded

- It's an association list database (like guile-wiredtiger's feature-space) where keys are symbols and values are any scheme value. In other words, it's a document store.

- It supports git-like features, i.e. tags, branches, push, pull, revert, merge, log, diff and of course commits and revisions. In particular, it's possible
 to access the history of a given association.

- It's immutable in the sense that CRUD operations, instead of changing
 values in place, create new entries in the database to reflect the
 change. In terms of the wiredtiger API, there is no call to cursor-update;
 only cursor-insert calls are used.

- 'neon checkout REV' will bring a more efficient representation of the data
 into the working space. That representation MUST BE configurable. In other words,
 if the user wants to version CSV, geo-temporal data, time series or whatever, it must
 be possible.

- It SHOULD allow mixing data with source files.

- It SHOULD also allow storing binaries efficiently.
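The insert-only storage, the per-association history and the configurable checkout described above can be sketched as follows. This is a Python illustration under stated assumptions, not the wiredtiger or guile-wiredtiger API. It assumes entries are keyed by `(key, revision)` in a sorted index, that a delete is recorded as a tombstone entry, and that history is linear. `InsertOnlyStore`, `TOMBSTONE`, `as_csv` and the sample data are all hypothetical names.

```python
import bisect
import csv
import io

TOMBSTONE = object()  # marks a deletion; written as a new entry like any other


class InsertOnlyStore:
    """Entries are only ever inserted under (key, revision), never updated."""

    def __init__(self):
        self._index = []    # sorted list of (key, revision)
        self._values = {}   # (key, revision) -> value
        self.revision = 0

    def commit(self, changes):
        self.revision += 1
        for key, value in changes.items():
            entry = (key, self.revision)
            bisect.insort(self._index, entry)  # insert, never overwrite
            self._values[entry] = value
        return self.revision

    def delete(self, key):
        return self.commit({key: TOMBSTONE})

    def get(self, key, revision=None, default=None):
        if revision is None:
            revision = self.revision
        # Find the newest entry for `key` at or before `revision`.
        pos = bisect.bisect_right(self._index, (key, revision))
        if pos == 0 or self._index[pos - 1][0] != key:
            return default
        value = self._values[self._index[pos - 1]]
        return default if value is TOMBSTONE else value

    def history(self, key):
        # Every (revision, value) ever recorded for this association.
        start = bisect.bisect_left(self._index, (key, 0))
        result = []
        for found_key, rev in self._index[start:]:
            if found_key != key:
                break
            result.append((rev, self._values[(found_key, rev)]))
        return result

    def checkout(self, revision, exporter):
        # Materialize one revision and hand it to a user-chosen exporter.
        snapshot = {}
        for key in sorted({k for k, _ in self._index}):
            value = self.get(key, revision, default=TOMBSTONE)
            if value is not TOMBSTONE:
                snapshot[key] = value
        return exporter(snapshot)


def as_csv(snapshot):
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for key, value in snapshot.items():
        writer.writerow([key, value])
    return buffer.getvalue()


store = InsertOnlyStore()
r1 = store.commit({"city": "Paris", "temp": 21})
r2 = store.commit({"temp": 19})
r3 = store.delete("city")
temp_history = store.history("temp")
csv_at_r2 = store.checkout(r2, as_csv)
dict_at_r3 = store.checkout(r3, dict)
```

The exporter argument stands in for the configurable representation: the same revision can be materialized as CSV, a plain dictionary, or any other format the user supplies.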


# Roadmap

- Code the "bare database", i.e. the gist of the story: the immutable association
 list that takes inspiration from git.

- Create benchmarks

- Index ConceptNet and Wikidata, and demo the git-like features on top of
 dictionary-based named entity recognition.
