Hi everyone,

I want to start a discussion, with the aim of an RFC, around implementing
Mango JSON indexes for FoundationDB. Currently Mango indexes are a layer
above CouchDB map/reduce indexes, but with FoundationDB we can make them
separate indexes in FoundationDB. This gives us the possibility of being
able to update the indexes in the same transaction that a document is being
saved in. Later we can look at adding Mango-specific features such as
covering indexes.


Let’s dive into the data model. Currently a user defines an index like this:


{
  name: 'view-name',  - optional, will be auto-generated if omitted
  index: {
    fields: ['fieldA', 'fieldB']
  },
  partial_filter_selector: {}  - optional
}


For query planning we need to be able to access the list of available
indexes. So we would have an index_definitions subspace with the following
content:


(<fieldname1>, …<rest of fields>) = (<index_name>, <partial_filter_selector>)


Otherwise we could just store the index definitions as:

(index_name) = ((fields), partial_filter_selector).


At this stage, I can’t think of a fancy way of storing the index
definitions so that when we need to select an index for a query there would
be a fast way to only fetch a subset of the indexes. I think the best
approach is to fetch them all, as we currently do, and process them. However, we
can look at caching these index definitions in the application layer, and
using FoundationDB watches[0] to notify us when a definition has changed so
we can update the cached definitions.
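
To make the "fetch them all and process them" step concrete, here is a rough
sketch in plain Python. Everything in it is hypothetical: the IndexDef type,
the usable_indexes and choose_index helpers, and the selection rule itself
(an index is a candidate only if every one of its fields appears in the
selector, and the candidate with the most fields wins). It ignores
partial_filter_selector and is not meant to describe what the query planner
would finally do.

```python
from typing import NamedTuple, Optional


class IndexDef(NamedTuple):
    """Hypothetical in-memory form of one index definition."""
    name: str
    fields: tuple
    partial_filter_selector: Optional[dict] = None


def usable_indexes(definitions, selector_fields):
    """Keep the indexes whose fields are all referenced by the selector."""
    wanted = set(selector_fields)
    return [d for d in definitions if set(d.fields) <= wanted]


def choose_index(definitions, selector_fields):
    """Pick one candidate: most fields first, then name for determinism."""
    candidates = usable_indexes(definitions, selector_fields)
    if not candidates:
        return None
    return min(candidates, key=lambda d: (-len(d.fields), d.name))


# All definitions are loaded up front, as the paragraph above suggests.
definitions = [
    IndexDef("by-name", ("name",)),
    IndexDef("by-name-age", ("name", "age")),
    IndexDef("by-city", ("city",)),
]

chosen = choose_index(definitions, ["name", "age"])
```

With a cached list of definitions this is a cheap in-memory scan, which is
why fetching everything may be acceptable as long as the number of indexes
per database stays small.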


Then each index definition will have its own dedicated subspace for the
actual built index key/values. Keys in this subspace would be the fields
defined in the index with the doc id at the end of the tuple, e.g. for an
index with fields name and age, it would be:


("john", 40, "doc-id-1") = null

("mary", 30, "doc-id-2") = null


This follows the same key format that the Document Layer[1] uses for its
indexes. One point to make here is that the doc id is kept in the key part
so that we can avoid duplicate keys.
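
As a small illustration of that key layout, the sketch below builds the
entries in plain Python. The index_key helper is hypothetical, a dict stands
in for the index subspace, and Python's tuple ordering stands in for the
byte ordering of encoded keys. Skipping documents that lack an indexed field
is also an assumption made for the example.

```python
def index_key(fields, doc):
    """Return (value1, ..., valueN, doc_id), or None if a field is missing."""
    if any(f not in doc for f in fields):
        return None
    return tuple(doc[f] for f in fields) + (doc["_id"],)


fields = ("name", "age")
docs = [
    {"_id": "doc-id-1", "name": "john", "age": 40},
    {"_id": "doc-id-2", "name": "mary", "age": 30},
    {"_id": "doc-id-3", "name": "john", "age": 40},  # same values as doc-id-1
]

# A dict standing in for the index subspace: key tuple -> null value.
index = {}
for doc in docs:
    key = index_key(fields, doc)
    if key is not None:
        index[key] = None

ordered_keys = sorted(index)
```

doc-id-1 and doc-id-3 have identical field values, yet both get their own
entry because the doc id is the last element of the key.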


Then in terms of sorting the keys, current CouchDB uses ICU to sort all
secondary indexes. We would need to use ICU to sort the indexes for FDB but
we would have to do it differently. We will not be able to use ICU
collation operations directly, instead, we are going to have to look at
using ICU’s sort keys[2] to generate a sort key ahead of time. At the same
time we need to look at creating a binary encoding to capture the way that
CouchDB currently sorts objects, arrays and numbers. This would most likely
be a sort of key prefix that we add to each key field along with the sort
key generated from ICU.
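
Here is a minimal sketch of what such a prefixed encoding could look like,
assuming CouchDB's usual type order (null, false, true, numbers, strings,
with arrays and objects after that). The tag bytes and function names are
made up for this example, arrays and objects are left out, numbers are
treated as doubles, and collation_key_stub is only a placeholder for a real
ICU sort key (it just case-folds and UTF-8 encodes).

```python
import struct


def collation_key_stub(text):
    """Placeholder for an ICU sort key; NOT real ICU collation."""
    return text.casefold().encode("utf-8")


def encode_number(value):
    """Encode a number as 8 bytes whose byte order matches numeric order."""
    # Adding 0.0 turns -0.0 into 0.0 so both zeros encode identically.
    bits = struct.unpack(">Q", struct.pack(">d", float(value) + 0.0))[0]
    if bits & (1 << 63):
        bits ^= 0xFFFFFFFFFFFFFFFF  # negative: flip every bit
    else:
        bits |= 1 << 63             # non-negative: set the sign bit
    return struct.pack(">Q", bits)


def encode_field(value):
    """Prefix each value with a one-byte type tag (hypothetical values)."""
    if value is None:
        return b"\x01"
    if value is False:
        return b"\x02"
    if value is True:
        return b"\x03"
    if isinstance(value, (int, float)):
        return b"\x04" + encode_number(value)
    if isinstance(value, str):
        return b"\x05" + collation_key_stub(value)
    raise TypeError("arrays and objects are not covered by this sketch")


values = ["b", 10, None, True, -2.5, "A", False, 3]
ordered = sorted(values, key=encode_field)
```

Because the type tag comes first, a plain byte comparison orders values by
type before it ever looks at the payload, which is the property we need
from FoundationDB's key ordering.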


In terms of keeping mango indexes up to date, we should be able to update
all existing indexes in the same transaction as a document is
updated/created. This means we shouldn’t need a background process to keep
Mango indexes updated. Though I imagine we are going to have to look at a
background process that builds new indexes on an existing database. We will
have to do some decent performance testing around this to determine the best
solution, but looking at the Document Layer, they seem to recommend updating
the indexes in the transaction rather than in a background process.
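
To show the bookkeeping involved in updating an index alongside a document
write, here is a small simulation in plain Python. Two dicts stand in for
the document and index subspaces, and the entry_for and save_doc helpers are
hypothetical. In FoundationDB the read of the old document, the clear of the
stale entry and the two writes would all sit inside a single transaction.

```python
def entry_for(fields, doc_id, doc):
    """Index key for a document, or None if it should not be indexed."""
    if doc is None or any(f not in doc for f in fields):
        return None
    return tuple(doc[f] for f in fields) + (doc_id,)


def save_doc(docs, index, fields, doc_id, new_doc):
    """Write the document and fix up its index entry in one step."""
    old_key = entry_for(fields, doc_id, docs.get(doc_id))
    new_key = entry_for(fields, doc_id, new_doc)
    if old_key != new_key:
        if old_key is not None:
            del index[old_key]     # clear the stale entry
        if new_key is not None:
            index[new_key] = None  # add the new entry
    docs[doc_id] = new_doc


docs, index = {}, {}
fields = ("name", "age")

save_doc(docs, index, fields, "doc-id-1", {"name": "john", "age": 40})
save_doc(docs, index, fields, "doc-id-1", {"name": "john", "age": 41})
save_doc(docs, index, fields, "doc-id-2", {"name": "mary"})  # no age field
```

After these three writes the index holds only ("john", 41, "doc-id-1"): the
old entry for age 40 was cleared, and doc-id-2 is skipped because it lacks
an indexed field.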


In the future, we could look at using the value space to store covering
indexes or materialized views. That way we would not always need to read
from by_id when querying with Mango, which would be a nice performance
improvement.



Please let me know any thoughts, improvements, suggestions or questions
around this.



[0] https://apple.github.io/foundationdb/features.html#watches

[1] https://github.com/FoundationDB/fdb-document-layer

[2] http://userguide.icu-project.org/collation/api#TOC-Sort-Key-Features
