Hi everyone,
I want to start a discussion, with the aim of an RFC, around implementing Mango JSON indexes for FoundationDB. Currently Mango indexes are a layer above CouchDB map/reduce indexes, but with FoundationDB we can make them separate indexes in FoundationDB. This gives us the possibility of updating the indexes in the same transaction that a document is being saved in. Later we can look at adding specific Mango-like covering indexes.

Let's dive into the data model. Currently a user defines an index like this:

    {
      name: 'view-name' - optional, will be auto-generated
      index: {
        fields: ['fieldA', 'fieldB']
      },
      partial_filter_selector: {} - optional
    }

For query planning we need to be able to access the list of available indexes. So we would have an index_definitions subspace with the following content:

    (<fieldname1>, …<rest of fields>) = (<index_name>, <partial_filter_selector>)

Alternatively, we could just store the index definitions as:

    (index_name) = ((fields), partial_filter_selector)

At this stage, I can't think of a fancy way of storing the index definitions so that, when we need to select an index for a query, there would be a fast way to fetch only a subset of the indexes. I think it is best to fetch them all, like we currently do, and process them. However, we can look at caching these index definitions in the application layer, and using FoundationDB watches[0] to notify us when a definition has changed so we can update the cached definitions.

Then each index definition will have its own dedicated subspace for the actual built index key/values. Keys in this subspace would be the fields defined in the index with the doc id at the end of the tuple, e.g. for an index with fields name and age, it would be:

    ("john", 40, "doc-id-1") = null
    ("mary", 30, "doc-id-2") = null

This follows the same key format that the document layer[1] uses for its indexes. One point to make here is that the doc id is kept in the key so that two documents with the same field values do not produce duplicate keys.
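To make the key layout above a bit more concrete, here is a minimal sketch in plain Python. It is not FoundationDB code: the `FakeIndexSubspace` class and the `index_key` helper are hypothetical names made up for illustration, and a sorted list of Python tuples stands in for an ordered subspace. It only shows how putting the doc id last keeps keys unique and still allows a scan on the indexed fields.

```python
from bisect import bisect_left


def index_key(doc, fields, doc_id):
    """Build an index key: the indexed field values followed by the doc id."""
    return tuple(doc[f] for f in fields) + (doc_id,)


class FakeIndexSubspace:
    """Stand-in for one index's subspace (assumption: not a real FDB API).

    Keys are kept sorted, as they would be in an ordered key-value store.
    Values are omitted because the proposal stores null for every key.
    """

    def __init__(self):
        self.keys = []

    def add(self, key):
        # Insert in sorted position; re-adding the same key is a no-op.
        i = bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            self.keys.insert(i, key)

    def scan_prefix(self, prefix):
        """Return every key whose leading elements equal `prefix`."""
        start = bisect_left(self.keys, prefix)
        matches = []
        for key in self.keys[start:]:
            if key[:len(prefix)] != prefix:
                break
            matches.append(key)
        return matches


fields = ["name", "age"]
idx = FakeIndexSubspace()
idx.add(index_key({"name": "john", "age": 40}, fields, "doc-id-1"))
idx.add(index_key({"name": "mary", "age": 30}, fields, "doc-id-2"))
# Same field values as doc-id-1, but the trailing doc id keeps the key distinct.
idx.add(index_key({"name": "john", "age": 40}, fields, "doc-id-3"))

print(idx.scan_prefix(("john", 40)))
# [('john', 40, 'doc-id-1'), ('john', 40, 'doc-id-3')]
```

In a real implementation the tuples would be packed into byte keys inside the index's subspace, and the prefix scan would be a range read; the sketch leaves that out.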
Then in terms of sorting the keys, CouchDB currently uses ICU to sort all secondary indexes. We would need to use ICU to sort the indexes for FDB, but we would have to do it differently. Since FoundationDB orders keys by their bytes, we will not be able to use ICU collation operations directly. Instead, we are going to have to look at using ICU's sort keys[2] to generate a sort key ahead of time. At the same time we need to look at creating a binary encoding to capture the way that CouchDB currently sorts objects, arrays and numbers. This would most likely be a sort of key prefix that we add to each key field along with the sort key generated from ICU.

In terms of keeping Mango indexes up to date, we should be able to update all existing indexes in the same transaction as a document is updated/created. This means we shouldn't need a background process to keep Mango indexes updated. Though I imagine we are going to have to look at a background process that builds a new index over the documents that already exist. We will have to do some decent performance testing around this to determine the best solution, but looking at the document layer, they seem to recommend updating the indexes in the transaction rather than in a background process.

In the future, we could look at using the value space to store covering indexes or materialized views. That way we would not always need to read from by_id when querying with Mango, which would be a nice performance improvement.

Please let me know any thoughts, improvements, suggestions or questions around this.

[0] https://apple.github.io/foundationdb/features.html#watches
[1] https://github.com/FoundationDB/fdb-document-layer
[2] http://userguide.icu-project.org/collation/api#TOC-Sort-Key-Features
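P.S. To illustrate the type-prefix idea from the sorting discussion above, here is a rough sketch in Python of how a single key field could be encoded so that plain byte comparison reproduces CouchDB's collation order across types (null, false, true, numbers, strings). Everything here is an assumption for discussion: the tag byte values, the function names, and in particular `placeholder_sort_key`, which only case-folds the string and is NOT an ICU sort key. Arrays and objects, and the escaping needed to concatenate several fields into one key, are left out.

```python
import struct

# Hypothetical tag bytes, chosen so that byte order follows CouchDB's
# collation order across types: null < false < true < numbers < strings.
TAG_NULL = b"\x01"
TAG_FALSE = b"\x02"
TAG_TRUE = b"\x03"
TAG_NUMBER = b"\x04"
TAG_STRING = b"\x05"


def encode_number(n):
    """Encode a number as 8 bytes whose byte order matches numeric order."""
    raw = bytearray(struct.pack(">d", float(n)))
    if raw[0] & 0x80:
        # Negative: flip every bit so larger magnitudes sort first.
        raw = bytearray(b ^ 0xFF for b in raw)
    else:
        # Non-negative: set the sign bit so it sorts after all negatives.
        raw[0] ^= 0x80
    return bytes(raw)


def placeholder_sort_key(s):
    """Placeholder only: a real version would return an ICU sort key."""
    return s.casefold().encode("utf-8")


def encode_field(value):
    """Prefix the encoded value with a tag byte for its type."""
    if value is None:
        return TAG_NULL
    if value is False:
        return TAG_FALSE
    if value is True:
        return TAG_TRUE
    if isinstance(value, (int, float)):
        return TAG_NUMBER + encode_number(value)
    if isinstance(value, str):
        return TAG_STRING + placeholder_sort_key(value)
    raise TypeError("arrays and objects are not covered by this sketch")


values = ["b", 10, None, True, -1.5, "A", False, 2]
print(sorted(values, key=encode_field))
# [None, False, True, -1.5, 2, 10, 'A', 'b']
```

Numbers are converted to doubles here, so very large integers would lose precision; whether that is acceptable is one of the things the RFC would need to settle.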