garrensmith commented on code in PR #4410:
URL: https://github.com/apache/couchdb/pull/4410#discussion_r1089977986


##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,254 @@
+---
+name: Formal RFC
+about: Submit a formal Request For Comments for consideration by the team.
+title: 'Support covering indexes when using Mango JSON (view) indexes'
+labels: rfc, discussion
+assignees: ''
+
+---
+
+[NOTE]: # ( ^^ Provide a general summary of the RFC in the title above. ^^ )
+
+# Introduction
+
+## Abstract
+
+[NOTE]: # ( Provide a 1-to-3 paragraph overview of the requested change. )
+[NOTE]: # ( Describe what problem you are solving, and the general approach. )
+
+Covering indexes are used to reduce the time the database takes to respond to
+queries. An index "covers" a query when the query only requires fields that are
+in the index (in this way, "covering" is a property of the index and query
+combined). When this is the case, the database doesn't need to consult primary
+data and can return results for the query from only the index. In more familiar
+CouchDB terminology, this is equivalent to querying a view with
+`include_docs=false`.
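A toy model may help make "covering" concrete. The Python sketch below is illustrative only, not CouchDB code. It assumes view rows shaped like CouchDB's `{"id", "key", "value"}` rows and an index whose key is the array of indexed field values in order. The function name `answer_from_index` is hypothetical.

```python
from typing import Any, Dict, List


def answer_from_index(
    rows: List[Dict[str, Any]],
    index_fields: List[str],
    requested: List[str],
) -> List[Dict[str, Any]]:
    """Build query results from view rows alone, never reading a document.

    Each row is assumed to look like {"id": ..., "key": [v1, v2, ...]},
    where key[i] holds the value of index_fields[i].
    """
    results = []
    for row in rows:
        # Rebuild a partial document from the index key plus the row id.
        partial = dict(zip(index_fields, row["key"]))
        partial["_id"] = row["id"]
        # A KeyError here means a requested field is not in the index,
        # i.e. the index does not cover this query.
        results.append({f: partial[f] for f in requested})
    return results


rows = [
    {"id": "doc-1", "key": [31, "alice"], "value": None},
    {"id": "doc-2", "key": [42, "bob"], "value": None},
]
print(answer_from_index(rows, ["age", "name"], ["_id", "name"]))
# [{'_id': 'doc-1', 'name': 'alice'}, {'_id': 'doc-2', 'name': 'bob'}]
```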
+
+When evaluating a query, Mango currently doesn't use the concept of covering
+indexes; even if a query could be answered without reading each result's full
+JSON document, Mango will still read it. This makes it impossible for Mango to
+return data as quickly as the underlying view.
+
+My benchmarking shows that Mango currently cannot answer queries at the same
+rate as the underlying view index; it runs at the same pace as calling the view
+with `include_docs=true`. Preliminary modifications to Mango showed that, with
+covering index support and a query that can use it, Mango can stream results
+as quickly as the underlying view. Adding covering indexes would therefore
+substantially increase the range of production use cases Mango can support.
+
+This work is likely to happen in two phases:
+
+- Enable covering index processing for current indexes (i.e., over view keys).
+- Allow Mango view indexes to include extra data from documents, storing it in
+  the `value` of the view, and support use of this extra data within the
+  covering indexes feature.
+
+### Out of scope
+
+This proposal only covers adding covering indexes to JSON indexes, not to text
+indexes. The aim is to reduce the need for CouchDB users to run separate
+processes, such as Lucene, to get improved querying performance and capability.
+
+We do not aim to replicate `reduce` functionality from views, only to bring
+Mango to parity with non-reduced view execution speed (i.e., when views are
+used to search the document space).
+
+## Requirements Language
+
+[NOTE]: # ( Do not alter the section below. Follow its instructions. )
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in
+[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt).
+
+## Terminology
+
+[TIP]:  # ( Provide a list of any unique terms or acronyms, and their definitions here.)
+
+- Mango: CouchDB's Mongo-inspired querying system.
+- View / JSON index: a Mango index that uses the same index as CouchDB views.
+- Coordinator: the Erlang process that coordinates a distributed query across
+    a CouchDB cluster.
+
+---
+
+# Detailed Description
+
+[NOTE]: # ( Describe the solution being proposed in greater detail. )
+[NOTE]: # ( Assume your audience has knowledge of, but not necessarily familiarity )
+[NOTE]: # ( with, the CouchDB internals. Provide enough context so that the reader )
+[NOTE]: # ( can make an informed decision about the proposal. )
+
+[TIP]:  # ( Artwork may be attached to the submission and linked as necessary. )
+[TIP]:  # ( ASCII artwork can also be included in code blocks, if desired. )
+
+This work would take place within `mango_cursor_view.erl`. The key functions
+involved are the shard-level `view_cb/2`, the streaming result handler at the
+coordinator end (`handle_message/2`), and the `execute/3` function.
+
+## Phase 1: handle keys-only covering indexes
+
+Within `execute/3`, we will need to decide whether the view should be asked to
+include documents. If the index is covering, documents are not required, so
+the `include_docs` argument to the fabric view call will be `false`. We'll
+need to add a helper function that returns whether the index is covering.
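One hypothetical shape for that helper is sketched below, in Python for brevity (the real change would be Erlang in Mango). It assumes a keys-only index (Phase 1), that an omitted `fields` list (`None` here) means the user wants whole documents, and that `_id` can be read from the view row. `is_covering` is an illustrative name, not an existing Mango function.

```python
from typing import Iterable, Optional, Sequence, Set


def is_covering(
    index_fields: Sequence[str],
    selector_fields: Iterable[str],
    requested_fields: Optional[Sequence[str]],
) -> bool:
    """Return True if a query could be answered from the index alone.

    requested_fields is the query's `fields` list; None means the user
    asked for whole documents, which a keys-only index cannot supply.
    """
    if requested_fields is None:
        return False
    # Assumption of this sketch: the view row id accompanies every key,
    # so _id is always available without reading the document.
    available: Set[str] = set(index_fields) | {"_id"}
    # Both the projected fields and the fields the selector tests must be
    # readable from the index; otherwise the document is still needed.
    needed = set(requested_fields) | set(selector_fields)
    return needed <= available
```

This check alone does not make an index safe to select; the selector must also imply that every indexed field exists.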
+
+When selecting an index, we'll need to be careful of some subtleties. We will
+need to ensure that only fields in the `selector`, and not those in `fields`,
+are used when choosing an index. This is because we require all keys in the
+index to be fields within the selector, with predicates implying
+`$exists=true`: only documents that include _all_ fields in the index are
+added to the index. Therefore, if the selector doesn't imply all fields in the
+index's keys

Review Comment:
   Just checking my understanding. The added subtlety here is because
   previously we would do an in-memory filter of the document to check that
   the selector completely matches it. Now, if we can use the index alone, we
   have to make sure all fields in the selector are also in the index keys. So
   if a selector has filters on `name`, `age` and `country`, and the `fields`
   section in the query is `name` and `age`, Mango would have to choose an
   index with `name`, `age` and `country` even though it is only returning two
   fields. Is that correct?

   What happens if no index satisfies this?



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,254 @@
+exist, then using that index risks returning an incomplete result set.
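The selection rule can be sketched as a predicate over the selector. This Python sketch is a simplification under stated assumptions: it only looks at top-level, implicitly AND-ed conditions written with dotted field names; it ignores anything under `$or`, `$not` and similar operators (conservative, since those fields are then treated as possibly missing); and it assumes every condition except `{"$exists": false}` only matches documents that contain the field, which should be checked against Mango's selector matching. The function names are hypothetical.

```python
from typing import Any, Dict, Sequence


def implies_exists(condition: Any) -> bool:
    """Does a single field condition only match documents having the field?

    Assumption: every condition except {"$exists": false} requires the
    field to be present (a bare value such as 30 is shorthand for $eq).
    """
    return condition != {"$exists": False}


def index_usable(selector: Dict[str, Any], index_fields: Sequence[str]) -> bool:
    """True if the selector guarantees that every indexed field exists.

    Only top-level, implicitly AND-ed field conditions are considered;
    operator keys such as "$or" never match a field name, so they are
    conservatively ignored.
    """
    return all(
        field in selector and implies_exists(selector[field])
        for field in index_fields
    )
```

For example, a selector filtering on `name`, `age` and `country` can use an index on `["name", "age"]`, but not one on `["name", "email"]`, because nothing guarantees that `email` exists in matching documents.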
+
+Within `view_cb/2`, we'll need to know whether an index is covering. Without
+that, `view_cb/2` will interpret the lack of included documents as a signal
+that it should do nothing, whereas we want it to fully process the result as
+it does when `include_docs` is used. The exception is when the user passes
+`r>=2` in the Mango query, because then the coordinator reads and processes
+documents. (Aside: it'd be good to remove this `r` option to simplify things.)
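The per-row decision described above can be summarised as a small function. This Python sketch is hypothetical: `ShardAction` and `shard_action` are illustrative names, and it assumes, as the text implies, that the shard receives documents (`include_docs=true`) only when `r` is 1 and the index is not covering.

```python
from enum import Enum


class ShardAction(Enum):
    MATCH_WITH_DOC = "filter and project the included document on the shard"
    MATCH_FROM_INDEX = "filter and project using only the view row"
    FORWARD_TO_COORDINATOR = "send the bare row; the coordinator reads the doc"


def shard_action(covering: bool, r: int) -> ShardAction:
    """Hypothetical per-row decision inside view_cb/2."""
    if r >= 2:
        # The coordinator reads and processes documents itself, so the
        # shard passes rows through, as it does today without documents.
        return ShardAction.FORWARD_TO_COORDINATOR
    if covering:
        # include_docs=false, but unlike today the row is still fully
        # processed: the view key carries everything the query needs.
        return ShardAction.MATCH_FROM_INDEX
    # r=1 and not covering: include_docs=true, process the document here.
    return ShardAction.MATCH_WITH_DOC
```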

Review Comment:
   +1 to removing the `r` option. It's something I've wanted to remove for a
   long time.



##########
src/docs/rfcs/018-mango-covering-json-index.md:
##########
@@ -0,0 +1,254 @@
+
+In `handle_message/2`, the main work is ensuring that we handle mixed-version
+cluster states, i.e., the state of the cluster during upgrades.
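One way the coordinator could cope with a mixed-version cluster is to fall back to its existing document-reading path whenever a row was not processed by an upgraded shard. The Python sketch below assumes a hypothetical `covered` flag that upgraded shards set on rows they have already filtered and projected; an old-version shard, receiving `include_docs=false`, would forward a bare row as it does today. `handle_row`, `fetch_doc` and `process` are illustrative names, not Mango functions.

```python
from typing import Any, Callable, Dict, Optional

Doc = Dict[str, Any]


def handle_row(
    row: Dict[str, Any],
    fetch_doc: Callable[[str], Optional[Doc]],
    process: Callable[[Doc], Optional[Doc]],
) -> Optional[Doc]:
    """Coordinator-side handling of one streamed row during an upgrade.

    process applies the selector and field projection, returning None
    when the document does not match.
    """
    if row.get("covered"):
        # Upgraded shard: the row is already matched and projected.
        return row["result"]
    # Old-version shard: it sent a bare row, so read the document here,
    # exactly as the existing r>=2 path does.
    doc = fetch_doc(row["id"])
    return None if doc is None else process(doc)
```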
+
+## Phase 2: add support for included fields in indexes
+
+I propose we add an `include` field to a Mango JSON index definition:
+
+```json
+{
+    "index": {
+        "fields": [ "age", "name" ],
+        "include": [ "occupation", "manager_id" ]
+    },
+    "name": "foo-json-index",
+    "type": "json"
+}
+```
+
+Behaviour requirements:
+
+- Unlike `fields`, the fields in `include` _do not have to exist_ in the source
+    document for the document to be included in the index. This allows the
+    index to cover more queries.
+- Including a deeply nested field would follow the same pattern as other
+    field references in Mango, e.g. `person.address.zip`.
+- There is no notation to include the whole document, that is, no equivalent of

Review Comment:
   Is this something we should consider? Or, if a user wants the whole
   document, would they need to list all of its fields in the index?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
