espy opened a new issue, #5483:
URL: https://github.com/apache/couchdb/issues/5483

   Hello,
   
   I’ve been trying out Mango and Nouveau together, following the docs and 
the tests, and have run into several problems. I’ve provided a full set of 
cURL commands to reproduce the issues, namely:
   
   - Queries against `string` index fields only occasionally work
   - `$text` queries work the same as regular queries against `string` index 
fields: they’re keyword searches, not Lucene full-text searches
   - `"type": "text"` index fields never seem to work at all
   
   This didn’t become obvious in testing because the tests seem to:
   - Use the one of the two `string` index fields that actually works by sheer 
coincidence
   - Use a `$text` query that also worked by coincidence, not because the 
underlying mechanism works properly
   
   I’ve basically done something similar to the Mango-related tests in [this 
file](https://github.com/apache/couchdb/blob/main/test/elixir/test/nouveau_test.exs)
 and linked the relevant lines from the test file below.
   
   Using CouchDB `3.4.2` with `nouveau-1.0-SNAPSHOT.jar`.
   
   ## Setup
   
   Add this line to all requests in case you need auth:
   
   `--user COUCHDB_USERNAME:COUCHDB_PASSWORD \`
   
   First, make a fresh DB:
   
   ```sh
   curl  -X PUT \
     'http://127.0.0.1:5984/mouveau' \
     --header 'Accept: */*'
   ```
   
   Insert some docs:
   
   ```sh
   curl  -X POST \
     'http://127.0.0.1:5984/mouveau/_bulk_docs' \
     --header 'Accept: */*' \
     --header 'Content-Type: application/json' \
     --data-raw '
   {
       "docs": [
           {
               "_id": "FishStew",
               "servings": 4,
               "subtitle": "Delicious with freshly baked bread",
               "title": "Fish Stew",
               "type": "fish"
           },
           {
               "_id": "LambStew",
               "servings": 6,
               "subtitle": "Serve with a whole meal scone topping",
               "title": "Lamb Stew",
               "type": "meat"
           },
           {
               "_id": "Dumplings",
               "servings": 8,
               "subtitle": "Hand-made dumplings make a great accompaniment",
               "title": "Dumplings",
               "type": "meat"
           }
       ]
   }'
   ```
   
   Now make a Nouveau index. This is basically the [same index as used in the 
tests](https://github.com/apache/couchdb/blob/main/test/elixir/test/nouveau_test.exs#L99-L108)
 but adapted to our documents' field names:
   
   ```sh
   curl  -X POST \
     'http://127.0.0.1:5984/mouveau/_index' \
     --header 'Accept: */*' \
     --header 'Content-Type: application/json' \
     --data-raw '{
       "type": "nouveau",
       "index": {
           "fields": [
               {"name": "title", "type": "string"},
               {"name": "servings", "type": "number"},
               {"name": "type", "type": "string"}
           ],
           "default_analyzer": "keyword"
       }
   }'
   ```
   
   Let’s do some queries. First, querying for strings, analogous to the `Mango 
search by string` 
[test](https://github.com/apache/couchdb/blob/main/test/elixir/test/nouveau_test.exs#L434):
   
   ```sh
   curl  -X POST \
     'http://127.0.0.1:5984/mouveau/_find' \
     --header 'Accept: */*' \
     --header 'Content-Type: application/json' \
     --data-raw '{
       "selector": {
         "type": "meat"
       }
   }'
   ```
   That works.
   
   (Only showing `selector` lines for brevity from now on; the cURL command is 
always the same.)
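
   Since only the selector changes between requests, the repeated command can
   be wrapped in a small shell function. This is just a convenience sketch:
   `find_body`, `mango_find` and the `COUCH_URL` variable are hypothetical
   names, not part of CouchDB, and the URL defaults to the one used above.

   ```sh
   # Hypothetical helper: wrap a selector object in a Mango _find request body.
   find_body() {
     printf '{"selector": %s}' "$1"
   }

   # Hypothetical helper: POST that body to _find on the mouveau DB.
   # COUCH_URL is an assumed variable; it defaults to the local URL used above.
   mango_find() {
     curl -s -X POST \
       "${COUCH_URL:-http://127.0.0.1:5984}/mouveau/_find" \
       --header 'Accept: */*' \
       --header 'Content-Type: application/json' \
       --data-raw "$(find_body "$1")"
   }

   # Usage (needs a running CouchDB, so left commented out):
   # mango_find '{"title": "Lamb Stew"}'
   # mango_find '{"$text": "lamb"}'
   ```

   Add the `--user` line from the setup section to `mango_find` if your
   instance needs auth.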
   
   ```json
   "selector": {
     "title": "Lamb Stew"
   }
   ```
   This doesn’t work at all:
   
   ```json
   {
     "error": "nouveau_search_error",
     "reason": "bad_request: field \"title_3astring\" was indexed without position data; cannot run PhraseQuery (phrase=title_3astring:\"lamb stew\")",
     "ref": 2009526893
   }
   ```
   Hm, maybe if we just search for a single word?
   
   ```json
   "selector": {
     "title": "Dumplings"
   }
   ```
   Different nope, but also nope:
   
   ```json
   {
     "docs": [],
     "bookmark": "W10="
   }
   ```
   
   What about numbers? This is analogous to the `Mango search by number` 
[test](https://github.com/apache/couchdb/blob/main/test/elixir/test/nouveau_test.exs#L408):
   
   ```json
   "selector": {
     "servings": {"$gte": 6}
   }
   ```
   
   That works. 
   
   The last test is a `$text` query, same as the `mango search by text` 
[test](https://github.com/apache/couchdb/blob/main/test/elixir/test/nouveau_test.exs#L434).
 Now, interestingly, the [doc 
example](https://docs.couchdb.org/en/latest/ddocs/mango.html#the-text-operator) 
for this query type selector looks different than the one in the test:
   
   ```js
   // test: just a keyword
   {"$text": "hello"}
   
   // docs: this is Lucene full-text search syntax
   {
     "_id": { "$gt": null },
     "$text": "director:George"
   }
   ```
   
   Also, remember: the test runs this `$text` query against an index that does 
_not_ have any `"type":"text"` fields! And surprisingly, this works!
   
   ```json
   "selector": {
     "$text": "lamb"
   }
   ```
   
   But this seems to be only a keyword search, and not a fully-fledged `text` 
search, which becomes apparent because these all fail:
   
   ```json
   "selector": {
     "$text": "lamb s"
   }
   ```
   
   ```json
   "selector": {
     "$text": "title:lamb"
   }
   ```
   
   ```json
   "selector": {
     "$text": "title:lam*"
   }
   ```
   
   ```json
   "selector": {
     "$text": "lam*"
   }
   ```
   And all permutations thereof.
   
   ## Summary:
   
   - `number` queries, as in the tests, work. Yay.
   - `string` queries, as in the tests, only _sometimes_ work: querying `type` 
matches, while querying `title` either errors or returns nothing. Couldn’t 
find a pattern yet.
   - `$text` queries against the index as used in the tests do not actually 
perform Lucene queries. They run the same keyword-style searches as the 
`string` queries, only against _all_ (`string`?) index fields.
   
   ## Workaround attempts:
   
   - Change the order of the `string` queries to see what happens: no change, 
querying for `title` still fails, `type` still works.
   - Try different values for `default_analyzer` and `analyzer.default`: 
neither `standard` nor `english` changes the results for `$text` queries.
     
   ## What about the `text` _index_ type?
   
   Now, the test mango index and the [example index in the 
docs](https://docs.couchdb.org/en/latest/ddocs/mango.html#text-indexes) are the 
same, and interestingly, the docs refer to this as a `Text index`. However, 
this section also describes a `text` index _field_ type, just like in 
[Nouveau](https://docs.couchdb.org/en/latest/ddocs/nouveau.html#field-types), 
to go alongside `string` and `number` and `boolean` index field types. This 
isn’t documented or tested anywhere, but I’d assume this to work like so 
(delete the old ddoc first):
   
   ```sh
   curl  -X POST \
     'http://127.0.0.1:5984/mouveau/_index' \
     --header 'Accept: */*' \
     --header 'Content-Type: application/json' \
     --data-raw '{
       "type": "nouveau",
       "index": {
           "fields": [
               {"name": "title", "type": "text"}
           ],
           "default_analyzer": "english"
       }
   }'
   ```
   
   But no query against this returns anything, no permutation of `$text` works, 
and using something like `"title": "Dumplings"` falls back to not using any 
index.
   
   ## Expected results:
   
   - Queries against `string` index fields always work.
   - `$text` queries use Lucene syntax? Unsure what the intention is here: the 
docs do one thing, the tests another.
   - `text` type indexes are queryable with a selector like `"$text": 
"fieldname:querystring"`.

