[GitHub] [couchdb-documentation] wohali commented on a change in pull request #418: Add search index documentation

GitBox Wed, 12 Jun 2019 03:43:34 -0700

wohali commented on a change in pull request #418: Add search index 
documentation
URL: 
https://github.com/apache/couchdb-documentation/pull/418#discussion_r292847402


 ##########
 File path: src/api/ddoc/views.rst
 ##########
 @@ -315,6 +315,1144 @@ including the update sequence of the database from 
which the view was
 generated. The returned value can be compared to the current update
 sequence exposed in the database information (returned by :get:`/{db}`).
 
+Search
+======
+
+Search indexes enable you to query a database by using `Lucene Query Parser Syntax <http://lucene.apache.org/core/4_3_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Overview>`_.
+A search index uses one or more fields from your documents. You can use a search index to run queries, find documents based on the content they contain, or work with groups, facets, or geographical searches.
+
+To create a search index, you add a JavaScript function to a design document 
in the database. An index builds after processing one search request or after 
the server detects a document update. The ``index`` function takes the 
following parameters: 
+
+1.  Field name - The name of the field you want to use when you query the index. If you set this parameter to ``default``, then this field is queried if no field is specified in the query syntax.
+2.  Data that you want to index, for example, ``doc.address.country``.
+3.  (Optional) An object that can include the following fields: ``boost``, ``facet``, ``index``, and ``store``. These fields are described in more detail later.
+
+By default, a search index response returns 25 rows. You can change the number of rows returned by using the ``limit`` parameter, but a single search result set is limited to 200 rows. Each response includes a ``bookmark`` field. To page through the results, include the value of the ``bookmark`` field in later queries.
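The paging rules above can be sketched as two small helpers. This is only an illustration: the helper names ``firstPageParams`` and ``nextPageParams`` are hypothetical, and stopping when a page comes back with no rows is an assumption rather than something the text states. The parameter names ``q``, ``limit``, and ``bookmark`` and the values 25 and 200 come from the text.

```javascript
// Values taken from the paragraph above.
const DEFAULT_LIMIT = 25;
const MAX_LIMIT = 200;

// Hypothetical helper: builds the query parameters for the first page,
// capping the requested limit at the documented maximum.
function firstPageParams(q, limit = DEFAULT_LIMIT) {
    return { q: q, limit: Math.min(limit, MAX_LIMIT) };
}

// Hypothetical helper: given the previous parameters and the response,
// returns the parameters for the next page, or null when there is nothing
// more to fetch (assumption: an empty page marks the end of the results).
function nextPageParams(params, response) {
    if (!response.rows || response.rows.length === 0) {
        return null;
    }
    return { ...params, bookmark: response.bookmark };
}
```

A caller would send ``firstPageParams(...)`` with its first request and keep calling ``nextPageParams`` with each response until it returns ``null``.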
+
+*Example design document that defines a search index:*
+
+.. code-block:: javascript
+
+    {
+       "_id": "_design/search_example",
+       "indexes": {
+               "animals": {
+                       "index": "function(doc){ ... }"
+               }
+           }
+    }
+
+Search index partitioning type
+------------------------------
+
+A search index will inherit the partitioning type from the 
``options.partitioned``
+field of the design document that contains it.
+
+Index functions
+---------------
+
+Attempting to index by using a data field that does not exist fails. To avoid this problem, use an appropriate :ref:`index guard clause <api/ddoc/view/index_guard_clauses>`.
+
+.. note::
+    Your indexing functions operate in a memory-constrained environment where the
+    document itself forms a part of the memory that is used in that environment.
+    Your code's stack and document must fit inside this memory. In other words, a document
+    must be loaded in order to be indexed. Documents are limited to a maximum size of 64 MB.
+
+.. note::
+    Within a search index, do not index the same field name with more than one data
+    type. If the same field name is indexed with different data types in the same search
+    index function, you might get an error when querying the search index that says the
+    field "was indexed without position data." For example, do not include both of these
+    lines in the same search index function, as they index the ``myfield`` field as two
+    different data types: a string ``"this is a string"`` and a number ``123``.
+
+.. code-block:: javascript
+
+    index("myfield", "this is a string");
+    index("myfield", 123);
+
+The function that is contained in the ``index`` field of the design document is a JavaScript function
+that is called for each document in the database.
+The function takes the document as a parameter,
+extracts some data from it,
+and then calls the ``index`` function to index that data.
+
+The ``index`` function takes three parameters, where the third parameter is 
optional.
+
+The first parameter is the name of the field you intend to use when querying 
the index,
+and which is specified in the Lucene syntax portion of subsequent queries.
+An example appears in the following query:
+
+.. code-block:: javascript
+
+    query=color:red
+
+The Lucene field name ``color`` is the first parameter of the ``index`` 
function.
+
+The ``query`` parameter can be abbreviated to ``q``,
+so another way of writing the query is as follows:
+
+.. code-block:: javascript
+
+    q=color:red
+
+If the special value ``"default"`` is used when you define the name,
+you do not have to specify a field name at query time.
+The effect is that the query can be simplified:
+
+.. code-block:: javascript
+
+    query=red
+
+The second parameter is the data to be indexed. Keep the following information 
in mind when you index your data: 
+
+- This data must be only a string, number, or boolean. Other types will cause 
an error to be thrown by the index function call.
+- If an error is thrown when running your function, for this reason or others, 
the document will not be added to that search index.
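The type restriction above could be checked up front with a small predicate. This is a minimal sketch, not part of the search API: the name ``isIndexable`` is hypothetical.

```javascript
// Hypothetical helper: returns true only for the value types that the text
// says can be passed as the second parameter of index().
function isIndexable(value) {
    const t = typeof value;
    return t === "string" || t === "number" || t === "boolean";
}

// Inside an index function, a guard might then look like this:
//     if (isIndexable(doc.diet)) { index("diet", doc.diet); }
```

Because ``typeof null`` is ``"object"``, the predicate rejects ``null``, arrays, and nested objects as well as missing fields.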
+
+The third, optional, parameter is a JavaScript object with the following 
fields:
+
+*Index function (optional parameter)*
+
++--------------+----------------------------------------------------------------------+----------------------------------+-----------------+
+| Option       | Description                                                          | Values                           | Default         |
++==============+======================================================================+==================================+=================+
+| ``boost``    | A number that specifies the relevance in search results.             | A positive floating point number | 1 (no boosting) |
+|              | Content that is indexed with a boost value greater than 1            |                                  |                 |
+|              | is more relevant than content that is indexed without a boost value. |                                  |                 |
+|              | Content with a boost value less than 1 is less relevant.             |                                  |                 |
++--------------+----------------------------------------------------------------------+----------------------------------+-----------------+
+| ``facet``    | Creates a faceted index. For more information, see                   | ``true``, ``false``              | ``false``       |
+|              | :ref:`faceting <api/ddoc/view>`.                                     |                                  |                 |
++--------------+----------------------------------------------------------------------+----------------------------------+-----------------+
+| ``index``    | Whether the data is indexed, and if so, how. If set to ``false``,    | ``true``, ``false``              | ``true``        |
+|              | the data cannot be used for searches, but can still be retrieved     |                                  |                 |
+|              | from the index if ``store`` is set to ``true``.                      |                                  |                 |
+|              | For more information, see                                            |                                  |                 |
+|              | :ref:`analyzers <api/ddoc/view/analyzers>`.                          |                                  |                 |
++--------------+----------------------------------------------------------------------+----------------------------------+-----------------+
+| ``store``    | If ``true``, the value is returned in the search result;             | ``true``, ``false``              | ``false``       |
+|              | otherwise, the value is not returned.                                |                                  |                 |
++--------------+----------------------------------------------------------------------+----------------------------------+-----------------+
+
+.. note:: 
+
+    If you do not set the ``store`` parameter,
+    the index data results for the document are not returned in response to a 
query.
+
+*Example search index function:*
+
+.. code-block:: javascript
+
+    function(doc) {
+           index("default", doc._id);
+           if (doc.min_length) {
+                   index("min_length", doc.min_length, {"store": true});
+           }
+           if (doc.diet) {
+                   index("diet", doc.diet, {"store": true});
+           }
+           if (doc.latin_name) {
+                   index("latin_name", doc.latin_name, {"store": true});
+           }
+           if (doc.class) {
+                   index("class", doc.class, {"store": true});
+           }
+    }
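To see which fields an index function like the one above would emit, it can be run locally against a sample document. The sketch below is a test harness only and makes assumptions: ``runIndexFunction`` is a hypothetical helper, the ``index`` it supplies is a recording stub rather than the real indexer, and the sample document is invented for illustration.

```javascript
// Hypothetical local harness: compiles an index function from its source
// string (as stored in a design document) and supplies a stub index()
// that records each call instead of writing to a search index.
function runIndexFunction(source, doc) {
    const calls = [];
    const index = (name, value, options) => {
        calls.push({ name: name, value: value, options: options });
    };
    // Bind the stub to the free variable "index" used by the source.
    const fn = new Function("index", "return (" + source + ");")(index);
    fn(doc);
    return calls;
}

// A shortened version of the example index function, as a string.
const source = `function (doc) {
    index("default", doc._id);
    if (doc.diet) {
        index("diet", doc.diet, {"store": true});
    }
}`;

// Invented sample document.
const calls = runIndexFunction(source, { _id: "aardvark", diet: "omnivore" });
```

Here ``calls`` holds one entry per ``index`` call, which makes it easy to check that optional fields are skipped when a document lacks them.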
+
+.. _api/ddoc/view/index_guard_clauses:
+
+Index guard clauses
+^^^^^^^^^^^^^^^^^^^
+
+The ``index`` function requires the name of the data field to index as the 
second parameter.
+However,
+if that data field does not exist for the document,
+an error occurs.
+The solution is to use an appropriate 'guard clause' that checks if the field 
exists,
+and contains the expected type of data,
+*before* any attempt to create the corresponding index.
+
+*Example of a truthiness test, which does not distinguish a missing field from one that holds a falsy value such as 0:*
+
+.. code-block:: javascript
+
+    if (doc.min_length) {
+           index("min_length", doc.min_length, {"store": true});
+    }
+
+You might use the JavaScript ``typeof`` operator to implement the guard clause test.
+If the field exists *and* has the expected type,
+``typeof`` returns the expected type name,
+so the guard clause test succeeds and it is safe to use the index function.
+If the field does *not* exist,
+``typeof`` returns ``'undefined'`` instead of the expected type name,
+so you do not attempt to index the field.
+
+JavaScript considers a result to be false if one of the following values is 
tested:
+
+*      undefined
+*      null
+*      The number +0
+*      The number -0
+*      NaN (not a number)
+*      "" (the empty string)
+
+*Using a guard clause to check whether the required data field exists,
+and holds a number,
+before an attempt to index:*
+
+.. code-block:: javascript
+
+    if (typeof(doc.min_length) === 'number') {
+           index("min_length", doc.min_length, {"store": true});
+    }
+
+Use a generic guard clause test to ensure that the type of the candidate data 
field is defined.
+
+*Example of a 'generic' guard clause:*
+
+.. code-block:: javascript
+
+    if (typeof(doc.min_length) !== 'undefined') {
+           // The field exists, and does have a type, so we can proceed to index using it.
+           ...
+    }
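The difference between a truthiness test and a ``typeof`` test can be checked with a stub in place of the real ``index`` function. This is a sketch under assumptions: ``truthyGuard``, ``typeofGuard``, and ``countIndexed`` are hypothetical names, and ``index`` is passed in as a parameter only so that the example runs outside the indexer.

```javascript
// Guard based on truthiness: skips falsy values such as 0.
function truthyGuard(doc, index) {
    if (doc.min_length) {
        index("min_length", doc.min_length, { store: true });
    }
}

// Guard based on typeof: accepts any number, including 0.
function typeofGuard(doc, index) {
    if (typeof doc.min_length === "number") {
        index("min_length", doc.min_length, { store: true });
    }
}

// Hypothetical helper: counts how many times a guard calls index().
function countIndexed(guard, doc) {
    let count = 0;
    guard(doc, () => { count += 1; });
    return count;
}
```

For a document with ``min_length`` set to ``0``, ``countIndexed(truthyGuard, doc)`` is 0 while ``countIndexed(typeofGuard, doc)`` is 1. For a document without the field, both are 0.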
+
+.. _api/ddoc/view/analyzers:
+
+Analyzers
+---------
+
+Analyzers are settings that define how to recognize terms within text.
+Analyzers can be helpful if you need to index text in a specific language; see :ref:`language-specific analyzers <api/ddoc/view/language-specific-analyzers>`.
+
+Here's the list of generic analyzers that are supported by search:
+
++----------------+---------------------------------------------------------------------------------+
+| Analyzer       | Description                                                                     |
++================+=================================================================================+
+| ``classic``    | The standard Lucene analyzer, circa release 3.1.                                |
++----------------+---------------------------------------------------------------------------------+
+| ``email``      | Like the ``standard`` analyzer, but tries harder to match an email              |
+|                | address as a complete token.                                                    |
++----------------+---------------------------------------------------------------------------------+
+| ``keyword``    | Input is not tokenized at all.                                                  |
++----------------+---------------------------------------------------------------------------------+
+| ``simple``     | Divides text at non-letters.                                                    |
++----------------+---------------------------------------------------------------------------------+
+| ``standard``   | The default analyzer. It implements the Word Break rules from the               |
+|                | `Unicode Text Segmentation algorithm <http://www.unicode.org/reports/tr29/>`_.  |
++----------------+---------------------------------------------------------------------------------+
+| ``whitespace`` | Divides text at white space boundaries.                                         |
++----------------+---------------------------------------------------------------------------------+
+
+
+*Example analyzer document:*
+
+.. code-block:: javascript
+
+    {
+           "_id": "_design/analyzer_example",
+           "indexes": {
+                   "INDEX_NAME": {
+                           "index": "function (doc) { ... }",
+                           "analyzer": "$ANALYZER_NAME"
+                   }
+           }
+    }
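Since the design document stores the index function as a string, one way to assemble a document like the one above is to serialize a real function with ``toString()``. This is a minimal sketch: ``buildSearchDdoc`` is a hypothetical helper, and the index name ``animals`` and the ``whitespace`` analyzer are chosen only for illustration.

```javascript
// Hypothetical helper: builds a design document with one search index,
// storing the index function as source text and naming the analyzer.
function buildSearchDdoc(ddocName, indexName, indexFn, analyzer) {
    return {
        _id: "_design/" + ddocName,
        indexes: {
            [indexName]: {
                index: indexFn.toString(),
                analyzer: analyzer
            }
        }
    };
}

// The function below is never called here; it is only converted to a string,
// so the free variable index does not need to exist locally.
const ddoc = buildSearchDdoc(
    "analyzer_example",
    "animals",
    function (doc) {
        if (typeof doc.diet === "string") {
            index("diet", doc.diet);
        }
    },
    "whitespace"
);
```

The resulting object can be passed through ``JSON.stringify`` to produce the body of the design document.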
+
+.. _api/ddoc/view/language-specific-analyzers:
+
+Language-specific analyzers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Review comment:
   These all come from (a specific version of) Lucene, yes? Perhaps we should 
include a link back to Lucene's documentation where the full list is also 
specified.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
