[GitHub] [couchdb-documentation] kocolosk commented on a change in pull request #418: Add search index documentation

GitBox Wed, 26 Jun 2019 18:32:19 -0700

kocolosk commented on a change in pull request #418: Add search index documentation
URL: https://github.com/apache/couchdb-documentation/pull/418#discussion_r297925062


 ##########
 File path: src/api/ddoc/views.rst
 ##########
 @@ -315,22 +313,1355 @@ including the update sequence of the database from which the view was
 generated. The returned value can be compared this to the current update
 sequence exposed in the database information (returned by :get:`/{db}`).
 
+Search
+======
+
+Search indexes enable you to query a database by using
+`Lucene Query Parser Syntax <http://lucene.apache.org/core/4_3_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Overview>`_
+A search index uses one, or multiple, fields from your documents.
+You can use a search index to run queries, find documents based on
+the content they contain, or work with groups, facets, or
+geographical searches.
+
+.. warning::
+    Search cannot function unless it has a functioning, cluster-connected
+    Clouseau instance.
+
+To create a search index, you add a JavaScript function to a design document
+in the database. An index builds after processing one search request or after
+the server detects a document update. The ``index`` function takes the
+following parameters:
+
+1.  Field name - The name of the field you want to use when you query the index.
+If you set this parameter to ``default``, then this field is queried if no field
+is specified in the query syntax.
+
+2.  Data that you want to index, for example, ``doc.address.country``.
+
+3.  (Optional) The third parameter includes the following fields: ``boost``, ``facet``,
+``index``, and ``store``. These fields are described in more detail later.
+
+By default, a search index response returns 25 rows. The number of rows that is
+returned can be changed by using the ``limit`` parameter. However, a result set
+from a search is limited to 200 rows. Each response includes a ``bookmark`` field.
+You can include the value of the ``bookmark`` field in later queries to look
+through the responses.
+
+*Example design document that defines a search index:*
+
+.. code-block:: javascript
+
+    {
+        "_id": "_design/search_example",
+        "indexes": {
+            "animals": {
+                "index": "function(doc){ ... }"
+            }
+        }
+    }
+
+Search index partitioning type
+------------------------------
+
+A search index will inherit the partitioning type from the
+``options.partitioned`` field of the design document that contains it.
+
+Index functions
+---------------
+
+Attempting to index by using a data field that does not exist fails. To avoid
+this problem, use the appropriate :ref:`guard clause <api/ddoc/view/index_guard_clauses>`.
+
+.. note::
+    Your indexing functions operate in a memory-constrained environment
+    where the document itself forms a part of the memory that is used
+    in that environment. Your code's stack and document must fit inside this
+    memory. In other words, a document must be loaded in order to be indexed.
+    Documents are limited to a maximum size of 64 MB.
+
+.. note::
+    Within a search index, do not index the same field name with more than one data
+    type. If the same field name is indexed with different data types in the same search
+    index function, you might get an error when querying the search index that says the
+    field "was indexed without position data." For example, do not include both of these
+    lines in the same search index function, as they index the ``myfield`` field as two
+    different data types: a string ``"this is a string"`` and a number ``123``.
+
+.. code-block:: javascript
+
+    index("myfield", "this is a string");
+    index("myfield", 123);
+
+The function that is contained in the index field is a JavaScript function
+that is called for each document in the database.
+The function takes the document as a parameter,
+extracts some data from it, and then calls the function that is defined
+in the ``index`` field to index that data.
+
+The ``index`` function takes three parameters, where the third parameter is optional.
+
+The first parameter is the name of the field you intend to use when querying the index,
+and which is specified in the Lucene syntax portion of subsequent queries.
+An example appears in the following query:
+
+.. code-block:: javascript
+
+    query=color:red
+
+The Lucene field name ``color`` is the first parameter of the ``index`` function.
+
+The ``query`` parameter can be abbreviated to ``q``,
+so another way of writing the query is as follows:
+
+.. code-block:: javascript
+
+    q=color:red
+
+If the special value ``"default"`` is used when you define the name,
+you do not have to specify a field name at query time.
+The effect is that the query can be simplified:
+
+.. code-block:: javascript
+
+    query=red
+
+The second parameter is the data to be indexed. Keep the following information
+in mind when you index your data:
+
+- This data must be only a string, number, or boolean. Other types will cause
+  an error to be thrown by the index function call.
+
+- If an error is thrown when running your function, for this reason or others,
+  the document will not be added to that search index.
+
+The third, optional, parameter is a JavaScript object with the following fields:
+
+*Index function (optional parameter)*
+
+* **boost** - A number that specifies the relevance in search results. Content that is
+  indexed with a boost value greater than 1 is more relevant than content that is
+  indexed without a boost value. Content with a boost value less than one is not so
+  relevant. Value is a positive floating point number. Default is 1 (no boosting).
+
+* **facet** - Creates a faceted index. See :ref:`Faceting <api/ddoc/view/faceting>`.
+  Values are ``true`` or ``false``. Default is ``false``.
+
+* **index** - Whether the data is indexed, and if so, how. If set to ``false``, the data
+  cannot be used for searches, but can still be retrieved from the index if ``store`` is
+  set to ``true``. See :ref:`Analyzers <api/ddoc/view/analyzers>`.
+  Values are ``true`` or ``false``. Default is ``true``
+
+* **store** - If ``true``, the value is returned in the search result; otherwise,
+  the value is not returned. Values are ``true`` or ``false``. Default is ``false``.
+
+.. note::
+
+    If you do not set the ``store`` parameter,
+    the index data results for the document are not returned in response to a query.
+
+*Example search index function:*
+
+.. code-block:: javascript
+
+    function(doc) {
+        index("default", doc._id);
+        if (doc.min_length) {
+            index("min_length", doc.min_length, {"store": true});
+        }
+        if (doc.diet) {
+            index("diet", doc.diet, {"store": true});
+        }
+        if (doc.latin_name) {
+            index("latin_name", doc.latin_name, {"store": true});
+        }
+        if (doc.class) {
+            index("class", doc.class, {"store": true});
+        }
+    }
+
+.. _api/ddoc/view/index_guard_clauses:
+
+Index guard clauses
+^^^^^^^^^^^^^^^^^^^
+
+The ``index`` function requires the name of the data field to index as
+the second parameter. However,
+if that data field does not exist for the document,
+an error occurs. The solution is to use an appropriate
+'guard clause' that checks if the field exists,
+and contains the expected type of data,
+*before* any attempt to create the corresponding index.
+
+*Example of failing to check whether the index data field exists:*
+
+.. code-block:: javascript
+
+    if (doc.min_length) {
+        index("min_length", doc.min_length, {"store": true});
+    }
+
+You might use the JavaScript ``typeof`` function to implement the guard clause test.
+If the field exists *and* has the expected type,
+the correct type name is returned,
+so the guard clause test succeeds and it is safe to use the index function.
+If the field does *not* exist,
+you would not get back the expected type of the field,
+therefore you would not attempt to index the field.
+
+JavaScript considers a result to be false if one of the following values is tested:
+
+* 'undefined'
+* null
+* The number +0
+* The number -0
+* NaN (not a number)
+* "" (the empty string)
+
+*Using a guard clause to check whether the required data field exists,
+and holds a number, before an attempt to index:*
+
+.. code-block:: javascript
+
+    if (typeof(doc.min_length) === 'number') {
+        index("min_length", doc.min_length, {"store": true});
+    }
+
+Use a generic guard clause test to ensure that the type of the candidate data
+field is defined.
+
+*Example of a 'generic' guard clause:*
+
+.. code-block:: javascript
+
+    if (typeof(doc.min_length) !== 'undefined') {
+        // The field exists, and does have a type, so we can proceed to index using it.
+        ...
+    }
+
+.. _api/ddoc/view/analyzers:
+
+Analyzers
+---------
+
+Analyzers are settings that define how to recognize terms within text.
+Analyzers can be helpful if you need to
+:ref:`index multiple languages <api/ddoc/view/language-specific-analyzers>`.
+
+Here's the list of generic analyzers, and their descriptions, that are
+supported by search:
+
+- ``classic`` - The standard Lucene analyzer, circa release 3.1.
+- ``email`` - Like the ``standard`` analyzer, but tries harder to
+  match an email address as a complete token.
+- ``keyword`` - Input is not tokenized at all.
+- ``simple`` - Divides text at non-letters.
+- ``standard`` - The default analyzer. It implements the Word Break
+  rules from the `Unicode Text Segmentation algorithm <http://www.unicode.org/reports/tr29/>`_
+- ``whitespace`` - Divides text at white space boundaries.
+
+*Example analyzer document:*
+
+.. code-block:: javascript
+
+    {
+        "_id": "_design/analyzer_example",
+        "indexes": {
+            "INDEX_NAME": {
+                "index": "function (doc) { ... }",
+                "analyzer": "$ANALYZER_NAME"
+            }
+        }
+    }
+
+.. _api/ddoc/view/language-specific-analyzers:
+
+Language-specific analyzers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These analyzers omit common words in the specific language,
+and many also `remove prefixes and suffixes <http://en.wikipedia.org/wiki/Stemming>`_.
+The name of the language is also the name of the analyzer.
+
++----------------+----------------------------------------------------------+
+| Language       | Analyzer                                                 |
++================+==========================================================+
+| ``arabic``     | org.apache.lucene.analysis.ar.ArabicAnalyzer             |
++----------------+----------------------------------------------------------+
+| ``armenian``   | org.apache.lucene.analysis.hy.ArmenianAnalyzer           |
++----------------+----------------------------------------------------------+
+| ``basque``     | org.apache.lucene.analysis.eu.BasqueAnalyzer             |
++----------------+----------------------------------------------------------+
+| ``bulgarian``  | org.apache.lucene.analysis.bg.BulgarianAnalyzer          |
++----------------+----------------------------------------------------------+
+| ``brazilian``  | org.apache.lucene.analysis.br.BrazilianAnalyzer          |
++----------------+----------------------------------------------------------+
+| ``catalan``    | org.apache.lucene.analysis.ca.CatalanAnalyzer            |
++----------------+----------------------------------------------------------+
+| ``cjk``        | org.apache.lucene.analysis.cjk.CJKAnalyzer               |
++----------------+----------------------------------------------------------+
+| ``chinese``    | org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer |
++----------------+----------------------------------------------------------+
+| ``czech``      | org.apache.lucene.analysis.cz.CzechAnalyzer              |
++----------------+----------------------------------------------------------+
+| ``danish``     | org.apache.lucene.analysis.da.DanishAnalyzer             |
++----------------+----------------------------------------------------------+
+| ``dutch``      | org.apache.lucene.analysis.nl.DutchAnalyzer              |
++----------------+----------------------------------------------------------+
+| ``english``    | org.apache.lucene.analysis.en.EnglishAnalyzer            |
++----------------+----------------------------------------------------------+
+| ``finnish``    | org.apache.lucene.analysis.fi.FinnishAnalyzer            |
++----------------+----------------------------------------------------------+
+| ``french``     | org.apache.lucene.analysis.fr.FrenchAnalyzer             |
++----------------+----------------------------------------------------------+
+| ``german``     | org.apache.lucene.analysis.de.GermanAnalyzer             |
++----------------+----------------------------------------------------------+
+| ``greek``      | org.apache.lucene.analysis.el.GreekAnalyzer              |
++----------------+----------------------------------------------------------+
+| ``galician``   | org.apache.lucene.analysis.gl.GalicianAnalyzer           |
++----------------+----------------------------------------------------------+
+| ``hindi``      | org.apache.lucene.analysis.hi.HindiAnalyzer              |
++----------------+----------------------------------------------------------+
+| ``hungarian``  | org.apache.lucene.analysis.hu.HungarianAnalyzer          |
++----------------+----------------------------------------------------------+
+| ``indonesian`` | org.apache.lucene.analysis.id.IndonesianAnalyzer         |
++----------------+----------------------------------------------------------+
+| ``irish``      | org.apache.lucene.analysis.ga.IrishAnalyzer              |
++----------------+----------------------------------------------------------+
+| ``italian``    | org.apache.lucene.analysis.it.ItalianAnalyzer            |
++----------------+----------------------------------------------------------+
+| ``japanese``   | org.apache.lucene.analysis.ja.JapaneseAnalyzer           |
++----------------+----------------------------------------------------------+
+| ``japanese``   | import org.apache.lucene.analysis.ja.JapaneseTokenizer   |
 
 Review comment:
   Remove `import` here
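
   The guard-clause advice in the quoted hunk can be sketched as below. This is a minimal illustration and not part of the pull request: the `index` stub only records calls so the snippet runs on its own (the real function is supplied by the search indexer), and `indexAnimal` and the sample documents are hypothetical.

```javascript
// Stand-in for the `index` function that the search indexer provides.
// It only records calls here so that the sketch is self-contained.
const indexed = [];
function index(name, value, options) {
    indexed.push({ name: name, value: value, options: options || {} });
}

// Hypothetical index function modeled on the example in the quoted hunk.
function indexAnimal(doc) {
    index("default", doc._id);
    // Guard on the expected type, so a missing field or a field that holds
    // another data type is skipped instead of being passed to index().
    if (typeof doc.min_length === "number") {
        index("min_length", doc.min_length, { store: true });
    }
    if (typeof doc.diet === "string") {
        index("diet", doc.diet, { store: true });
    }
}

indexAnimal({ _id: "aardvark", min_length: 1, diet: "omnivore" });
indexAnimal({ _id: "snipe", min_length: "unknown" }); // wrong type, no diet
console.log(indexed.map(function (entry) { return entry.name; }));
```

   Unlike a plain truthiness test such as `if (doc.min_length)`, the `typeof` test still indexes a value of `0`, and it keeps one field name from being indexed with two different data types.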

