This is an automated email from the ASF dual-hosted git repository.

dweiss pushed a commit to branch jira/solr-13105-toMerge
in repository https://gitbox.apache.org/repos/asf/solr.git
commit 3a672dc06f5c69f210b0deef1b8aa6ccba27a89d Author: Cassandra Targett <[email protected]> AuthorDate: Fri Jan 8 16:03:47 2021 -0600 More function name, variable, typo cleanup --- solr/solr-ref-guide/src/matrix-math.adoc | 35 ++++------ solr/solr-ref-guide/src/scalar-math.adoc | 6 +- .../src/taking-solr-to-production.adoc | 2 +- solr/solr-ref-guide/src/term-vectors.adoc | 79 ++++++++++------------ solr/solr-ref-guide/src/the-terms-component.adoc | 4 +- .../src/updating-parts-of-documents.adoc | 4 +- .../src/uploading-data-with-index-handlers.adoc | 2 +- solr/solr-ref-guide/src/variables.adoc | 27 ++++---- solr/solr-ref-guide/src/vector-math.adoc | 22 +++--- 9 files changed, 83 insertions(+), 98 deletions(-) diff --git a/solr/solr-ref-guide/src/matrix-math.adoc b/solr/solr-ref-guide/src/matrix-math.adoc index f354228..fd9ec3f 100644 --- a/solr/solr-ref-guide/src/matrix-math.adoc +++ b/solr/solr-ref-guide/src/matrix-math.adoc @@ -16,8 +16,7 @@ // specific language governing permissions and limitations // under the License. -Matrices are used -as both inputs and outputs of many mathematical functions. +Matrices are used as both inputs and outputs of many mathematical functions. This section of the user guide covers the basics of matrix creation, manipulation and matrix math. @@ -60,22 +59,21 @@ responds with: "RESPONSE_TIME": 0 } ] - } + }} ---- == Row and Column Labels A matrix can have column and rows and labels. The functions -`setRowLabels`, `setColumnLabels`, `getRowLabels` and `getColumnLabels` -can be used to set and get the labels. The label values -are set using string arrays. +`setRowLabels`, `setColumnLabels`, `getRowLabels`, and `getColumnLabels` +can be used to set and get the labels. +The label values are set using string arrays. The example below sets the row and column labels. In other sections of the user guide examples are shown where functions return matrices with the labels already set. 
-Below is a simple example of setting and -getting row and column labels +Below is a simple example of setting and getting row and column labels on a matrix. [source,text] @@ -118,9 +116,8 @@ responds with: == Visualization -The `zplot` function can plot matrices as a heat map using the *heat* named parameter. -Heat maps are powerful visualization tools for displaying *correlation* and *distance* -matrices described later in the guide. +The `zplot` function can plot matrices as a heat map using the `heat` named parameter. +Heat maps are powerful visualization tools for displaying <<statistics.adoc#correlation-matrices,*correlation*>> and <<machine-learning.adoc#distance-and-distance-matrices,*distance*>> matrices described later in the guide. The example below shows a 2x2 matrix visualized using the heat map visualization in Apache Zeppelin. @@ -331,9 +328,9 @@ responds with: == Scalar Matrix Math The same scalar math functions that apply to vectors can also be applied to matrices: `scalarAdd`, `scalarSubtract`, -`scalarMultiply`, `scalarDivide`. Below is an example of the `scalarAdd` function -which adds a scalar value to each element in a matrix. +`scalarMultiply`, `scalarDivide`. +Below is an example of the `scalarAdd` function which adds a scalar value to each element in a matrix. [source,text] ---- @@ -342,8 +339,7 @@ let(a=matrix(array(1, 2), b=scalarAdd(10, a)) ---- -When this expression is sent to the `/stream` handler it -responds with: +When this expression is sent to the `/stream` handler it responds with: [source,json] ---- @@ -377,7 +373,7 @@ Two matrices can be added and subtracted using the `ebeAdd` and `ebeSubtract` fu which perform element-by-element addition and subtraction of matrices. 
-Below is a simple example of an element-by-element addition of a matrix by itself: +Below is a simple example of an element-by-element addition using `ebeAdd` of a matrix by itself: [source,text] ---- @@ -386,8 +382,7 @@ let(a=matrix(array(1, 2), b=ebeAdd(a, a)) ---- -When this expression is sent to the `/stream` handler it -responds with: +When this expression is sent to the `/stream` handler it responds with: [source,json] ---- @@ -417,8 +412,8 @@ responds with: == Matrix Multiplication -Matrix multiplication can be accomplished using the `matrixMult` function. Below is a simple -example of matrix multiplication: +Matrix multiplication can be accomplished using the `matrixMult` function. +Below is a simple example of matrix multiplication: [source,text] ---- diff --git a/solr/solr-ref-guide/src/scalar-math.adoc b/solr/solr-ref-guide/src/scalar-math.adoc index 089936f..66f38ca 100644 --- a/solr/solr-ref-guide/src/scalar-math.adoc +++ b/solr/solr-ref-guide/src/scalar-math.adoc @@ -94,13 +94,13 @@ The `select` function can also use math expressions to compute new values and add them to the outgoing tuples. In the example below the `select` expression is wrapping a search -expression. The `select` function is selecting the *response_d* field -and computing a new field called *new_response* using the `mult` math +expression. The `select` function is selecting the `response_d` field +and computing a new field called `new_response` using the `mult` math expression. The first parameter of the `mult` expression is the *response_d* field. The second parameter is the scalar value 10. This multiplies the value -of the *response_d* field in each tuple by 10. +of the `response_d` field in each tuple by 10. 
[source,text] ---- diff --git a/solr/solr-ref-guide/src/taking-solr-to-production.adoc b/solr/solr-ref-guide/src/taking-solr-to-production.adoc index 7e79b12..4bf92e7 100644 --- a/solr/solr-ref-guide/src/taking-solr-to-production.adoc +++ b/solr/solr-ref-guide/src/taking-solr-to-production.adoc @@ -240,7 +240,7 @@ Setting the hostname of the Solr server is recommended, especially when running === Environment Banner in Admin UI -To guard against accidentally doing changes to the wrong cluster, you may configure a visual indication in the Admin UI of whether you currently work with a production environment or not. To do this, edit your `solr.in.sh` or `solr.in.cmd` file with a `-Dsolr.environment=prod` setting, or set the cluster property named `environment`. To specify label and/or color, use a comma delimited format as below. The `+` character can be used instead of space to avoid quoting. Colors may be valid C [...] +To guard against accidentally doing changes to the wrong cluster, you may configure a visual indication in the Admin UI of whether you currently work with a production environment or not. To do this, edit your `solr.in.sh` or `solr.in.cmd` file with a `-Dsolr.environment=prod` setting, or set the cluster property named `environment`. To specify label and/or color, use a comma-delimited format as below. The `+` character can be used instead of space to avoid quoting. Colors may be valid C [...] * `prod` * `test,label=Functional+test` diff --git a/solr/solr-ref-guide/src/term-vectors.adoc b/solr/solr-ref-guide/src/term-vectors.adoc index 85de054..8404726 100644 --- a/solr/solr-ref-guide/src/term-vectors.adoc +++ b/solr/solr-ref-guide/src/term-vectors.adoc @@ -59,12 +59,11 @@ When this expression is sent to the `/stream` handler it responds with: === Annotating Documents -The `analyze` function can be used inside of a `select` function to annotate documents with the tokens -generated by the analysis. 
+The `analyze` function can be used inside of a `select` function to annotate documents with the tokens generated by the analysis. -The example below performs a `search` in "collection1". Each tuple returned by the `search` function -contains an `id` and `subject`. For each tuple, the -`select` function selects the `id` field and calls the `analyze` function on the `subject` field. +The example below performs a `search` in "collection1". +Each tuple returned by the `search` function contains an `id` and `subject`. +For each tuple, the `select` function selects the `id` field and calls the `analyze` function on the `subject` field. The analyzer chain specified by the `subject_bigram` field is configured to perform a bigram analysis. The tokens generated by the `analyze` function are added to each tuple in a field called `terms`. @@ -112,42 +111,36 @@ The `cartesianProduct` function can be used in conjunction with the `analyze` function to perform a wide range of text analytics. -The `cartesianProduct` function explodes a multivalued -field into a stream of tuples. When the `analyze` function is used -to create the multivalued field, the `cartesianProduct` function will -explode the analyzed tokens into a stream of tuples. This allows -analytics to be performed over the stream of analyzed tokens and the result -to be visualized with Zeppelin-Solr. +The `cartesianProduct` function explodes a multivalued field into a stream of tuples. +When the `analyze` function is used to create the multivalued field, the `cartesianProduct` function will explode the analyzed tokens into a stream of tuples. +This allows analytics to be performed over the stream of analyzed tokens and the result to be visualized with Zeppelin-Solr. *Example: Phrase Aggregation* -An example performing phrase aggregation is used to illustrate the power of combining -`cartesianProduct` and `analyze`. 
+An example performing phrase aggregation is used to illustrate the power of combining `cartesianProduct` and `analyze`. In this example the `search` expression is performed over a collection of movie reviews. -The phrase query "Man on Fire" is searched for and the top 5000 results, by score are -returned. A single field from the results is return which is the "review_t" field that +The phrase query "Man on Fire" is searched for and the top 5000 results, by score are returned. +A single field from the results is return which is the `review_t` field that contains text of the movie review. -Then `cartesianProduct` function is run over the search results. The `cartesianProduct` -function applies the `analyze` function, which takes the *review_t* field and analyzes -it with the Lucene/Solr analyzer attached to the *text_bigrams* schema field. This analyzer -emits the bigrams found in the text field. The `cartesianProduct` function explodes each -bigram into its own tuple with the bigram stored in the field *term*. +Then `cartesianProduct` function is run over the search results. +The `cartesianProduct` function applies the `analyze` function, which takes the `review_t` field and analyzes it with the Lucene/Solr analyzer attached to the `text_bigrams` schema field. +This analyzer emits the bigrams found in the text field. +The `cartesianProduct` function explodes each bigram into its own tuple with the bigram stored in the field `term`. The stream of tuples, each containing a bigram, is then filtered by the `having` function using regular expressions to select bigrams with a length of 12 or greater and to filter out bigrams that contain specific characters. -The `hashRollup` function then aggregates the bigrams and the `top` function emits the top -10 bigrams by count. +The `hashRollup` function then aggregates the bigrams and the `top` function emits the top 10 bigrams by count. Then Zeppelin-Solr is used to visualize the top 10 ten bigrams. 
image::images/math-expressions/text-analytics.png[] Lucene/Solr analyzers can be configured in many different ways to support -aggregations over NLP entities (people, places, companies etc...) as well as +aggregations over NLP entities (people, places, companies, etc.) as well as tokens extracted with regular expressions or dictionaries. == TF-IDF Term Vectors @@ -157,8 +150,9 @@ The `termVectors` function can be used to build TF-IDF term vectors from the ter The `termVectors` function operates over a list of tuples that contain a field called `id` and a field called `terms`. Notice that this is the exact output structure of the document annotation example above. -The `termVectors` function builds a matrix from the list of tuples. There is row in the -matrix for each tuple in the list. There is a column in the matrix for each term in the `terms` field. +The `termVectors` function builds a matrix from the list of tuples. +There is row in the matrix for each tuple in the list. +There is a column in the matrix for each term in the `terms` field. [source,text] ---- @@ -173,17 +167,16 @@ let(echo="c, d", <1> The example below builds on the document annotation example. -<1> The `echo` parameter will echo variables *`c`* and *`d`*, so the output includes +<1> The `echo` parameter will echo variables `c` and `d`, so the output includes the row and column labels, which will be defined later in the expression. -<2> The list of tuples are stored in variable *`a`*. The `termVectors` function -operates over variable *`a`* and builds a matrix with 2 rows and 4 columns. -<3> The `termVectors` function sets the row and column labels of the term vectors matrix as variable *`b`*. +<2> The list of tuples are stored in variable `a`. The `termVectors` function +operates over variable `a` and builds a matrix with 2 rows and 4 columns. +<3> The `termVectors` function sets the row and column labels of the term vectors matrix as variable `b`. 
The row labels are the document ids and the column labels are the terms. <4> The `getRowLabels` and `getColumnLabels` functions return the row and column labels which are then stored in variables *`c`* and *`d`*. -When this expression is sent to the `/stream` handler it -responds with: +When this expression is sent to the `/stream` handler it responds with: [source,json] ---- @@ -213,8 +206,8 @@ responds with: === TF-IDF Values -The values within the term vectors matrix are the TF-IDF values for each term in each document. The -example below shows the values of the matrix. +The values within the term vectors matrix are the TF-IDF values for each term in each document. +The example below shows the values of the matrix. [source,text] ---- @@ -224,8 +217,7 @@ let(a=select(search(collection3, q="*:*", fl="id, subject", sort="id asc"), b=termVectors(a, minTermLength=4, minDocFreq=0, maxDocFreq=1)) ---- -When this expression is sent to the `/stream` handler it -responds with: +When this expression is sent to the `/stream` handler it responds with: [source,json] ---- @@ -259,22 +251,21 @@ responds with: === Limiting the Noise -One of the key challenges when with working term vectors is that text often has a significant amount of noise -which can obscure the important terms in the data. The `termVectors` function has several parameters -designed to filter out the less meaningful terms. This is also important because eliminating -the noisy terms helps keep the term vector matrix small enough to fit comfortably in memory. +One of the key challenges when working with term vectors is that text often has a significant amount of noise which can obscure the important terms in the data. +The `termVectors` function has several parameters designed to filter out the less meaningful terms. +This is also important because eliminating the noisy terms helps keep the term vector matrix small enough to fit comfortably in memory. 
There are four parameters designed to filter noisy terms from the term vector matrix: -minTermLength:: +`minTermLength`:: The minimum term length required to include the term in the matrix. -minDocFreq:: +`minDocFreq`:: The minimum percentage, expressed as a number between 0 and 1, of documents the term must appear in to be included in the index. -maxDocFreq:: +`maxDocFreq`:: The maximum percentage, expressed as a number between 0 and 1, of documents the term can appear in to be included in the index. -exclude:: -A comma delimited list of strings used to exclude terms. If a term contains any of the excluded strings that +`exclude`:: +A comma-delimited list of strings used to exclude terms. If a term contains any of the excluded strings that term will be excluded from the term vector. diff --git a/solr/solr-ref-guide/src/the-terms-component.adoc b/solr/solr-ref-guide/src/the-terms-component.adoc index 87d7993..cf27d22 100644 --- a/solr/solr-ref-guide/src/the-terms-component.adoc +++ b/solr/solr-ref-guide/src/the-terms-component.adoc @@ -53,7 +53,7 @@ Specifies the field from which to retrieve terms. This parameter is required if Example: `terms.fl=title` `terms.list`:: -Fetches the document frequency for a comma delimited list of terms. Terms are always returned in index order. If `terms.ttf` is set to true, also returns their total term frequency. If multiple `terms.fl` are defined, these statistics will be returned for each term in each requested field. +Fetches the document frequency for a comma-delimited list of terms. Terms are always returned in index order. If `terms.ttf` is set to true, also returns their total term frequency. If multiple `terms.fl` are defined, these statistics will be returned for each term in each requested field. 
+ Example: `terms.list=termA,termB,termC` + @@ -353,7 +353,7 @@ The `shards` parameter is subject to a host whitelist that has to be configured + By default the whitelist will be populated with all live nodes when running in SolrCloud mode. If you need to disable this feature for backwards compatibility, you can set the system property `solr.disable.shardsWhitelist=true`. + -See the section <<distributed-requests.adoc#configuring-the-shardhandlerfactory,Configuring the ShardHandlerFactory>> for more information about how the whitelist works. +See the section <<distributed-requests.adoc#configuring-the-shardhandlerfactory,Configuring the ShardHandlerFactory>> for more information about how the whitelist works. `shards.qt`:: Specifies the request handler Solr uses for requests to shards. diff --git a/solr/solr-ref-guide/src/updating-parts-of-documents.adoc b/solr/solr-ref-guide/src/updating-parts-of-documents.adoc index f159d42..345d706 100644 --- a/solr/solr-ref-guide/src/updating-parts-of-documents.adoc +++ b/solr/solr-ref-guide/src/updating-parts-of-documents.adoc @@ -427,7 +427,7 @@ The basic usage of `DocBasedVersionConstraintsProcessorFactory` is to configure </processor> ---- -Note that `versionField` is a comma delimited list of fields to check for version numbers. +Note that `versionField` is a comma-delimited list of fields to check for version numbers. Once configured, this update processor will reject (HTTP error code 409) any attempt to update an existing document where the value of the `my_version_l` field in the "new" document is not greater then the value of that field in the existing document. 
.versionField vs `\_version_` @@ -448,7 +448,7 @@ The value of this option should be the name of a request parameter that the proc + When using this request parameter, any Delete By Id command with a high enough document version number to succeed will be internally converted into an Add Document command that replaces the existing document with a new one which is empty except for the Unique Key and `versionField` to keeping a record of the deleted version so future Add Document commands will fail if their "new" version is not high enough. + -If `versionField` is specified as a list, then this parameter too must be specified as a comma delimited list of the same size so that the parameters correspond with the fields. +If `versionField` is specified as a list, then this parameter too must be specified as a comma-delimited list of the same size so that the parameters correspond with the fields. `supportMissingVersionOnOldDocs`:: This boolean parameter defaults to `false`, but if set to `true` allows any documents written *before* this feature is enabled, and which are missing the `versionField`, to be overwritten. diff --git a/solr/solr-ref-guide/src/uploading-data-with-index-handlers.adoc b/solr/solr-ref-guide/src/uploading-data-with-index-handlers.adoc index b868519..9202400 100644 --- a/solr/solr-ref-guide/src/uploading-data-with-index-handlers.adoc +++ b/solr/solr-ref-guide/src/uploading-data-with-index-handlers.adoc @@ -521,7 +521,7 @@ Example: `rowidOffset=10` The same feature used to index CSV documents can also be easily used to index tab-delimited files (TSV files) and even handle backslash escaping rather than CSV encapsulation. 
-For example, one can dump a MySQL table to a tab delimited file with: +For example, one can dump a MySQL table to a tab-delimited file with: [source,sql] ---- diff --git a/solr/solr-ref-guide/src/variables.adoc b/solr/solr-ref-guide/src/variables.adoc index ba6c574..6cc2bd9 100644 --- a/solr/solr-ref-guide/src/variables.adoc +++ b/solr/solr-ref-guide/src/variables.adoc @@ -25,8 +25,8 @@ variables with math expressions. The `let` expression sets variables and returns the value of the last variable by default. The output of any streaming expression or math expression can be set to a variable. -Below is a simple example setting three variables *`a`*, *`b`* -and *`c`*. Variables *`a`* and *`b`* are set to arrays. The variable *`c`* is set +Below is a simple example setting three variables `a`, `b`, +and `c`. Variables `a` and `b` are set to arrays. The variable `c` is set to the output of the `ebeAdd` function which performs element-by-element addition of the two arrays. @@ -37,7 +37,7 @@ let(a=array(1, 2, 3), c=ebeAdd(a, b)) ---- -In the response, notice that the last variable, *`c`*, is returned: +In the response, notice that the last variable, `c`, is returned: [source,json] ---- @@ -106,7 +106,7 @@ responds with: } ---- -A specific set of variables can be echoed by providing a comma delimited list of variables to the echo parameter. +A specific set of variables can be echoed by providing a comma-delimited list of variables to the `echo` parameter. Because variables have been provided, the `true` value is assumed. [source,text] @@ -147,11 +147,11 @@ When this expression is sent to the `/stream` handler it responds with: == Visualizing Variables -The *let* expression can also include a *zplot* expression that can be used to visualize the +The `let` expression can also include a `zplot` expression that can be used to visualize the variables. -In the example below the variables *a* and *b* are set to arrays. 
The zplot function -outputs the variables as *x* and *y* fields in the output. +In the example below the variables `a` and `b` are set to arrays. The `zplot` function +outputs the variables as `x` and `y` fields in the output. [source,text] ---- @@ -208,10 +208,10 @@ be cached in-memory for future use. The `putCache` function adds a variable to the cache. -In the example below an array is cached in the workspace *workspace1* -and bound to the key *key1*. The workspace allows different users to cache -objects in their own workspace. The `putCache` function returns -the variable that was added to the cache. +In the example below an array is cached in the workspace `workspace1` +and bound to the key `key1`. +The workspace allows different users to cache objects in their own workspace. +The `putCache` function returns the variable that was added to the cache. [source,text] ---- @@ -246,7 +246,7 @@ When this expression is sent to the `/stream` handler it responds with: The `getCache` function retrieves an object from the cache by its workspace and key. -In the example below the `getCache` function retrieves the array that was cached above and assigns it to variable *`a`*. +In the example below the `getCache` function retrieves the array that was cached above and assigns it to variable `a`. 
[source,text] ---- @@ -285,8 +285,7 @@ In the example below `listCache` returns all the workspaces in the cache as an a let(a=listCache()) ---- -When this expression is sent to the `/stream` handler it -responds with: +When this expression is sent to the `/stream` handler it responds with: [source,json] ---- diff --git a/solr/solr-ref-guide/src/vector-math.adoc b/solr/solr-ref-guide/src/vector-math.adoc index 235d420..2c3ee66 100644 --- a/solr/solr-ref-guide/src/vector-math.adoc +++ b/solr/solr-ref-guide/src/vector-math.adoc @@ -54,16 +54,16 @@ When this expression is sent to the `/stream` handler it responds with a JSON ar == Visualization -The *zplot* function can be used to visualize vectors using Zeppelin-Solr. +The `zplot` function can be used to visualize vectors using Zeppelin-Solr. Let's first see what happens when we visualize the array function as a table. image::images/math-expressions/array.png[] -It appears as one row with a comma delimited list of values. You'll find that you can't visualize this output +It appears as one row with a comma-delimited list of values. You'll find that you can't visualize this output using any of the plotting tools. -To plot the array you need the *zplot* function. Let's first look at how zplot output looks like in json format. +To plot the array you need the `zplot` function. Let's first look at how `zplot` output looks like in JSON format. [source,text] ---- @@ -95,7 +95,7 @@ When this expression is sent to the `/stream` handler it responds with a JSON ar } ---- -zplot has turned the array into three tuples with the field *x*. +`zplot` has turned the array into three tuples with the field `x`. Let's add another array: @@ -132,13 +132,13 @@ When this expression is sent to the `/stream` handler it responds with a JSON ar } ---- -Now we have three tuples with *x* and *y* fields. +Now we have three tuples with `x` and `y` fields. 
Let's see how Zeppelin-Solr handles this output in table format: image::images/math-expressions/xy.png[] -Now that we have *x* and *y* columns defined we can simply switch to one of the line charts +Now that we have `x` and `y` columns defined we can simply switch to one of the line charts and plugin the fields to plot using the chart settings: image::images/math-expressions/line1.png[] @@ -279,7 +279,7 @@ When this expression is sent to the `/stream` handler it responds with: == Getting Values By Index -Values from a vector can be retrieved by index with the *valueAt* function. +Values from a vector can be retrieved by index with the `valueAt` function. [source,text] ---- @@ -307,7 +307,7 @@ When this expression is sent to the `/stream` handler it responds with: == Sequences -The *sequence* function can be used to generate a sequence of numbers as an array. +The `sequence` function can be used to generate a sequence of numbers as an array. The example below returns a sequence of 10 numbers, starting from 0, with a stride of 2. [source,text] @@ -345,7 +345,7 @@ When this expression is sent to the `/stream` handler it responds with: } ---- -The *natural* function can be used to create a sequence of *natural* numbers starting from zero. +The `natural` function can be used to create a sequence of *natural* numbers starting from zero. Natural numbers are positive integers. The example below creates a sequence starting at zero with all natural numbers up to, but not including @@ -486,8 +486,8 @@ When this expression is sent to the `/stream` handler it responds with: == Scalar Vector Math -Scalar vector math functions add, subtract, multiply or divide a scalar value with every value in a vector. -The following functions perform these operations: `scalarAdd`, `scalarSubtract`, `scalarMultiply` +Scalar vector math functions add, subtract, multiply, or divide a scalar value with every value in a vector. 
+The following functions perform these operations: `scalarAdd`, `scalarSubtract`, `scalarMultiply`, and `scalarDivide`. Below is an example of the `scalarMultiply` function, which multiplies the scalar value `3` with
