This is an automated email from the ASF dual-hosted git repository. dweiss pushed a commit to branch jira/solr-13105-toMerge in repository https://gitbox.apache.org/repos/asf/solr.git
commit 27e726812977bd44e9bbd774dfb7c8a844f1c8ac Author: [email protected] <> AuthorDate: Thu Jan 7 17:58:31 2021 -0500 copy editing --- solr/solr-ref-guide/src/curve-fitting.adoc | 4 ++-- solr/solr-ref-guide/src/logs.adoc | 12 ++++++------ solr/solr-ref-guide/src/machine-learning.adoc | 6 +++--- solr/solr-ref-guide/src/numerical-analysis.adoc | 2 +- solr/solr-ref-guide/src/regression.adoc | 4 ++-- solr/solr-ref-guide/src/simulations.adoc | 4 ++-- solr/solr-ref-guide/src/statistics.adoc | 6 +++--- solr/solr-ref-guide/src/term-vectors.adoc | 6 +++--- solr/solr-ref-guide/src/time-series.adoc | 10 +++++----- solr/solr-ref-guide/src/variables.adoc | 4 ++-- 10 files changed, 29 insertions(+), 29 deletions(-) diff --git a/solr/solr-ref-guide/src/curve-fitting.adoc b/solr/solr-ref-guide/src/curve-fitting.adoc index c2edcec..31009a0 100644 --- a/solr/solr-ref-guide/src/curve-fitting.adoc +++ b/solr/solr-ref-guide/src/curve-fitting.adoc @@ -124,12 +124,12 @@ oscillations. This is particularly true if the sine wave has noise. After the cu extrapolated to any point in time in the past or future. -In the example below the orginal control points are shown in blue and the fitted curve is shown in yellow. +In the example below the original control points are shown in blue and the fitted curve is shown in yellow. image::images/math-expressions/harmfit.png[] -The output of `harmfit` is a model that can be used by the `predict` function to interpolate and exptrapolate +The output of `harmfit` is a model that can be used by the `predict` function to interpolate and extrapolate the sine wave. In the example below the `natural` function creates an *x* axis from 0 to 127 used to predict results for the model. This extrapolates the sine wave out to 128 points, when the original model curve had only 19 control points. 
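The `harmfit`/`predict` workflow edited in the curve-fitting hunk above amounts to least-squares sine fitting followed by extrapolation. As a rough illustration of the underlying math (not Solr's implementation, and assuming the angular frequency `w` is already known, whereas `harmfit` estimates it from the data), a minimal Python sketch:

```python
import math

# Fit y ~ a*sin(w*t) + b*cos(w*t) by linear least squares (Cramer's rule
# on the 2x2 normal equations), then extrapolate past the control points.
def fit_sine(ts, ys, w):
    s11 = sum(math.sin(w * t) ** 2 for t in ts)
    s12 = sum(math.sin(w * t) * math.cos(w * t) for t in ts)
    s22 = sum(math.cos(w * t) ** 2 for t in ts)
    r1 = sum(y * math.sin(w * t) for t, y in zip(ts, ys))
    r2 = sum(y * math.cos(w * t) for t, y in zip(ts, ys))
    det = s11 * s22 - s12 * s12
    return (r1 * s22 - r2 * s12) / det, (s11 * r2 - s12 * r1) / det

def predict(model, w, t):
    a, b = model
    return a * math.sin(w * t) + b * math.cos(w * t)

# 19 noiseless control points (as in the text), extrapolated to t=100
w = 0.25
ts = list(range(19))
ys = [2.0 * math.sin(w * t) + 0.5 * math.cos(w * t) for t in ts]
model = fit_sine(ts, ys, w)
extrapolated = predict(model, w, 100)
```

With noiseless input the fit recovers the generating coefficients exactly, which is why the fitted curve can be evaluated at any past or future point.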
diff --git a/solr/solr-ref-guide/src/logs.adoc b/solr/solr-ref-guide/src/logs.adoc index a049844..581a32e 100644 --- a/solr/solr-ref-guide/src/logs.adoc +++ b/solr/solr-ref-guide/src/logs.adoc @@ -55,7 +55,7 @@ When working with unfamiliar installations exploration can be used to understand covered in the logs, what shards and cores are in those collections and the types of operations being performed on those collections. -With familiar Solr installations exploration is still extremely +Even with familiar Solr installations exploration is still extremely important while trouble shooting because it will often turn up surprises such as unknown errors or unexpected admin or indexing operations. @@ -150,7 +150,7 @@ better understanding of what's contained in the logs. == Query Counting -Distributed searches produce more then one log record for each query. There will be one *top level* log +Distributed searches produce more than one log record for each query. There will be one *top level* log record for the top level distributed query and a *shard level* log record on one replica from each shard. There may also be a set of *ids* queries to retrieve fields by id from the shards to complete the page of results. @@ -186,7 +186,7 @@ image::images/math-expressions/query-ids.png[] == Query Performance -One of the important tasks of Solr log analytics is understanding how well a Solr Cluster +One of the important tasks of Solr log analytics is understanding how well a Solr cluster is performing. The *qtime_i* field contains the query time (QTime) in millis @@ -218,7 +218,7 @@ from the mean. === Highest QTime Scatter Plot -Its often useful to be able to visualize the highest query times recorded in the log data. +It's often useful to be able to visualize the highest query times recorded in the log data. This can be done by using the `search` function and sorting on *qtime_i desc*. 
In the example below the `search` function returns the highest 500 query times from the *ptest1* @@ -252,7 +252,7 @@ This converts the query times from milli-seconds to seconds. The result is set t *c*. The `round` function then rounds all elements of the query times vector to the nearest second. -The means all query times less then 500 millis will round to 0. +This means all query times less than 500 millis will round to 0. The `freqTable` function is then applied to the vector of query times rounded to the nearest second. @@ -362,7 +362,7 @@ image::images/math-expressions/query-spike.png[] The next step is to generate a time series that counts commits across the same time intervals. -The time series below uses the sames *start*, *end* and *gap* as the initial time series. But +The time series below uses the same *start*, *end* and *gap* as the initial time series. But this time series is computed for records that have a log type of *commit*. The count for the commits is calculated and plotted on *y-axis*. diff --git a/solr/solr-ref-guide/src/machine-learning.adoc b/solr/solr-ref-guide/src/machine-learning.adoc index 6bf8c63..58801f6 100644 --- a/solr/solr-ref-guide/src/machine-learning.adoc +++ b/solr/solr-ref-guide/src/machine-learning.adoc @@ -64,7 +64,7 @@ When this expression is sent to the `/stream` handler it responds with: } ---- -Below the distance is calculated using *Manahattan* distance. +Below the distance is calculated using *Manhattan* distance. [source,text] ---- @@ -314,7 +314,7 @@ r=knnRegress(obs, quality, 5, cosine()), 2) The `robust` named parameter can be used to perform a regression analysis that is robust to outliers in the outcomes. When the `robust` named parameter is used the median outcome -of the K nearest neighbors is used rather then then average. +of the K nearest neighbors is used rather than the average. Sample syntax: @@ -488,7 +488,7 @@ field. 
This analyzer returns bigrams which are then annotated to documents in a *terms* field. The `termVectors` function then creates TD-IDF term vectors from the bigrams stored in the *terms* field. The `kmeans` function is then used to cluster the bigram term vectors into 5 clusters. -Finally the top 5 features are extracted from the centroids an returned. Notice +Finally the top 5 features are extracted from the centroids and returned. Notice that the features are all bigram phrases with semantic significance. [source,text] diff --git a/solr/solr-ref-guide/src/numerical-analysis.adoc b/solr/solr-ref-guide/src/numerical-analysis.adoc index 41ee44d..16ec68e 100644 --- a/solr/solr-ref-guide/src/numerical-analysis.adoc +++ b/solr/solr-ref-guide/src/numerical-analysis.adoc @@ -279,7 +279,7 @@ the 9th floor was `415000` (row 3, column 3). The `bicubicSpline` function is then used to interpolate the grid, and the `predict` function is used to predict a value for year 2003, floor 8. -Notice that the matrix does not include a data point for year 2003, floor 8. The `bicupicSpline` +Notice that the matrix does not include a data point for year 2003, floor 8. The `bicubicSpline` function creates that data point based on the surrounding data in the matrix: [source,json] diff --git a/solr/solr-ref-guide/src/regression.adoc b/solr/solr-ref-guide/src/regression.adoc index 78b5a40..fb29e18 100644 --- a/solr/solr-ref-guide/src/regression.adoc +++ b/solr/solr-ref-guide/src/regression.adoc @@ -164,8 +164,8 @@ When this expression is sent to the `/stream` handler it responds with: === Regression Plot Using *zplot* and the Zeppelin-Solr interpreter we can visualize both the observations and the predictions in -the same scatter plot. In the example below zplot is plotting the filesize_d observations on the -*x* axis, the response_d observations on the *y* access and the predictions on the *y1* access. +the same scatter plot. 
In the example below zplot is plotting the *filesize_d* observations on the +*x* axis, the *response_d* observations on the *y* axis and the predictions on the *y1* axis. image::images/math-expressions/linear.png[] diff --git a/solr/solr-ref-guide/src/simulations.adoc b/solr/solr-ref-guide/src/simulations.adoc index 35b98d7..7d14f3d 100644 --- a/solr/solr-ref-guide/src/simulations.adoc +++ b/solr/solr-ref-guide/src/simulations.adoc @@ -48,7 +48,7 @@ Autocorrelation measures the degree to which a signal is correlated with itself. if a vector contains a signal or if there is dependency between values in a time series. If there is no signal and no dependency between values in the time series then the time series is random. -Its useful to plot the autocorrelation of the *change_d* vector to confirm that it is indeed random. +It's useful to plot the autocorrelation of the *change_d* vector to confirm that it is indeed random. In the example below the search results are set to a variable and then the *change_d* field is vectorized and stored in variable *b*. Then the @@ -158,7 +158,7 @@ image::images/math-expressions/randomwalk5.1.png[] The `monteCarlo` function can also be used to model a random walk of daily stock prices from the `normalDistribution` of daily stock returns. A random walk is a time series where each step is calculated by adding a random sample to the previous -step. This creates a time series where each value is dependant on the previous value, +step. This creates a time series where each value is dependent on the previous value, which simulates the autocorrelation of stock prices. 
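The random-walk construction described in the simulations hunk above (each step adds a random sample to the previous step) can be sketched in plain Python, standing in for `monteCarlo` over a `normalDistribution`; the parameters here are illustrative, not taken from the Solr example:

```python
import random

random.seed(42)  # fixed seed so the walk is reproducible

mean_return, stddev = 0.0, 1.0   # illustrative daily-return distribution
start_price = 100.0

# Each value depends on the previous value, simulating autocorrelation
walk = [start_price]
for _ in range(99):
    walk.append(walk[-1] + random.gauss(mean_return, stddev))
```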
In the example below the random walk is achieved by adding a random sample to the diff --git a/solr/solr-ref-guide/src/statistics.adoc b/solr/solr-ref-guide/src/statistics.adoc index 24787b5..c2c378a 100644 --- a/solr/solr-ref-guide/src/statistics.adoc +++ b/solr/solr-ref-guide/src/statistics.adoc @@ -162,7 +162,7 @@ image::images/math-expressions/cumProb.png[] Custom histograms can be defined and visualized by combining the output from multiple `stats` functions into a single histogram. Instead of automatically binning a numeric -field the custom histogram allows for comparision of bins based on queries. +field the custom histogram allows for comparison of bins based on queries. NOTE: The `stats` function is first discussed in the *Searching, Sampling and Aggregation* section of the user guide. @@ -194,7 +194,7 @@ the occurrence of each discrete data value and returns a list of tuples with the frequency statistics for each value. Below is an example of a frequency table built from a result set -of rounded *differences* in daily opening stock prises for the stock ticker *amzn*. +of rounded *differences* in daily opening stock prices for the stock ticker *amzn*. This example is interesting because it shows a multi-step process to arrive at the result. The first step is to *search* for records in the the *stocks* @@ -538,7 +538,7 @@ from the same population. drawn from the same population. * `mannWhitney`: The Mann-Whitney test is a non-parametric test that tests if two -samples of continuous were pulled +samples of continuous data were pulled from the same population. The Mann-Whitney test is often used instead of the T-test when the underlying assumptions of the T-test are not met. 
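The frequency-table recipe described in the statistics hunk above (round the differences in daily opening prices, then count occurrences of each rounded value) can be sketched with made-up numbers in Python, standing in for `freqTable`:

```python
from collections import Counter

# Hypothetical daily opening prices (illustrative, not real ticker data)
opens = [100.0, 102.2, 101.1, 104.0, 103.9]

# Differences between consecutive opens, rounded to the nearest dollar
diffs = [round(b - a) for a, b in zip(opens, opens[1:])]

# Frequency table: how often each rounded difference occurs
freq = Counter(diffs)
```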
diff --git a/solr/solr-ref-guide/src/term-vectors.adoc b/solr/solr-ref-guide/src/term-vectors.adoc index 9736489..85de054 100644 --- a/solr/solr-ref-guide/src/term-vectors.adoc +++ b/solr/solr-ref-guide/src/term-vectors.adoc @@ -129,7 +129,7 @@ The phrase query "Man on Fire" is searched for and the top 5000 results, by scor returned. A single field from the results is return which is the "review_t" field that contains text of the movie review. -Then `cartesiaProduct` function is run over the search results. The `cartesianProduct` +Then `cartesianProduct` function is run over the search results. The `cartesianProduct` function applies the `analyze` function, which takes the *review_t* field and analyzes it with the Lucene/Solr analyzer attached to the *text_bigrams* schema field. This analyzer emits the bigrams found in the text field. The `cartesianProduct` function explodes each @@ -266,7 +266,7 @@ the noisy terms helps keep the term vector matrix small enough to fit comfortabl There are four parameters designed to filter noisy terms from the term vector matrix: -`minTermLength`:: +minTermLength:: The minimum term length required to include the term in the matrix. minDocFreq:: @@ -276,5 +276,5 @@ maxDocFreq:: The maximum percentage, expressed as a number between 0 and 1, of documents the term can appear in to be included in the index. exclude:: -A comma delimited list of strings used to exclude terms. If a term contains any of the exclude strings that +A comma delimited list of strings used to exclude terms. If a term contains any of the excluded strings that term will be excluded from the term vector. diff --git a/solr/solr-ref-guide/src/time-series.adoc b/solr/solr-ref-guide/src/time-series.adoc index 2c04899..5797490 100644 --- a/solr/solr-ref-guide/src/time-series.adoc +++ b/solr/solr-ref-guide/src/time-series.adoc @@ -112,7 +112,7 @@ functions are often used to confirm the direction of the trend. 
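The four noise filters documented in the term-vectors hunk above combine naturally into a single predicate. This hypothetical Python helper (not the Solr implementation; the parameter defaults and data are illustrative) shows how the length, document-frequency and exclude checks interact:

```python
# Hypothetical sketch of the minTermLength / minDocFreq / maxDocFreq /
# exclude filters described for termVectors.
def keep_term(term, doc_freq, num_docs,
              min_term_length=2, min_doc_freq=0.01,
              max_doc_freq=0.5, exclude=("http",)):
    pct = doc_freq / num_docs  # fraction of documents containing the term
    return (len(term) >= min_term_length
            and min_doc_freq <= pct <= max_doc_freq
            and not any(s in term for s in exclude))

# "a" is too short, "the" is in too many docs, "http://x" hits exclude
terms = {"man on": 40, "a": 500, "the": 980, "http://x": 30, "fire": 120}
kept = [t for t, df in terms.items() if keep_term(t, df, 1000)]
```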
=== Moving Average The `movingAvg` function computes a simple moving average over a sliding window of data. -The example below generates a time series, vectorizes the avg(close_d) field and computes the +The example below generates a time series, vectorizes the *avg(close_d)* field and computes the moving average with a window size of 5. The moving average function returns an array that is of shorter length @@ -131,7 +131,7 @@ image::images/math-expressions/movingavg.png[] The `expMovingAvg` function uses a different formula for computing the moving average that responds faster to changes in the underlying data. This means that it is -less of a lagging indicator then the simple moving average. +less of a lagging indicator than the simple moving average. Below is an example that computes a moving average and exponential moving average and plots them along with the original y values. Notice how the exponential moving average is more sensitive @@ -143,7 +143,7 @@ image::images/math-expressions/expmoving.png[] === Moving Median The `movingMedian` function uses the median of the sliding window rather than the average. -In many cases the moving median will be more *robust* to outliers then moving averages. +In many cases the moving median will be more *robust* to outliers than moving averages. Below is an example computing the moving median: @@ -218,7 +218,7 @@ The `movingMAD` (moving mean absolute deviation) function can be used to surface in a time series by measuring dispersion (deviation from the mean) within a sliding window. The `movingMAD` function operates in a similar manner as a moving average, except it -measures the mean absolute deviation within the window rather then the average. By +measures the mean absolute deviation within the window rather than the average. By looking for unusually high or low dispersion we can find anomalies in the time series. 
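The moving-window functions touched by the time-series hunks above all share one shape: slide a fixed window over the vector and apply an aggregate. A minimal Python sketch with the standard library's `mean` and `median` also shows why the moving median is more robust to outliers than the moving average:

```python
from statistics import mean, median

# Sliding-window aggregate; the output is shorter than the input by
# window-1, as the reference guide notes for movingAvg.
def moving(values, window, agg):
    return [agg(values[i:i + window]) for i in range(len(values) - window + 1)]

series = [10, 11, 10, 12, 11, 90, 11, 10, 12, 11]  # one outlier at 90
avg = moving(series, 5, mean)     # dragged upward by the outlier
med = moving(series, 5, median)   # barely affected by the outlier
```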
@@ -253,7 +253,7 @@ The `outliers` function takes four parameters: * Probability distribution * Numeric vector * Low probability threshold -* High probablity threshold +* High probability threshold * List of results that the numeric vector was selected from. The `outliers` function iterates the numeric vector and uses the probability diff --git a/solr/solr-ref-guide/src/variables.adoc b/solr/solr-ref-guide/src/variables.adoc index ee7109c..ba6c574 100644 --- a/solr/solr-ref-guide/src/variables.adoc +++ b/solr/solr-ref-guide/src/variables.adoc @@ -208,8 +208,8 @@ be cached in-memory for future use. The `putCache` function adds a variable to the cache. -In the example below an array is cached in the `workspace` "workspace1" -and bound to the `key` "key1". The workspace allows different users to cache +In the example below an array is cached in the workspace *workspace1* +and bound to the key *key1*. The workspace allows different users to cache objects in their own workspace. The `putCache` function returns the variable that was added to the cache.
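The `outliers` procedure described in the hunk above can be approximated in Python with `statistics.NormalDist`: fit a normal distribution to the numeric vector, then flag values whose cumulative probability falls below the low threshold or above the high one. The data and thresholds here are illustrative, not Solr's behavior:

```python
from statistics import NormalDist, mean, stdev

values = [10.0, 11.0, 9.5, 10.5, 10.0, 11.5, 9.0, 10.0, 30.0]
dist = NormalDist(mean(values), stdev(values))

low, high = 0.01, 0.99  # low/high probability thresholds
flagged = [v for v in values
           if dist.cdf(v) < low or dist.cdf(v) > high]
```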
