[
https://issues.apache.org/jira/browse/SOLR-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204298#comment-15204298
]
Steve Molloy commented on SOLR-8393:
------------------------------------
h1. Sizing Component
The Solr SizeComponent computes resource usage information for a given Solr
core. It bases those computations on the current index schema, the Solr
configuration and the documents indexed in the core. It is not meant to be
distributed. See the cluster sizing action of the collections admin API for
information about sizing distributed collections.
h2. Configuration
The SizeComponent, like any search component except for the base ones, must be
defined in the solrconfig.xml file before it can be used. This is done in two
parts.
1- Declare the component:
{code:xml}
<searchComponent name="size" class="solr.SizeComponent" />
{code}
2- Add the component to a request handler. Using the default /select handler
makes it easier to use:
{code:xml}
<requestHandler name="/select" class="solr.SearchHandler">
  ...
  <arr name="last-components">
    ...
    <str>size</str>
  </arr>
</requestHandler>
{code}
h2. Usage
Once the SizeComponent is configured, request a sizing report by enabling it
in a standard query:
http://localhost:8983/solr/core/select?q=*:*&rows=0&wt=xml&size=true
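The request above can also be assembled programmatically. The following is a minimal Python sketch, not part of the patch: the helper name size_request_url is hypothetical, and the host and core name are simply the ones from the example URL.

```python
from urllib.parse import urlencode


def size_request_url(base_url, core, **extra_params):
    """Build a /select URL that enables the SizeComponent (hypothetical helper)."""
    # Same parameters as the example request: match everything, return no rows.
    params = {"q": "*:*", "rows": 0, "wt": "xml", "size": "true"}
    # Any extra keyword arguments are passed through unchanged.
    params.update(extra_params)
    return f"{base_url}/{core}/select?{urlencode(params)}"


url = size_request_url("http://localhost:8983/solr", "core")
print(url)
```

urlencode percent-encodes the query string, so q=*:* is sent as q=%2A%3A%2A, which Solr decodes back to the same query.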
h3. Parameters
||name||type||default||description||
|size|boolean|false|If set to true, sizing information is included in the response.|
|avgDocSize|long|0|Document size used to compute resource usage. If less than 1, the value is computed from the content of the currently indexed documents.|
|numDocs|long|0|Number of documents to use when computing resource usage. If less than 1, the actual number of indexed documents is used. This parameter is ignored if estimationRatio is specified.|
|estimationRatio|double|0.0|Ratio used for resource usage estimations. If a value greater than 0.0 is specified, the current number of documents is multiplied by this ratio to determine the number of documents used when computing resource usage.|
|deletedDocs|long|-|Number of deleted documents in the index to use when computing resource usage. If not specified, the current number of deleted documents is used.|
|filterCacheMax|long|-|Size of the filter cache to use when computing resource usage. If not specified, the current filter cache size is used.|
|queryResultCacheMax|long|-|Size of the query result cache to use when computing resource usage. If not specified, the current query result cache size is used.|
|documentCacheMax|long|-|Size of the document cache to use when computing resource usage. If not specified, the current document cache size is used.|
|queryResultMaxDocsCached|long|-|Maximum number of documents to cache per entry in the query result cache when computing resource usage. If not specified, the current maximum is used.|
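The interaction between numDocs and estimationRatio described in the table can be sketched as follows. This is an illustration of the documented precedence only, not the component's actual code: the function name is hypothetical, and truncating the scaled count to an integer is an assumption, since the table does not say how the product is rounded.

```python
def effective_num_docs(current_docs, num_docs=0, estimation_ratio=0.0):
    """Return the document count used for the estimate (illustrative sketch)."""
    # estimationRatio takes precedence: numDocs is ignored when it is set.
    if estimation_ratio > 0.0:
        # Assumption: the scaled count is truncated to a whole number.
        return int(current_docs * estimation_ratio)
    # A numDocs value below 1 means "use the actual number of indexed documents".
    if num_docs >= 1:
        return num_docs
    return current_docs


print(effective_num_docs(2287))
print(effective_num_docs(2287, num_docs=10000))
print(effective_num_docs(2287, num_docs=10000, estimation_ratio=2.0))
```

With 2287 indexed documents, the three calls use 2287, 10000 and 4574 documents respectively.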
h3. Response
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">109</int>
    <lst name="params">
      <str name="q">*:*</str>
      <str name="size">true</str>
      <str name="indent">true</str>
      <str name="rows">0</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="2287" start="0">
  </result>
  <lst name="size">
    <str name="total-disk-size">199.6 MB</str>
    <str name="total-lucene-RAM">33.35 MB</str>
    <str name="total-solr-RAM">79.16 MB</str>
    <long name="estimated-num-docs">2287</long>
    <str name="estimated-doc-size">89.37 KB</str>
    <lst name="solr-details">
      <str name="filterCache">152.94 KB</str>
      <str name="queryResultCache">1,000 KB</str>
      <str name="documentCache">44.68 MB</str>
      <str name="luceneRam">33.35 MB</str>
    </lst>
  </lst>
</response>
{code}
||result field||sub-field||description||
|total-disk-size| |Estimated total disk space used by the index, according to the parameters.|
|total-lucene-RAM| |Estimated RAM used by Lucene for the index, according to the parameters.|
|total-solr-RAM| |Estimated total RAM used by Solr (including Lucene) for the index, according to the parameters.|
|estimated-num-docs| |Number of documents used for computing the estimated values.|
|estimated-doc-size| |Average document size used for computing the estimated values.|
|solr-details|filterCache|Estimated maximum amount of RAM used for caching filters for the index if the cache were full.|
| |queryResultCache|Estimated maximum amount of RAM used for caching query results for the index if the cache were full.|
| |documentCache|Estimated maximum amount of RAM used for caching documents for the index if the cache were full.|
| |luceneRam|Estimated amount of RAM used by Lucene for the index.|
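A client can pull the size section out of the XML response with the Python standard library. The sketch below is one possible approach, not part of the patch: the function name read_size_section is hypothetical, and SAMPLE is a trimmed copy of the response shown earlier. Values are kept as the human-readable strings the component returns.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample response, keeping only the size section.
SAMPLE = """<response>
  <lst name="size">
    <str name="total-disk-size">199.6 MB</str>
    <str name="total-solr-RAM">79.16 MB</str>
    <long name="estimated-num-docs">2287</long>
    <lst name="solr-details">
      <str name="filterCache">152.94 KB</str>
      <str name="documentCache">44.68 MB</str>
    </lst>
  </lst>
</response>"""


def read_size_section(xml_text):
    """Return the size section as a dict, with nested lists as nested dicts."""
    root = ET.fromstring(xml_text)
    size = root.find("./lst[@name='size']")
    if size is None:
        # The component was not enabled for this request.
        return {}
    report = {}
    for child in size:
        name = child.get("name")
        if child.tag == "lst":
            # Nested list such as solr-details: one level of sub-fields.
            report[name] = {c.get("name"): c.text for c in child}
        else:
            report[name] = child.text
    return report


report = read_size_section(SAMPLE)
print(report["total-disk-size"])
print(report["solr-details"]["documentCache"])
```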
h1. Cluster Sizing
The cluster sizing action of the collections handler estimates resource usage
for a complete Solr cluster. It is based on the SizeComponent and calls it
internally, then merges the results to compute aggregated estimations. It does
not require any specific configuration, but the SizeComponent must be declared
and used by the /select handler so that the cluster sizing action can send
requests to it.
h2. Usage
The cluster sizing action can be accessed through the collections handler:
http://localhost:8983/solr/admin/collections?action=clustersizing
h3. Parameters
All parameters from the SizeComponent, except for the size parameter itself,
can be passed to the cluster sizing action and will be relayed to the
SizeComponent when estimating resource usage. The parameters specific to this
action are listed below. For SizeComponent parameters, see the SizeComponent
parameter table.
||name||type||default||description||
|collection|string|-|Comma-separated list of collections to include in the report. If not specified, all collections are included.|
|shard|string|-|Comma-separated list of shards to include in the report. If not specified, all shards are included.|
|replica|string|-|Comma-separated list of replicas to include in the report. If not specified, all replicas are included.|
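A cluster sizing request with filters can be built the same way as any other collections API call. The following Python sketch is illustrative only: the helper name cluster_sizing_url and the collection names in the usage line are hypothetical, and dropping a size argument simply mirrors the statement that size is not a parameter of this action.

```python
from urllib.parse import urlencode


def cluster_sizing_url(base_url, collections=None, shards=None, replicas=None,
                       **size_params):
    """Build a clustersizing URL for the collections handler (hypothetical helper)."""
    params = {"action": "clustersizing"}
    filters = (("collection", collections), ("shard", shards), ("replica", replicas))
    for key, values in filters:
        if values:
            # Each filter is sent as a single comma-separated value.
            params[key] = ",".join(values)
    # SizeComponent parameters are relayed, except for the size flag itself.
    size_params.pop("size", None)
    params.update(size_params)
    return f"{base_url}/admin/collections?{urlencode(params)}"


# "products" and "logs" are made-up collection names.
print(cluster_sizing_url("http://localhost:8983/solr",
                         collections=["products", "logs"],
                         estimationRatio=2.0))
```

Called with only the base URL, the helper returns the same URL as the usage example above.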
h3. Response
The response fields are the same as for the SizeComponent, but are grouped in
two ways. First, each node has its estimated total usage. Then, each collection
has its details grouped by shard and, within each shard, by replica.
> Component for Solr resource usage planning
> ------------------------------------------
>
> Key: SOLR-8393
> URL: https://issues.apache.org/jira/browse/SOLR-8393
> Project: Solr
> Issue Type: Improvement
> Reporter: Steve Molloy
> Attachments: SOLR-8393.patch, SOLR-8393.patch, SOLR-8393.patch,
> SOLR-8393.patch, SOLR-8393.patch, SOLR-8393.patch
>
>
> One question that keeps coming back is how much disk and RAM do I need to run
> Solr. The most common response is that it highly depends on your data. While
> true, it makes for frustrated users trying to plan their deployments.
> The idea I'm bringing is to create a new component that will attempt to
> extrapolate resources needed in the future by looking at resources currently
> used. By adding a parameter for the target number of documents, current
> resources are adapted by a ratio relative to current number of documents.