[ 
https://issues.apache.org/jira/browse/SOLR-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204298#comment-15204298
 ] 

Steve Molloy commented on SOLR-8393:
------------------------------------

h1. Sizing Component

The Solr SizeComponent computes resource usage information for a given Solr 
core. It bases those computations on the current index schema, the Solr 
configuration, and the documents indexed in the core. It is not meant to be 
distributed; for sizing distributed collections, see the cluster sizing action 
of the collection admin API.

h2. Configuration

The SizeComponent, like any search component other than the base ones, must be 
defined in the solrconfig.xml file before it can be used. This takes two 
steps.

1- Declare the component:

    <searchComponent name="size" class="solr.SizeComponent" />  

2- Add the component to a handler. Using the default /select handler makes the 
component easier to use:

    <requestHandler name="/select" class="solr.SearchHandler">  
    ... 
        <arr name="last-components">  
      ...
            <str>size</str>  
        </arr>  
    </requestHandler>  

h2. Usage

Once the SizeComponent is configured, request it by setting size=true on a 
standard query:

http://localhost:8983/solr/core/select?q=*:*&rows=0&wt=xml&size=true
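
As a rough illustration, the request above can be assembled programmatically. This is only a sketch: the helper name size_request_url is hypothetical, the host and core name are taken from the example URL, and any extra keyword arguments are assumed to be SizeComponent parameters from the Parameters table.

```python
from urllib.parse import urlencode


def size_request_url(base_url, core, **size_params):
    """Build a /select URL that enables the SizeComponent (hypothetical helper)."""
    # Same query parameters as the example URL above.
    params = {"q": "*:*", "rows": 0, "wt": "xml", "size": "true"}
    # Optional SizeComponent parameters, e.g. numDocs or estimationRatio.
    params.update(size_params)
    return f"{base_url}/{core}/select?{urlencode(params)}"


url = size_request_url("http://localhost:8983/solr", "core")
projected_url = size_request_url("http://localhost:8983/solr", "core", numDocs=1000000)
print(url)
print(projected_url)
```

Note that urlencode percent-encodes the query string, so the printed URL is equivalent to the one above but not character-for-character identical.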

h3. Parameters
||name||type||default||description||
|size|boolean|false|If set to true, sizing information is included in the response.|
|avgDocSize|long|0|Document size used to compute resource usage. If less than 1, the value is computed from the content of the currently indexed documents.|
|numDocs|long|0|Number of documents to use when computing resource usage. If less than 1, the actual number of indexed documents is used. This parameter is ignored if estimationRatio is specified.|
|estimationRatio|double|0.0|Ratio used for resource usage estimations. If a value greater than 0.0 is specified, the current number of documents is multiplied by this ratio to determine the number of documents used when computing resource usage.|
|deletedDocs|long|-|Number of deleted documents in the index to use when computing resource usage. If not specified, the current number of deleted documents is used.|
|filterCacheMax|long|-|Size of the filter cache to use when computing resource usage. If not specified, the current filter cache size is used.|
|queryResultCacheMax|long|-|Size of the query result cache to use when computing resource usage. If not specified, the current query result cache size is used.|
|documentCacheMax|long|-|Size of the document cache to use when computing resource usage. If not specified, the current document cache size is used.|
|queryResultMaxDocsCached|long|-|Maximum number of documents to cache per entry in the query result cache, used when computing resource usage. If not specified, the current maximum is used.|
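
The precedence between numDocs and estimationRatio described in the table can be sketched as follows. This is not the component's actual code: the function name effective_num_docs is hypothetical, and rounding the scaled count to the nearest integer is an assumption, since the table does not say how fractional results are handled.

```python
def effective_num_docs(current_docs, num_docs=0, estimation_ratio=0.0):
    """Pick the document count used for estimation (illustrative sketch only)."""
    if estimation_ratio > 0.0:
        # estimationRatio wins over numDocs; rounding is an assumption.
        return round(current_docs * estimation_ratio)
    if num_docs >= 1:
        # An explicit numDocs is used as given.
        return num_docs
    # Neither parameter set: fall back to the actual indexed count.
    return current_docs


print(effective_num_docs(2287))
print(effective_num_docs(2287, num_docs=10000))
print(effective_num_docs(2287, num_docs=10000, estimation_ratio=2.0))
```

With the 2287 documents from the sample response, the three calls cover the default case, an explicit numDocs, and a ratio that overrides numDocs.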

h3. Response

{code:xml}
    <?xml version="1.0" encoding="UTF-8"?>  
    <response>  
    <lst name="responseHeader">  
      <int name="status">0</int>  
      <int name="QTime">109</int>  
      <lst name="params">  
        <str name="q">*:*</str>  
        <str name="size">true</str>  
        <str name="indent">true</str>  
        <str name="rows">0</str>  
        <str name="wt">xml</str>  
      </lst>  
    </lst>  
    <result name="response" numFound="2287" start="0">  
    </result>  
    <lst name="size">  
      <str name="total-disk-size">199.6 MB</str>  
      <str name="total-lucene-RAM">33.35 MB</str>  
      <str name="total-solr-RAM">79.16 MB</str>  
      <long name="estimated-num-docs">2287</long>  
      <str name="estimated-doc-size">89.37 KB</str>  
      <lst name="solr-details">  
        <str name="filterCache">152.94 KB</str>  
        <str name="queryResultCache">1,000 KB</str>  
        <str name="documentCache">44.68 MB</str>  
        <str name="luceneRam">33.35 MB</str>  
      </lst>  
    </lst>  
    </response>  
{code}

||result field||sub-field||description||
|total-disk-size| |Estimation of total disk space used by the index according to parameters.|
|total-lucene-RAM| |Estimation of index RAM usage specifically for Lucene according to parameters.|
|total-solr-RAM| |Estimation of total index RAM usage for Solr (including Lucene) according to parameters.|
|estimated-num-docs| |Number of documents used for computing estimated values.|
|estimated-doc-size| |Average document size used for computing estimated values.|
|solr-details|filterCache|Estimated maximum amount of RAM used for caching filters for the index, if the cache were full.|
| |queryResultCache|Estimated maximum amount of RAM used for caching query results for the index, if the cache were full.|
| |documentCache|Estimated maximum amount of RAM used for caching documents for the index, if the cache were full.|
| |luceneRam|Estimated amount of RAM used by Lucene for the index.|
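
A client could read these fields from the XML response along the following lines. This is a sketch using only the Python standard library, with the sample response above embedded as a string. The to_bytes helper and the 1024-based unit table are assumptions, since the units are not specified here. In this sample, the solr-details entries happen to add up to total-solr-RAM under that assumption; this is an observation about the example, not documented behavior.

```python
import xml.etree.ElementTree as ET

# The "size" section of the sample response above.
SAMPLE = """<response>
<lst name="size">
  <str name="total-disk-size">199.6 MB</str>
  <str name="total-lucene-RAM">33.35 MB</str>
  <str name="total-solr-RAM">79.16 MB</str>
  <long name="estimated-num-docs">2287</long>
  <str name="estimated-doc-size">89.37 KB</str>
  <lst name="solr-details">
    <str name="filterCache">152.94 KB</str>
    <str name="queryResultCache">1,000 KB</str>
    <str name="documentCache">44.68 MB</str>
    <str name="luceneRam">33.35 MB</str>
  </lst>
</lst>
</response>"""

# Assumption: units are binary (1 KB = 1024 bytes).
UNIT_BYTES = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(text):
    """Convert a string such as '1,000 KB' to a byte count (hypothetical helper)."""
    number, unit = text.split()
    return float(number.replace(",", "")) * UNIT_BYTES[unit]


root = ET.fromstring(SAMPLE)
size = root.find("lst[@name='size']")
details = size.find("lst[@name='solr-details']")

# Map each detail entry (filterCache, queryResultCache, ...) to bytes.
detail_bytes = {entry.get("name"): to_bytes(entry.text) for entry in details}
total_mb = sum(detail_bytes.values()) / 1024 ** 2
reported_mb = to_bytes(size.find("str[@name='total-solr-RAM']").text) / 1024 ** 2

print(f"sum of details: {total_mb:.2f} MB, reported: {reported_mb:.2f} MB")
```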

 
h1. Cluster Sizing

The cluster sizing action of the collections handler estimates resource usage 
for a complete Solr cluster. It is based on the SizeComponent: it calls the 
component internally, merges the results, and computes aggregated estimations. 
It needs no configuration of its own, but the SizeComponent must be declared 
and used by the /select handler so that the cluster sizing action can send 
requests to it.

h2. Usage

The cluster sizing action can be accessed through the collections handler:

http://localhost:8983/solr/admin/collections?action=clustersizing

h3. Parameters

All SizeComponent parameters, except for the size parameter itself, can be 
passed to the cluster sizing action and are relayed to the SizeComponent when 
estimating resource usage. The parameters specific to this action are listed 
below; for SizeComponent parameters, see the SizeComponent parameter table.

||name||type||default||description||
|collection|string|-|List of collections (CSV) to be included in the report. If not specified, all collections are included.|
|shard|string|-|List of shards (CSV) to be included in the report. If not specified, all shards are included.|
|replica|string|-|List of replicas (CSV) to be included in the report. If not specified, all replicas are included.|
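
A cluster sizing request combining these filters with relayed SizeComponent parameters might be built as follows. This is a sketch: the helper name cluster_sizing_url and the collection names are hypothetical, and dropping a stray size argument on the client side is an assumption based on the note that size itself is not passed to this action.

```python
from urllib.parse import urlencode


def cluster_sizing_url(base_url, collections=(), shards=(), replicas=(), **size_params):
    """Build a clustersizing URL for the collections handler (hypothetical helper)."""
    # The size flag is not a parameter of this action, so drop it if given.
    size_params.pop("size", None)
    params = {"action": "clustersizing"}
    filters = (("collection", collections), ("shard", shards), ("replica", replicas))
    for name, values in filters:
        if values:
            # Each filter is sent as a comma-separated list (CSV).
            params[name] = ",".join(values)
    # Remaining keyword arguments are relayed SizeComponent parameters.
    params.update(size_params)
    return f"{base_url}/admin/collections?{urlencode(params)}"


cluster_url = cluster_sizing_url(
    "http://localhost:8983/solr",
    collections=["collection1", "collection2"],  # hypothetical collection names
    numDocs=1000000,
    size="true",
)
print(cluster_url)
```

The comma in each CSV value is percent-encoded as %2C by urlencode.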

h3. Response

The response fields are the same as for the SizeComponent, grouped in two 
ways. First, each node has its estimated total usage; then each collection has 
details grouped by shard and, within each shard, by replica.

> Component for Solr resource usage planning
> ------------------------------------------
>
>                 Key: SOLR-8393
>                 URL: https://issues.apache.org/jira/browse/SOLR-8393
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Steve Molloy
>         Attachments: SOLR-8393.patch, SOLR-8393.patch, SOLR-8393.patch, 
> SOLR-8393.patch, SOLR-8393.patch, SOLR-8393.patch
>
>
> One question that keeps coming back is how much disk and RAM do I need to run 
> Solr. The most common response is that it highly depends on your data. While 
> true, it makes for frustrated users trying to plan their deployments. 
> The idea I'm bringing is to create a new component that will attempt to 
> extrapolate resources needed in the future by looking at resources currently 
> used. By adding a parameter for the target number of documents, current 
> resources are adapted by a ratio relative to current number of documents.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
