[ 
https://issues.apache.org/jira/browse/SOLR-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sachin Goyal updated SOLR-7121:
-------------------------------
    Attachment: SOLR-7121.patch

[~elyograg], here is another patch which removes System.currentTimeMillis().

Most of the important values are already in the configuration and turned off by 
default.
{code:xml}
  <coreDownThresholds name="thresholds1">

    <bool name="goDownIfHighLoad">false</bool>

    <str name="coreNameExpression">abc.*</str>

    <int name="coreLimitMaxThreads">45</int>

    <int name="coreLimitMaxGcMillis">10000</int>

    <!-- These 3 options must be specified together and are used as an AND condition -->
    <int name="coreLimitMaxLongQueries">100</int>
    <int name="coreLimitLongQueryTime">100</int>
    <int name="coreLimitMaxLongQueriesInterval">1000</int>

    <!-- These 2 options must be specified together and are used as an AND condition -->
    <int name="coreLimitMax95thPcSelectTime">-1</int>
    <int name="coreLimitMax5MinSelectRate">-1</int>
  </coreDownThresholds>
{code}
Only a few options remain hard-coded, as I felt it would be best to leave 
those out of the configuration. I will wait for the complete review comments on 
this patch before converting them to configuration as well.
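
To make the intended semantics of the three long-query options easier to review, here is a minimal sketch of how such an AND condition could be evaluated. It is not taken from the patch. The class name, the method names, and the reading of the options (more than coreLimitMaxLongQueries queries, each slower than coreLimitLongQueryTime milliseconds, within the last coreLimitMaxLongQueriesInterval milliseconds) are assumptions for illustration. Timestamps are assumed to come from a monotonic source such as System.nanoTime() rather than System.currentTimeMillis().

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window check; NOT the implementation in SOLR-7121.patch. */
final class LongQueryThreshold {
    private final int maxLongQueries;    // assumed to map to coreLimitMaxLongQueries
    private final long longQueryTimeMs;  // assumed to map to coreLimitLongQueryTime (ms)
    private final long intervalMs;       // assumed to map to coreLimitMaxLongQueriesInterval (ms)

    // Completion times (ms, monotonic) of recent long queries, oldest first.
    private final Deque<Long> longQueryTimesMs = new ArrayDeque<>();

    LongQueryThreshold(int maxLongQueries, long longQueryTimeMs, long intervalMs) {
        this.maxLongQueries = maxLongQueries;
        this.longQueryTimeMs = longQueryTimeMs;
        this.intervalMs = intervalMs;
    }

    /** Monotonic clock in milliseconds, unaffected by wall-clock adjustments. */
    static long monotonicNowMs() {
        return System.nanoTime() / 1_000_000L;
    }

    /** Records a finished query; only queries slower than the limit are kept. */
    synchronized void record(long nowMs, long elapsedMs) {
        if (elapsedMs > longQueryTimeMs) {
            longQueryTimesMs.addLast(nowMs);
        }
        evictOld(nowMs);
    }

    /** True when more than maxLongQueries long queries fall inside the interval. */
    synchronized boolean isBreached(long nowMs) {
        evictOld(nowMs);
        return longQueryTimesMs.size() > maxLongQueries;
    }

    // Drops entries that are older than the configured interval.
    private void evictOld(long nowMs) {
        while (!longQueryTimesMs.isEmpty()
                && nowMs - longQueryTimesMs.peekFirst() > intervalMs) {
            longQueryTimesMs.removeFirst();
        }
    }
}
```

With the sample values above, this would be constructed as new LongQueryThreshold(100, 100, 1000), and a caller would pass LongQueryThreshold.monotonicNowMs() as the timestamp.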

> Solr nodes should go down based on configurable thresholds and not rely on 
> resource exhaustion
> ----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-7121
>                 URL: https://issues.apache.org/jira/browse/SOLR-7121
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Sachin Goyal
>         Attachments: SOLR-7121.patch, SOLR-7121.patch
>
>
> Currently, there is no way to control when a Solr node goes down.
> If the server is having high GC pauses or too many threads or is just getting 
> too many queries due to some bad load-balancer, the cores in the machine keep 
> on serving until they exhaust the machine's resources and everything comes 
> to a stall.
> Such a slow-dying core can affect other cores as well by taking a long time to 
> serve their distributed queries.
> There should be a way to specify some threshold values beyond which the 
> targeted core can detect its ill health and proactively go down to recover.
> When the load improves, the core should come up automatically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
