[
https://issues.apache.org/jira/browse/SOLR-12470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571933#comment-16571933
]
Andrzej Bialecki commented on SOLR-12470:
------------------------------------------
Another issue with this testing method: since you're using a simple {{curl}}
command that always hits a single specific URL (as opposed to
{{CloudSolrClient}} or another load-balancing client), the search rate for
this particular replica, as measured by
{{QUERY./select.requestTimes:1minRate}}, will stay constant no matter
how many other replicas are added.
This means that the trigger will keep adding new replicas endlessly, because
even after they are added the search rate on that replica never decreases.
This is easy to reproduce by keeping the requestor process running for a long
time - I stopped at 40 replicas ;)
I need to investigate how to fix the trigger to prevent situations like this.
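To illustrate the feedback loop, here is a toy model in Python. It is only a sketch and not the actual trigger code: the function name {{simulate}}, its parameters, and the "add one replica per firing" rule are assumptions made for illustration. The threshold of 1.0 matches the {{rate}} in the trigger config, and 10 requests/sec roughly approximates the {{sleep .1}} loop.

```python
def simulate(total_qps, threshold, rounds, load_balanced):
    """Toy model: return the replica count after each trigger evaluation.

    Assumption (hypothetical simplification): the trigger adds exactly one
    replica whenever the hottest replica's rate exceeds the threshold.
    """
    replicas = 1
    history = []
    for _ in range(rounds):
        if load_balanced:
            # Traffic is spread evenly, so each replica sees a share of it.
            hottest_rate = total_qps / replicas
        else:
            # All traffic is pinned to one URL, so that replica's rate
            # never changes regardless of how many replicas exist.
            hottest_rate = total_qps
        if hottest_rate > threshold:
            replicas += 1
        history.append(replicas)
    return history


if __name__ == "__main__":
    pinned = simulate(total_qps=10.0, threshold=1.0, rounds=20, load_balanced=False)
    balanced = simulate(total_qps=10.0, threshold=1.0, rounds=20, load_balanced=True)
    print("pinned client:   ", pinned[-1], "replicas and still growing")
    print("balanced client: ", balanced[-1], "replicas, stable")
```

In this model the pinned client gains one replica on every evaluation for as long as it runs, while the load-balanced client levels off once the per-replica rate drops to the threshold.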
> Search Rate Trigger created more than 3 replicas
> ------------------------------------------------
>
> Key: SOLR-12470
> URL: https://issues.apache.org/jira/browse/SOLR-12470
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: AutoScaling
> Reporter: Varun Thacker
> Assignee: Andrzej Bialecki
> Priority: Major
> Fix For: 7.5
>
> Attachments: 4_docs.json, bug_report.txt, graph_view.png,
> multiple_replicas.zip, system_docs.json
>
>
> Here's the trigger that I created. At this point the collection had one
> shard and one replica (on node3)
> {code:java}
> curl -X POST -H 'Content-type:application/json' --data-binary '{
> "set-trigger": {
> "name" : "search_rate_trigger",
> "event" : "searchRate",
> "collection" : "test_rate_trigger",
> "rate" : 1.0,
> "waitFor" : "1m",
> "enabled" : true,
> "actions" : [
> {
> "name" : "compute_plan",
> "class": "solr.ComputePlanAction"
> },
> {
> "name" : "execute_plan",
> "class": "solr.ExecutePlanAction"
> }
> ]
> }
> }' http://localhost:8983/solr/admin/autoscaling{code}
>
> I also had a trigger listener set up, as I was testing the listener feature
> {code:java}
> curl -X POST -H 'Content-type:application/json' --data-binary '{
> "set-listener": {
> "name": "search_rate_listener",
> "trigger": "search_rate_trigger",
> "stage": ["STARTED", "ABORTED", "SUCCEEDED", "FAILED"],
> "class": "solr.SystemLogListener"
> }
> }' http://localhost:8983/solr/admin/autoscaling{code}
>
> I ran a script to fire a query every 100ms. The index didn't have any
> docs, so it's a simple match-all query
> {code:java}
> while [ 1 ]
> do
> curl -s "http://localhost:8984/solr/test_rate_trigger/select/?q=*:*" > /dev/null
> sleep .1
> done{code}
> After a few minutes I see 4 replicas being created.
> Attaching logs from all 4 nodes. It should be fairly easy to reproduce with
> the above mentioned steps
> Also attaching all the docs from the .system collection for reference
> Here's another interesting thing I noticed. I re-created the setup but this
> time removed the execute_plan part
> Now every minute the compute plan action tries to create 3 replicas. What I
> found interesting is that it was trying to create two replicas on the
> same node
> Does this look like a separate bug?
> {code:java}
> INFO - 2018-06-08 03:41:32.586; [ ] org.apache.solr.servlet.HttpSolrCall;
> [admin] webapp=null path=/admin/metrics
> params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used}
> status=0 QTime=0
> INFO - 2018-06-08 03:41:40.909; [ ] org.apache.solr.servlet.HttpSolrCall;
> [admin] webapp=null path=/admin/metrics
> params={prefix=CONTAINER.fs.usableSpace,CORE.coreName&wt=javabin&version=2&group=solr.node,solr.core}
> status=0 QTime=1
> INFO - 2018-06-08 03:41:40.932; [ ]
> org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan:
> action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8984_solr&type=NRT
> INFO - 2018-06-08 03:41:40.933; [ ]
> org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan:
> action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
> INFO - 2018-06-08 03:41:40.934; [ ]
> org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan:
> action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
> INFO - 2018-06-08 03:41:40.934; [ ]
> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper; returnSession,
> curr-time 9184331 sessionWrapper.createTime 9184324085271,
> this.sessionWrapper.createTime 9184324085271
> INFO - 2018-06-08 03:42:32.604; [ ] org.apache.solr.servlet.HttpSolrCall;
> [admin] webapp=null path=/admin/metrics
> params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used}
> status=0 QTime=0
> INFO - 2018-06-08 03:42:41.525; [ ] org.apache.solr.servlet.HttpSolrCall;
> [admin] webapp=null path=/admin/metrics
> params={prefix=CONTAINER.fs.usableSpace,CORE.coreName&wt=javabin&version=2&group=solr.node,solr.core}
> status=0 QTime=0
> INFO - 2018-06-08 03:42:41.559; [ ]
> org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan:
> action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8984_solr&type=NRT
> INFO - 2018-06-08 03:42:41.560; [ ]
> org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan:
> action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
> INFO - 2018-06-08 03:42:41.560; [ ]
> org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan:
> action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
> INFO - 2018-06-08 03:42:41.561; [ ]
> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper; returnSession,
> curr-time 9244959 sessionWrapper.createTime 9244956725861,
> this.sessionWrapper.createTime 9244956725861
> INFO - 2018-06-08 03:43:32.622; [ ] org.apache.solr.servlet.HttpSolrCall;
> [admin] webapp=null path=/admin/metrics
> params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used}
> status=0 QTime=0
> INFO - 2018-06-08 03:43:42.158; [ ] org.apache.solr.servlet.HttpSolrCall;
> [admin] webapp=null path=/admin/metrics
> params={prefix=CONTAINER.fs.usableSpace,CORE.coreName&wt=javabin&version=2&group=solr.node,solr.core}
> status=0 QTime=1
> INFO - 2018-06-08 03:43:42.178; [ ]
> org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan:
> action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8984_solr&type=NRT
> INFO - 2018-06-08 03:43:42.180; [ ]
> org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan:
> action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
> INFO - 2018-06-08 03:43:42.181; [ ]
> org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan:
> action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
> INFO - 2018-06-08 03:43:42.181; [ ]
> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper; returnSession,
> curr-time 9305581 sessionWrapper.createTime 9305577119413,
> this.sessionWrapper.createTime 9305577119413
> INFO - 2018-06-08 03:44:32.642; [ ] org.apache.solr.servlet.HttpSolrCall;
> [admin] webapp=null path=/admin/metrics
> params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used}
> status=0 QTime=0
> INFO - 2018-06-08 03:44:42.759; [ ] org.apache.solr.servlet.HttpSolrCall;
> [admin] webapp=null path=/admin/metrics
> params={prefix=CONTAINER.fs.usableSpace,CORE.coreName&wt=javabin&version=2&group=solr.node,solr.core}
> status=0 QTime=0
> INFO - 2018-06-08 03:44:42.778; [ ]
> org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan:
> action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8984_solr&type=NRT
> INFO - 2018-06-08 03:44:42.779; [ ]
> org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan:
> action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
> INFO - 2018-06-08 03:44:42.779; [ ]
> org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan:
> action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
> INFO - 2018-06-08 03:44:42.779; [ ]
> org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper; returnSession,
> curr-time 9366182 sessionWrapper.createTime 9366178796748,
> this.sessionWrapper.createTime 9366178796748{code}
>
> Thirdly, for the above-mentioned test I started observing the .system
> collection. Here is a query that I used to get the documents created from
> the first time the listener kicked in
> {code:java}
> http://localhost:8983/solr/.system/select?fq=source_s:SystemLogListener&q=*:*&rows=4&sort=timestamp%20asc{code}
> My first expectation was that I'd see 3 docs but I saw 4 docs. Curious why
> it's 4 (the docs are attached as 4_docs.json)
> My intention here is to replace the system log listener with an HTTP
> listener, so I wanted to understand whether I should be looking out for 4
> events or 3
> My first reaction is that it's a minor bug, hence I'm putting it as part of
> this jira
> Happy to break it up into smaller Jiras once I hear back if these are valid
> issues. This test was run against master