vishal patel wrote on 10.7.2020 at 12.45:
> Thanks for your input.
>
> Walter already said that setting soft commit max time to 100 ms is a recipe
> for disaster
>>> I know that, but our application is already developed and has been running
>>> in the live environment for the last 5 years. We want to show data very
>>> quickly after an insert.
>
> you have huge JVM heaps without an explanation for the reason
>>> We gave 55 GB of RAM because of our usage: large query searches and very
>>> frequent searching and indexing.
> Here is my memory snapshot which I have taken from GC.
Yes, I can see that a lot of memory is in use, but the question is why. I
assume caches (are they too large?), perhaps uninverted indexes. DocValues
would help with the latter. Do you use them?

> I have tried a Solr upgrade from 6.1.0 to 8.5.1, but due to some issue we
> cannot do it. I have also asked here:
> https://lucene.472066.n3.nabble.com/Sorting-in-other-collection-in-Solr-8-5-1-td4459506.html#a4459562

You could also try upgrading to the latest version in the 6.x series as a
starter.

> Why can we not find the reason for the recovery in the log? Like a memory or
> CPU issue, frequent indexing or searching, or a large query hit?
> My log at the time of recovery:
> https://drive.google.com/file/d/1F8Bn7jSXspe2HRelh_vJjKy9DsTRl9h0/view

Isn't it right there on the first lines?

2020-07-09 14:42:43.943 ERROR (updateExecutor-2-thread-21007-processing-http:////11.200.212.305:8983//solr//products x:products r:core_node1 n:11.200.212.306:8983_solr s:shard1 c:products) [c:products s:shard1 r:core_node1 x:products] o.a.s.u.StreamingSolrClients error
org.apache.http.NoHttpResponseException: 11.200.212.305:8983 failed to respond

followed by a couple more error messages about the same problem and then
initiation of recovery:

2020-07-09 14:42:44.002 INFO (qtp1239731077-771611) [c:products s:shard1 r:core_node1 x:products] o.a.s.c.ZkController Put replica core=products coreNodeName=core_node3 on 11.200.212.305:8983_solr into leader-initiated recovery.

So the node in question isn't responding quickly enough to HTTP requests and
gets put into recovery. The log for the recovering node starts too late, so I
can't say anything about what happened before 14:42:43.943 that led to
recovery.
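On the DocValues point above: docValues are enabled per field in the schema, and for fields used in sorting, faceting, or grouping they keep that data in memory-mapped files on disk instead of building the uninverted FieldCache on the JVM heap. A minimal sketch for a Solr 6.x schema; the field names below are illustrative examples, not fields from this thread, and changing docValues on an existing field requires reindexing:

```xml
<!-- schema.xml: docValues="true" stores column-oriented values on disk,
     so sorting/faceting on these fields no longer uninverts the index
     onto the JVM heap. Field names here are hypothetical. -->
<field name="created_date" type="tlong"  indexed="true" stored="true"  docValues="true"/>
<field name="category"     type="string" indexed="true" stored="false" docValues="true"/>
```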
--Ere

> ________________________________
> From: Ere Maijala <ere.maij...@helsinki.fi>
> Sent: Friday, July 10, 2020 2:10 PM
> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>
> Walter already said that setting soft commit max time to 100 ms is a
> recipe for disaster. That alone can be the issue, but if you're not
> willing to try higher values, there's no way of being sure. And you have
> huge JVM heaps without an explanation for the reason. If those do not
> cause problems, you indicated that you also run some other software on
> the same server. Is it possible that the other processes hog CPU, disk
> or network and starve Solr?
>
> I must add that Solr 6.1.0 is over four years old. You could be hitting
> a bug that has been fixed for years, but even if you encounter an issue
> that's still present, you will need to upgrade to get it fixed. If you
> look at the number of fixes done in subsequent 6.x versions alone in the
> changelog (https://lucene.apache.org/solr/8_5_1/changes/Changes.html)
> you'll see that there are a lot of them. You could be hitting something
> like SOLR-10420, which has been fixed for over three years.
>
> Best,
> Ere
>
> vishal patel wrote on 10.7.2020 at 7.52:
>> I've been running Solr for a dozen years and I've never needed a heap
>> larger than 8 GB.
>>>> What is your data size? The same as ours, 1 TB? Do you search or index
>>>> frequently? An NRT model?
>>
>> My question is why the replica is going into recovery. When the replica
>> went down, I checked the GC log, but no GC pause was longer than 2 seconds.
>> Also, I cannot find any reason for the recovery in the Solr log file. I
>> want to know why the replica goes into recovery.
>>
>> Regards,
>> Vishal Patel
>> ________________________________
>> From: Walter Underwood <wun...@wunderwood.org>
>> Sent: Friday, July 10, 2020 3:03 AM
>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>
>> Those are extremely large JVMs. Unless you have proven that you MUST
>> have 55 GB of heap, use a smaller heap.
>>
>> I've been running Solr for a dozen years and I've never needed a heap
>> larger than 8 GB.
>>
>> Also, there is usually no need to use one JVM per replica.
>>
>> Your configuration is using 110 GB (two JVMs) just for Java
>> where I would configure it with a single 8 GB JVM. That would
>> free up 100 GB for file caches.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>>> On Jul 8, 2020, at 10:10 PM, vishal patel <vishalpatel200...@outlook.com>
>>> wrote:
>>>
>>> Thanks for the reply.
>>>
>>> What do you mean by "Shard1 Allocated memory"?
>>>>> It means the JVM memory of one Solr node or instance.
>>>
>>> How many Solr JVMs are you running?
>>>>> In one server there are 2 Solr JVMs, of which one is a shard and the
>>>>> other is a replica.
>>>
>>> What is the heap size for your JVMs?
>>>>> 55 GB for one Solr JVM.
>>>
>>> Regards,
>>> Vishal Patel
>>>
>>> Sent from Outlook<http://aka.ms/weboutlook>
>>> ________________________________
>>> From: Walter Underwood <wun...@wunderwood.org>
>>> Sent: Wednesday, July 8, 2020 8:45 PM
>>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>>
>>> I don't understand what you mean by "Shard1 Allocated memory". I don't
>>> know of any way to dedicate system RAM to an application object like a
>>> replica.
>>>
>>> How many Solr JVMs are you running?
>>>
>>> What is the heap size for your JVMs?
>>>
>>> Setting soft commit max time to 100 ms does not magically make Solr
>>> super fast.
>>> It makes Solr do too much work, makes the work queues fill up, and makes
>>> it fail.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/ (my blog)
>>>
>>>> On Jul 7, 2020, at 10:55 PM, vishal patel <vishalpatel200...@outlook.com>
>>>> wrote:
>>>>
>>>> Thanks for your reply.
>>>>
>>>> One server has 320 GB of RAM in total. On it there are 2 Solr nodes: one
>>>> is shard1 and the second is the shard2 replica. Each Solr node has 55 GB
>>>> of memory allocated. shard1 has 585 GB of data and the shard2 replica
>>>> has 492 GB of data, which means almost 1 TB of data on this server. The
>>>> server also runs other applications, for which 60 GB of memory is
>>>> allocated. So 150 GB of memory is left.
>>>>
>>>> Properly formatted details:
>>>> https://drive.google.com/file/d/1K9JyvJ50Vele9pPJCiMwm25wV4A6x4eD/view
>>>>
>>>> Are you running multiple huge JVMs?
>>>>>> Not huge, but 60 GB of memory is allocated for our 11 applications.
>>>>>> 150 GB of memory is still free.
>>>>
>>>> The servers will be doing a LOT of disk IO, so look at the read and
>>>> write iops. I expect that the solr processes are blocked on disk reads
>>>> almost all the time.
>>>>>> Is there a chance of going into recovery mode if there is a lot of IO
>>>>>> read/write or blocking?
>>>>
>>>> "-Dsolr.autoSoftCommit.maxTime=100" is way too short (100 ms).
>>>>>> Our requirement is NRT, so we keep the time low.
>>>>
>>>> Regards,
>>>> Vishal Patel
>>>> ________________________________
>>>> From: Walter Underwood <wun...@wunderwood.org>
>>>> Sent: Tuesday, July 7, 2020 8:15 PM
>>>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>>>> Subject: Re: Replica goes into recovery mode in Solr 6.1.0
>>>>
>>>> This isn't a support list, so nobody looks at issues. We do try to help.
>>>>
>>>> It looks like you have 1 TB of index on a system with 320 GB of RAM.
>>>> I don't know what "Shard1 Allocated memory" is, but maybe half of
>>>> that RAM is used by JVMs or some other process, I guess. Are you
>>>> running multiple huge JVMs?
>>>>
>>>> The servers will be doing a LOT of disk IO, so look at the read and
>>>> write iops. I expect that the solr processes are blocked on disk reads
>>>> almost all the time.
>>>>
>>>> "-Dsolr.autoSoftCommit.maxTime=100" is way too short (100 ms).
>>>> That is probably causing your outages.
>>>>
>>>> wunder
>>>> Walter Underwood
>>>> wun...@wunderwood.org
>>>> http://observer.wunderwood.org/ (my blog)
>>>>
>>>>> On Jul 7, 2020, at 5:18 AM, vishal patel <vishalpatel200...@outlook.com>
>>>>> wrote:
>>>>>
>>>>> Is anyone looking at my issue? Please guide me.
>>>>>
>>>>> Regards,
>>>>> Vishal Patel
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: vishal patel <vishalpatel200...@outlook.com>
>>>>> Sent: Monday, July 6, 2020 7:11 PM
>>>>> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>>>>> Subject: Replica goes into recovery mode in Solr 6.1.0
>>>>>
>>>>> I am using Solr version 6.1.0 with Java 8 and G1GC in production. We
>>>>> have 2 shards and each shard has 1 replica. We have 3 collections.
>>>>> We do not use any caches and have also disabled them in solrconfig.xml.
>>>>> Search and update requests come in frequently on our live platform.
>>>>>
>>>>> * Our commit configuration in solrconfig.xml is below:
>>>>>
>>>>> <autoCommit>
>>>>>   <maxTime>600000</maxTime>
>>>>>   <maxDocs>20000</maxDocs>
>>>>>   <openSearcher>false</openSearcher>
>>>>> </autoCommit>
>>>>> <autoSoftCommit>
>>>>>   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>>>>> </autoSoftCommit>
>>>>>
>>>>> * We use Near Real Time searching, so we set the following in
>>>>>   solr.in.cmd:
>>>>>
>>>>> set SOLR_OPTS=%SOLR_OPTS% -Dsolr.autoSoftCommit.maxTime=100
>>>>>
>>>>> * Our collection details (number of documents / size in GB per core):
>>>>>
>>>>> Collection    Shard1             Shard1 Replica     Shard2             Shard2 Replica
>>>>> collection1   26913364 / 201     26913379 / 202     26913380 / 198     26913379 / 198
>>>>> collection2   13934360 / 310     13934367 / 310     13934368 / 219     13934367 / 219
>>>>> collection3   351539689 / 73.5   351540040 / 73.5   351540136 / 75.2   351539722 / 75.2
>>>>>
>>>>> * Our server configurations are below:
>>>>>
>>>>>                                            Server1             Server2
>>>>> CPU (both servers)                         Intel(R) Xeon(R) CPU E5-2650 v3
>>>>>                                            @ 2.30GHz, 2301 MHz, 10 Core(s),
>>>>>                                            20 Logical Processor(s)
>>>>> Hard disk                                  3845 GB (3.84 TB)   3485 GB (3.48 TB)
>>>>> Total memory (GB)                          320                 320
>>>>> Shard1 allocated memory (GB)               55                  -
>>>>> Shard2 Replica allocated memory (GB)       55                  -
>>>>> Shard2 allocated memory (GB)               -                   55
>>>>> Shard1 Replica allocated memory (GB)       -                   55
>>>>> Other applications' allocated memory (GB)  60                  22
>>>>> Number of other applications               11                  7
>>>>>
>>>>> Sometimes one of the replicas goes into recovery mode. Why does the
>>>>> replica go into recovery? Due to heavy searching, heavy updates/inserts,
>>>>> or long GC pause times? If it is one of them, what should we change in
>>>>> the configuration?
>>>>> Should we increase the number of shards to address the recovery issue?
>>>>>
>>>>> Regards,
>>>>> Vishal Patel
>>>>>
>>>>
>>>
>>
>>
>
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland

--
Ere Maijala
Kansalliskirjasto / The National Library of Finland
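For reference, the soft commit interval that Walter and Ere flag in this thread lives in solrconfig.xml (overridable via the solr.autoSoftCommit.maxTime system property shown above). A sketch of a less aggressive near-real-time setup; the 10-second value is an illustrative starting point, not a value recommended anywhere in the thread itself:

```xml
<!-- solrconfig.xml: a less aggressive NRT setup. Hard commits flush the
     transaction log to stable storage without opening a new searcher;
     soft commits make new documents visible to search. Values are
     illustrative and would need tuning against real load. -->
<autoCommit>
  <maxTime>600000</maxTime>          <!-- hard commit every 10 minutes, as in the thread -->
  <openSearcher>false</openSearcher> <!-- do not reopen searchers on hard commit -->
</autoCommit>
<autoSoftCommit>
  <!-- 10000 ms instead of 100 ms: documents become searchable within
       ~10 seconds while reopening searchers two orders of magnitude
       less often -->
  <maxTime>${solr.autoSoftCommit.maxTime:10000}</maxTime>
</autoSoftCommit>
```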