Thanks for your feedback. I filed this Jira ticket https://issues.apache.org/jira/browse/SOLR-16843 with mostly the text of my initial email.

On Fri, Jun 9, 2023 at 15:26, David Smiley <dsmi...@apache.org> wrote:

> TBH I don't imagine anyone resurrecting the Autoscaling framework; filing a
> JIRA issue against removed code seems like a waste of one's time. But
> you/Pierre can file one. It ought to be closed immediately as "Won't Fix"
> (I chose this resolution status recently for several now-defunct issues).
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Wed, Jun 7, 2023 at 12:35 PM Jason Gerlowski <gerlowsk...@gmail.com>
> wrote:
>
> > Thanks for sharing Pierre! It's clear you put some thought into the
> > clear writeup.
> >
> > I wonder whether this shouldn't go into a JIRA ticket so there's a
> > record of it that might be more visible to users? Would you be
> > willing to summarize this in a JIRA ticket? I'm not sure if anyone
> > will tackle the fix for 8x, but at least a ticket would document
> > things more visibly and it opens the door to someone else tackling the
> > problem down the road.
> >
> > Best,
> >
> > Jason
> >
> > On Thu, Jun 1, 2023 at 10:29 AM Pierre Salagnac
> > <pierre.salag...@gmail.com> wrote:
> > >
> > > I know the autoscaling framework does not exist anymore with Solr 9+,
> > > but I wanted to share here a bug we found in it.
> > > Probably there are still plenty of Solr 8 users relying on this
> > > framework.
> > >
> > >
> > > The triggers use timestamps returned by the JVM call System.nanoTime(),
> > > but according to the Javadoc, this is NOT an absolute timestamp. This is
> > > just a number relative to an arbitrary origin, and this origin will
> > > change each time the JVM is restarted.
> > >
> > > I figured out this impacts at least the following triggers (with
> > > basically the same pattern):
> > > - IndexSizeTrigger
> > > - MetricTrigger
> > > - SearchRateTrigger
> > >
> > > These triggers want to fire an event when a certain condition
> > > (depending on each trigger) is met for a certain period of time. They
> > > maintain a map with [what, timestamp] entries to track a short-term
> > > history, with the option to remove an entry if the condition is not met
> > > anymore, so we don't trigger any event.
> > > Timestamps come from System.nanoTime(). So far so good, as long as we
> > > compare these timestamps to each other in the same JVM. Now, this map
> > > is persisted in Zookeeper in case of an overseer change (written and
> > > read by TriggerBase.saveState() and restoreState()). With an overseer
> > > change, the nanoTime() origin moves to some other arbitrary value.
> > > Consequently, all the persisted timestamps from the previous overseer
> > > cannot be compared with the current JVM "clock".
> > > This ends in triggers never being fired, or being fired without waiting
> > > for the time configured.
> > >
> > > I found no Jira entry for this (but maybe there is one?), and I think
> > > this could be a major contributor to the instability of this framework
> > > for some environments.
> > > Also, I'm unsure whether it is still maintained in an 8x branch.
> > >
> > > A simple fix could be to always use TimeSource.getEpochTimeNs() instead
> > > of getTimeNs() in autoscaling code.
> > > But I'm not sure why we use nanoseconds anyway. Seconds would be
> > > sufficient...
> > >
> > > Thanks,
> > > Pierre
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > For additional commands, e-mail: dev-h...@solr.apache.org
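
For anyone landing on this thread later, here is a minimal sketch of the failure mode described in the original message. It is not Solr code: the class TriggerClockSketch, its methods, and the 60-second wait are all hypothetical, and the per-JVM origin of System.nanoTime() is modeled simply as JVM uptime. It only illustrates why a timestamp stamped by one JVM cannot be compared with the monotonic clock of another, while epoch-based timestamps can.

```java
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: class and method names are hypothetical, not Solr code.
final class TriggerClockSketch {
    // Hypothetical waitFor setting: the condition must hold for 60 seconds.
    static final long WAIT_FOR_NS = TimeUnit.SECONDS.toNanos(60);

    // Fires once the condition has been observed for at least WAIT_FOR_NS.
    static boolean shouldFire(long firstSeenNs, long nowNs) {
        return nowNs - firstSeenNs >= WAIT_FOR_NS;
    }

    // Models a System.nanoTime()-style clock: nanoseconds since an origin private to one JVM.
    // Assumption for this demo: the origin is represented as that JVM's uptime.
    static long monotonicNs(long jvmUptimeSeconds) {
        return TimeUnit.SECONDS.toNanos(jvmUptimeSeconds);
    }

    // Models an epoch-based clock: nanoseconds since 1970, comparable across JVMs.
    static long epochNs(long epochSeconds) {
        return TimeUnit.SECONDS.toNanos(epochSeconds);
    }

    public static void main(String[] args) {
        long t0 = 1_686_000_000L; // arbitrary wall-clock second when the condition was first seen

        // Old overseer (up for 1 hour) stamps the entry, then the state is persisted.
        long persisted = monotonicNs(3_600);

        // New overseer has been up for 115 s when the condition has held for 120 s.
        System.out.println("monotonic, late:  " + shouldFire(persisted, monotonicNs(115)));

        // Reverse case: the old overseer was 5 s old when stamping, the new one has been
        // up for 1 hour, and only 10 s of wall-clock time have passed.
        System.out.println("monotonic, early: " + shouldFire(monotonicNs(5), monotonicNs(3_600)));

        // Epoch-based timestamps give the expected answer in both situations.
        System.out.println("epoch, 10 s:  " + shouldFire(epochNs(t0), epochNs(t0 + 10)));
        System.out.println("epoch, 120 s: " + shouldFire(epochNs(t0), epochNs(t0 + 120)));
    }
}
```

In this sketch the first monotonic case does not fire although the condition has held for twice the wait time, and the second fires after only 10 seconds, which matches the two symptoms reported above. The epoch-based comparisons behave as expected, which is the idea behind switching to TimeSource.getEpochTimeNs() as suggested in the original message.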