Thanks for your feedback.
I filed this Jira ticket
https://issues.apache.org/jira/browse/SOLR-16843
with mostly the text of my initial email.


On Fri, Jun 9, 2023 at 15:26, David Smiley <dsmi...@apache.org> wrote:

> TBH I don't imagine anyone resurrecting the Autoscaling framework; filing a
> JIRA issue against removed code seems like a waste of one's time.  But
> you/Pierre can file one.  It ought to be closed immediately as "Won't Fix"
> (I chose this resolution status recently for several now-defunct issues).
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Wed, Jun 7, 2023 at 12:35 PM Jason Gerlowski <gerlowsk...@gmail.com>
> wrote:
>
> > Thanks for sharing Pierre!  It's clear you put some thought into the
> > writeup.
> >
> > I wonder whether this shouldn't go into a JIRA ticket so there's a
> > record of it that might be more visible to users?  Would you be
> > willing to summarize this in a JIRA ticket?  I'm not sure if anyone
> > will tackle the fix for 8x, but at least a ticket would document
> > things more visibly and it opens the door to someone else tackling the
> > problem down the road.
> >
> > Best,
> >
> > Jason
> >
> > On Thu, Jun 1, 2023 at 10:29 AM Pierre Salagnac
> > <pierre.salag...@gmail.com> wrote:
> > >
> > > > I know the autoscaling framework no longer exists in Solr 9+, but I
> > > > wanted to share here a bug we found in it.
> > > > There are probably still plenty of Solr 8 users relying on this
> > > > framework.
> > >
> > >
> > > > The triggers use timestamps returned by the JVM call
> > > > System.nanoTime(), but according to the Javadoc, this is NOT an
> > > > absolute timestamp. It is just a number relative to an arbitrary
> > > > origin, and this origin can change each time the JVM is restarted.
> > >
> > > > I figured out this impacts at least the following triggers (with
> > > > basically the same pattern):
> > > > - IndexSizeTrigger
> > > > - MetricTrigger
> > > > - SearchRateTrigger
> > >
> > > > These triggers want to fire an event when a certain condition
> > > > (depending on each trigger) is met for a certain period of time. They
> > > > maintain a map with [what, timestamp] entries to track a short-term
> > > > history, with the option to remove an entry if the condition is not
> > > > met anymore, so we don't trigger any event.
> > > > Timestamps come from System.nanoTime(). So far so good, as long as we
> > > > compare these timestamps to each other in the same JVM. Now, this map
> > > > is persisted in ZooKeeper in case of an overseer change (written and
> > > > read by TriggerBase.saveState() and restoreState()). With an overseer
> > > > change, the nanoTime() origin moves to some other arbitrary value.
> > > > Consequently, all the persisted timestamps from the previous overseer
> > > > cannot be compared with the current JVM "clock".
> > > > This results in triggers never being fired, or being fired without
> > > > waiting for the configured time.
> > >
> > > > I found no Jira entry for this (but maybe there is one?), and I think
> > > > this could be a major contributor to the instability of this framework
> > > > in some environments.
> > > > Also, I'm unsure whether it is still maintained in an 8x branch.
> > >
> > > > A simple fix could be to always use TimeSource.getEpochTimeNs()
> > > > instead of getTimeNs() in autoscaling code.
> > > > But I'm not sure why we use nanoseconds anyway. Seconds would be
> > > > sufficient...
> > >
> > > Thanks,
> > > Pierre
> >
>
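
For readers of the archive, the nanoTime() problem described in the
original message can be sketched in plain Java. This is only an
illustration under stated assumptions, not the actual Solr trigger code:
the class NanoTimeOriginDemo, the helpers heldLongEnough() and
epochTimeNs(), and all timestamp values are made up for the example. The
epoch-based helper stands in for the idea behind
TimeSource.getEpochTimeNs(); it does not call Solr.

```java
import java.util.concurrent.TimeUnit;

class NanoTimeOriginDemo {

    // Hypothetical stand-in for a trigger's "has the condition held for
    // at least waitFor?" check. Not the real Solr implementation.
    static boolean heldLongEnough(long firstSeenNs, long nowNs, long waitForNs) {
        return nowNs - firstSeenNs >= waitForNs;
    }

    // Epoch-based nanoseconds (millisecond resolution). Unlike
    // System.nanoTime(), these values are comparable across JVMs.
    static long epochTimeNs() {
        return TimeUnit.MILLISECONDS.toNanos(System.currentTimeMillis());
    }

    public static void main(String[] args) {
        long waitForNs = TimeUnit.MINUTES.toNanos(10);

        // Made-up nanoTime() readings from two JVMs with unrelated origins.
        long savedByOldOverseer = 9_000_000_000_000L; // persisted in ZooKeeper
        long nowOnNewOverseer = 2_000_000_000_000L;   // read after the overseer change

        // The "elapsed" time is negative, so the trigger would never fire.
        System.out.println(
            heldLongEnough(savedByOldOverseer, nowOnNewOverseer, waitForNs)); // false

        // With the origins the other way around, it fires without waiting.
        System.out.println(
            heldLongEnough(nowOnNewOverseer, savedByOldOverseer, waitForNs)); // true

        // Epoch-based timestamps stay meaningful after being persisted.
        long savedEpochNs = epochTimeNs();
        long elevenMinutesLater = savedEpochNs + TimeUnit.MINUTES.toNanos(11);
        System.out.println(
            heldLongEnough(savedEpochNs, elevenMinutesLater, waitForNs)); // true
    }
}
```

The first two checks show both failure modes from the report (never
firing, or firing too early); the last one shows why an epoch-based clock
avoids them, at the cost of being sensitive to wall-clock adjustments.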
