TBH I don't imagine anyone resurrecting the Autoscaling framework; filing a
JIRA issue against removed code seems like a waste of one's time.  But
you/Pierre can file one.  It ought to be closed immediately as "Won't Fix"
(I chose this resolution status recently for several now-defunct issues).

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Jun 7, 2023 at 12:35 PM Jason Gerlowski <gerlowsk...@gmail.com>
wrote:

> Thanks for sharing, Pierre!  You clearly put some thought into the
> writeup.
>
> I wonder whether this shouldn't go into a JIRA ticket so there's a
> record of it that might be more visible to users?  Would you be
> willing to summarize this in a JIRA ticket?  I'm not sure if anyone
> will tackle the fix for 8x, but at least a ticket would document
> things more visibly and it opens the door to someone else tackling the
> problem down the road.
>
> Best,
>
> Jason
>
> On Thu, Jun 1, 2023 at 10:29 AM Pierre Salagnac
> <pierre.salag...@gmail.com> wrote:
> >
> > I know the autoscaling framework does not exist anymore with Solr 9+, but
> > I wanted to share here a bug we found in it.
> > There are probably still plenty of Solr 8 users relying on this
> > framework.
> >
> >
> > The triggers use timestamps returned by the JVM call System.nanoTime(),
> > but according to the Javadoc, this is NOT an absolute timestamp. This is
> > just a number relative to a random origin, and this origin will change
> > each time the JVM is restarted.
> >
> > I figured out this impacts at least the following triggers (with
> > basically the same pattern):
> > - IndexSizeTrigger
> > - MetricTrigger
> > - SearchRateTrigger
> >
> > These triggers want to fire an event when a certain condition (depending
> > on each trigger) is met for a certain period of time. They maintain a map
> > with [what, timestamp] entries to track a short term history, with the
> > option to remove an entry if the condition is not met anymore, so we
> > don't trigger any event.
> > Timestamps come from System.nanoTime(). So far so good as long as we
> > compare these timestamps to each other in the same JVM. Now, this map is
> > persisted in Zookeeper in case of an overseer change (written and read by
> > TriggerBase.saveState() and restoreState()). With an overseer change,
> > the nanoTime() origin is randomly moved to something else. Consequently,
> > all the persisted timestamps from the previous overseer cannot be
> > compared with the current JVM "clock".
> > This ends in triggers never being fired, or being fired without waiting
> > for the time configured.
> >
> > I found no Jira entry for this (but maybe there is one?), and I think
> > this could be a major contributor to the instability of this framework
> > for some environments.
> > Also, I'm unsure whether it is still maintained in an 8x branch.
> >
> > A simple fix could be to always use TimeSource.getEpochTimeNs() instead
> > of getTimeNs() in autoscaling code.
> > But I'm not sure why we use nanoseconds anyway. Seconds would be
> > sufficient...
> >
> > Thanks,
> > Pierre
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> For additional commands, e-mail: dev-h...@solr.apache.org
>
>
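
For anyone reading this thread later, the failure mode Pierre describes can be sketched in a few lines of plain Java. This is only an illustration, not Solr code: SimulatedJvm, waitForElapsed, sec, and all the numbers are hypothetical, and the per-JVM nanoTime() origin is modelled as a fixed offset from wall-clock time. It shows a timestamp persisted by one overseer being compared against a different JVM's nanoTime() after an overseer change, next to the same comparison done with epoch-based timestamps.

```java
import java.util.concurrent.TimeUnit;

public class PersistedTimestampSketch {

    /** Hypothetical stand-in for one JVM: nanoTime() counts from an arbitrary per-JVM origin. */
    public static final class SimulatedJvm {
        private final long originEpochNs;

        public SimulatedJvm(long originEpochNs) {
            this.originEpochNs = originEpochNs;
        }

        /** Mimics System.nanoTime(): only meaningful relative to other values from this JVM. */
        public long nanoTime(long wallClockEpochNs) {
            return wallClockEpochNs - originEpochNs;
        }
    }

    /** The "condition held for at least waitFor" check, reduced to its arithmetic. */
    public static boolean waitForElapsed(long firstSeenNs, long nowNs, long waitForNs) {
        return nowNs - firstSeenNs >= waitForNs;
    }

    public static long sec(long seconds) {
        return TimeUnit.SECONDS.toNanos(seconds);
    }

    public static void main(String[] args) {
        long t0 = sec(1_700_000_000L); // wall clock when the condition was first seen
        long t1 = t0 + sec(60);        // wall clock when the new overseer checks it
        long waitFor = sec(30);        // assumed waitFor setting

        // Two JVMs with unrelated nanoTime() origins (values chosen arbitrarily).
        SimulatedJvm oldOverseer = new SimulatedJvm(t0 - sec(9_000));
        SimulatedJvm newOverseer = new SimulatedJvm(t1 - sec(10));

        // The old overseer persists its nanoTime() value; the new one compares against it.
        long persistedNano = oldOverseer.nanoTime(t0);
        boolean firesWithNanoTime =
            waitForElapsed(persistedNano, newOverseer.nanoTime(t1), waitFor);

        // The same check with epoch-based timestamps survives the JVM change.
        boolean firesWithEpochTime = waitForElapsed(t0, t1, waitFor);

        System.out.println("nanoTime-based check fires: " + firesWithNanoTime);   // false
        System.out.println("epoch-based check fires:    " + firesWithEpochTime);  // true
    }
}
```

With these numbers, 60 seconds have really passed against a 30 second waitFor, yet the nanoTime-based check does not fire. Swapping the two origins gives the opposite symptom, where the check fires without waiting at all.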
