Hi Marc,

Thank you for taking a look at this please see my replies below.

> > I think, given that on other platforms sched_clock() is already used
> > early, it is not a good idea to invent a different clock just for time
> > stamps.
>
> Square pegs vs round holes. Mimicking other architectures isn't always
> the right thing to do when faced with a different problem. We put a
> lot of effort in working around timer errata for a good reason, and
> feeding the rest of the system bogus timing information doesn't sound
> great.
>
> > We could limit arm64 approach only for chips where cntvct_el0 is
> > working: i.e. frequency is known, and the clock is stable, meaning
> > cannot go backward. Perhaps we would start early clock a little later,
> > but at least it will be available for the sane chips. The only
> > question, where during boot time this is known.
>
> How do you propose we do that? Defective timers can be a property of
> the implementation, of the integration, or both. In any case, it
> requires firmware support (DT, ACPI). All that is only available quite
> late, and moving it earlier is not easily doable.

OK, but could we at least whitelist something early with expectation
that the future chips won't be bogus?

> > Another approach is to modify sched_clock() in
> > kernel/time/sched_clock.c to never return backward value during boot.
> >
> > 1. Rename  current implementation of sched_clock() to sched_clock_raw()
> > 2. New sched_clock() would look like this:
> >
> > u64 sched_clock(void)
> > {
> >    if (static_branch(early_unstable_clock))
> >       return sched_clock_unstable();
> >    else
> >       return sched_clock_raw();
> > }
> >
> > 3. sched_clock_unstable() would look like this:
> >
> > u64 sched_clock_unstable(void)
> > {
> > again:
> >   static u64 old_clock;
> >   u64 new_clock = sched_clock_raw();
> >   static u64 old_clock_read =   READ_ONCE(old_clock);
> >   /* It is ok if time does not progress, but don't allow to go backward */
> >   if (new_clock < old_clock_read)
> >     return old_clock_read;
> >    /* update the old_clock value */
> >    if (cmpxchg64(&old_clock, old_clock_read, new_clock) != old_clock_read)
> >       goto again;
> >    return new_clock;
> > }
>
> You now have an "unstable" clock that is only allowed to move forward,
> until you switch to the real one. And at handover time, anything can
> happen.
>
> It is one thing to allow for the time stamping to be imprecise. But
> imposing the same behaviour on other parts of the kernel that have so
> far relied on a strictly monotonic sched_clock feels like a bad idea.

sched_clock() will still be strictly monotonic. During switch over we
will guarantee to continue from where the early clock left.

>
> What I'm proposing is that we allow architectures to override the hard
> tie between local_clock/sched_clock and kernel log time stamping, with
> the default being of course what we have today. This gives a clean
> separation between the two when the architecture needs to delay the
> availability of sched_clock until implementation requirements are
> discovered. It also keep sched_clock simple and efficient.
>
> To illustrate what I'm trying to argue for, I've pushed out a couple
> of proof of concept patches here[1]. I've briefly tested them in a
> guest, and things seem to work OK.

What I am worried is that decoupling time stamps from the
sched_clock() will cause uptime and other commands that show boot time
not to correlate with timestamps in dmesg with these changes. For them
to correlate we would still have to have a switch back to
local_clock() in timestamp_clock() after we are done with early boot,
which brings us back to using a temporarily unstable clock that I
proposed above but without adding an architectural hook for it. Again,
we would need to solve the problem of time continuity during switch
over, which is not a hard problem to solve, as we do it already in
sched_clock.c, and everytime clocksource changes.

During early boot time stamps project for x86 we were extra careful to
make sure that they stay the same.

Thank you,
Pasha

Reply via email to