* Daniel Walker ([EMAIL PROTECTED]) wrote: > On Thu, 2007-07-19 at 11:54 +0200, Andi Kleen wrote: > > Move it into an own file for easy sharing. > > Do everything per CPU. This avoids problems with TSCs that > > tick at different frequencies per CPU. > > Resync properly on cpufreq changes. CPU frequency is instable > > around cpu frequency changing, so fall back during a backing > > clock during this period. > > Hopefully TSC will work now on all systems except when there isn't a > > physical TSC. > > > > And > > > > +From: Jeremy Fitzhardinge <[EMAIL PROTECTED]> > > Three cleanups there: > > - change "instable" -> "unstable" > > - it's better to use get_cpu_var for getting this cpu's variables > > - change cycles_2_ns to do the full computation rather than just the > > tsc->ns scaling. It's a simpler interface, and it makes the function .... > > +/* > > + * Scheduler clock - returns current time in nanosec units. > > + * All data is local to the CPU. > > + * The values are approximately[1] monotonic local to a CPU, but not > > + * between CPUs. There might be also an occasionally random error, > > + * but not too bad. Between CPUs the values can be non monotonic. > > + * > > + * [1] no attempt to stop CPU instruction reordering, which can hit > > + * in a 100 instruction window or so. > > + * > > + * The clock can be in two states: stable and unstable. > > + * When it is stable we use the TSC per CPU. > > + * When it is unstable we use jiffies as fallback. > > + * stable->unstable->stable transitions can happen regularly > > + * during CPU frequency changes. > > + * There is special code to avoid having the clock jump backwards > > + * when we switch from TSC to jiffies, which needs to keep some state > > + * per CPU. This state is protected against parallel state changes > > + * with interrupts off. > The comment still says something about interrupts off, but that was > removed it looks like. >
I noticed the same thing about interrupts off when going through the code. Andi, since you are already playing with per cpu variables, you could leverage asm/local.h there by declaring last_val as local_t and use either local_cmpxchg or local_add_return (depending on your needs) to get both better performances than cli/sti _and_ be really atomic. See this thread for performance tests: http://www.ussg.iu.edu/hypermail/linux/kernel/0707.1/0832.html Mathieu > > + */ > > +unsigned long long tsc_sched_clock(void) > > +{ > > + unsigned long long r; > > + struct sc_data *sc = &get_cpu_var(sc_data); > > + > > + if (unlikely(sc->unstable)) { > > + r = (jiffies_64 - sc->sync_base) * (1000000000 / HZ); > > + r += sc->ns_base; > > Looking further down you aren't using this unstable path when the tsc is > just outright unstable (i.e. some Cyrix systems IIRC)? An improvement > over the original code would be to catch the systems that change > frequencies without cpufreq (like the ones that gave Thomas so much > trouble). > > > + /* > > + * last_val is used to avoid non monotonity on a > > + * stable->unstable transition. Make sure the time > > + * never goes to before the last value returned by the > > + * TSC clock. > > + */ > > + while (r <= sc->last_val) { > > + rmb(); > > + r = sc->last_val + 1; > > + rmb(); > > + } > > + sc->last_val = r; > > + } else { > > + rdtscll(r); > > + r = __cycles_2_ns(sc, r); > > + sc->last_val = r; > > + } > > + > > + put_cpu_var(sc_data); > > + > > + return r; > > +} > > + > > +/* We need to define a real function for sched_clock, to override the > > + weak default version */ > > +#ifdef CONFIG_PARAVIRT > > +unsigned long long sched_clock(void) > > +{ > > + return paravirt_sched_clock(); > > +} > > +#else > > +unsigned long long sched_clock(void) > > + __attribute__((alias("tsc_sched_clock"))); > > +#endif > > + > > +static int no_sc_for_printk; > > + > > +/* > > + * printk clock: when it is known the sc results are very non monotonic > > + * fall back to jiffies for printk. Other sched_clock users are supposed > > + * to handle this. > > + */ > > +unsigned long long printk_clock(void) > > +{ > > + if (unlikely(no_sc_for_printk)) > > + return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ); > > + return tsc_sched_clock(); > > +} > > + > > +static void resolve_freq(struct cpufreq_freqs *freq) > > +{ > > + if (!freq->new) { > > + freq->new = cpufreq_get(freq->cpu); > > + if (!freq->new) > > + freq->new = tsc_khz; > > + } > > +} > > + > > +/* Resync with new CPU frequency. Must run on to be synced CPU */ > > +static void resync_freq(void *arg) > > +{ > > + struct cpufreq_freqs *freq = (void *)arg; > > + struct sc_data *sc = &__get_cpu_var(sc_data); > > + > > + sc->sync_base = jiffies; > > + if (!cpu_has_tsc) { > > + sc->unstable = 1; > > + return; > > + } > > + resolve_freq(freq); > > + > > + /* > > + * Handle nesting, but when we're zero multiple calls in a row > > + * are ok too and not a bug. This can happen during startup > > + * when the different callbacks race with each other. > > + */ > > + if (sc->unstable > 0) > > + sc->unstable--; > > + if (sc->unstable) > > + return; > > + > > + /* Minor race window here, but should not add significant errors. */ > > + sc->ns_base = ktime_to_ns(ktime_get()); > > + rdtscll(sc->sync_base); > > + sc->cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR) / freq->new; > > +} > > + > > +static void resync_freq_on_cpu(void *arg) > > +{ > > + struct cpufreq_freqs f = { .new = 0 }; > > + > > + f.cpu = get_cpu(); > > + resync_freq(&f); > > + put_cpu(); > > +} > > + > > +static int sc_freq_event(struct notifier_block *nb, unsigned long event, > > + void *data) > > +{ > > + struct cpufreq_freqs *freq = data; > > + struct sc_data *sc = &per_cpu(sc_data, freq->cpu); > > + > > + if (cpu_has(&cpu_data[freq->cpu], X86_FEATURE_CONSTANT_TSC)) > > + return NOTIFY_DONE; > > + if (freq->old == freq->new) > > + return NOTIFY_DONE; > > + > > + switch (event) { > > + case CPUFREQ_SUSPENDCHANGE: > > + /* Mark TSC unstable during suspend/resume */ > > + case CPUFREQ_PRECHANGE: > > + /* > > + * Mark TSC as unstable until cpu frequency change is > > + * done because we don't know when exactly it will > > + * change. unstable in used as a counter to guard > > + * against races between the cpu frequency notifiers > > + * and normal resyncs > > + */ > > + sc->unstable++; > > + /* FALL THROUGH */ > > + case CPUFREQ_RESUMECHANGE: > > + case CPUFREQ_POSTCHANGE: > > + /* > > + * Frequency change or resume is done -- update everything and > > + * mark TSC as stable again. > > + */ > > + on_cpu_single(freq->cpu, resync_freq, freq); > > + break; > > + } > > + return NOTIFY_DONE; > > +} > > + > > +static struct notifier_block sc_freq_notifier = { > > + .notifier_call = sc_freq_event > > +}; > > + > > +static int __cpuinit > > +sc_cpu_event(struct notifier_block *self, unsigned long event, void *hcpu) > > +{ > > + long cpu = (long)hcpu; > > + if (event == CPU_ONLINE) { > > + struct cpufreq_freqs f = { .cpu = cpu, .new = 0 }; > > + > > + on_cpu_single(cpu, resync_freq, &f); > > + } > > + return NOTIFY_DONE; > > +} > > + > > +static __init int init_sched_clock(void) > > +{ > > + if (unsynchronized_tsc()) > > + no_sc_for_printk = 1; > > + > > + /* > > + * On a race between the various events the initialization > > + * might be done multiple times, but code is tolerant to > > + * this . > > + */ > > + cpufreq_register_notifier(&sc_freq_notifier, > > + CPUFREQ_TRANSITION_NOTIFIER); > > + hotcpu_notifier(sc_cpu_event, 0); > > + on_each_cpu(resync_freq_on_cpu, NULL, 0, 0); > > + return 0; > > +} > > +core_initcall(init_sched_clock); > > Index: linux/arch/i386/kernel/tsc.c > > =================================================================== > > --- linux.orig/arch/i386/kernel/tsc.c > > +++ linux/arch/i386/kernel/tsc.c > > @@ -63,74 +63,6 @@ static inline int check_tsc_unstable(voi > > return tsc_unstable; > > } > > > > -/* Accellerators for sched_clock() > > - * convert from cycles(64bits) => nanoseconds (64bits) > > - * basic equation: > > - * ns = cycles / (freq / ns_per_sec) > > - * ns = cycles * (ns_per_sec / freq) > > - * ns = cycles * (10^9 / (cpu_khz * 10^3)) > > - * ns = cycles * (10^6 / cpu_khz) > > - * > > - * Then we use scaling math (suggested by [EMAIL PROTECTED]) to get: > > - * ns = cycles * (10^6 * SC / cpu_khz) / SC > > - * ns = cycles * cyc2ns_scale / SC > > - * > > - * And since SC is a constant power of two, we can convert the div > > - * into a shift. > > - * > > - * We can use khz divisor instead of mhz to keep a better percision, since > > - * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits. > > - * ([EMAIL PROTECTED]) > > - * > > - * [EMAIL PROTECTED] "math is hard, lets go shopping!" > > - */ > > -unsigned long cyc2ns_scale __read_mostly; > > - > > -#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */ > > - > > -static inline void set_cyc2ns_scale(unsigned long cpu_khz) > > -{ > > - cyc2ns_scale = (1000000 << CYC2NS_SCALE_FACTOR)/cpu_khz; > > -} > > - > > -/* > > - * Scheduler clock - returns current time in nanosec units. > > - */ > > -unsigned long long native_sched_clock(void) > > -{ > > - unsigned long long this_offset; > > - > > - /* > > - * Fall back to jiffies if there's no TSC available: > > - * ( But note that we still use it if the TSC is marked > > - * unstable. We do this because unlike Time Of Day, > > - * the scheduler clock tolerates small errors and it's > > - * very important for it to be as fast as the platform > > - * can achive it. ) > > - */ > > - if (unlikely(!tsc_enabled && !tsc_unstable)) > > - /* No locking but a rare wrong value is not a big deal: */ > > - return (jiffies_64 - INITIAL_JIFFIES) * (1000000000 / HZ); > > - > > - /* read the Time Stamp Counter: */ > > - rdtscll(this_offset); > > - > > - /* return the value in ns */ > > - return cycles_2_ns(this_offset); > > -} > > - > > -/* We need to define a real function for sched_clock, to override the > > - weak default version */ > > -#ifdef CONFIG_PARAVIRT > > -unsigned long long sched_clock(void) > > -{ > > - return paravirt_sched_clock(); > > -} > > -#else > > -unsigned long long sched_clock(void) > > - __attribute__((alias("native_sched_clock"))); > > -#endif > > - > > unsigned long native_calculate_cpu_khz(void) > > { > > unsigned long long start, end; > > @@ -238,11 +170,6 @@ time_cpufreq_notifier(struct notifier_bl > > ref_freq, freq->new); > > if (!(freq->flags & CPUFREQ_CONST_LOOPS)) { > > tsc_khz = cpu_khz; > > - set_cyc2ns_scale(cpu_khz); > > - /* > > - * TSC based sched_clock turns > > - * to junk w/ cpufreq > > - */ > > mark_tsc_unstable("cpufreq changes"); > > } > > } > > @@ -380,7 +307,6 @@ void __init tsc_init(void) > > (unsigned long)cpu_khz / 1000, > > (unsigned long)cpu_khz % 1000); > > > > - set_cyc2ns_scale(cpu_khz); > > use_tsc_delay(); > > > > /* Check and install the TSC clocksource */ > > Index: linux/arch/i386/kernel/Makefile > > =================================================================== > > --- linux.orig/arch/i386/kernel/Makefile > > +++ linux/arch/i386/kernel/Makefile > > @@ -7,7 +7,8 @@ extra-y := head.o init_task.o vmlinux.ld > > obj-y := process.o signal.o entry.o traps.o irq.o \ > > ptrace.o time.o ioport.o ldt.o setup.o i8259.o sys_i386.o \ > > pci-dma.o i386_ksyms.o i387.o bootflag.o e820.o\ > > - quirks.o i8237.o topology.o alternative.o i8253.o tsc.o > > + quirks.o i8237.o topology.o alternative.o i8253.o tsc.o \ > > + sched-clock.o > > > > obj-$(CONFIG_STACKTRACE) += stacktrace.o > > obj-y += cpu/ > > Index: linux/include/asm-i386/timer.h > > =================================================================== > > --- linux.orig/include/asm-i386/timer.h > > +++ linux/include/asm-i386/timer.h > > @@ -6,7 +6,6 @@ > > #define TICK_SIZE (tick_nsec / 1000) > > > > void setup_pit_timer(void); > > -unsigned long long native_sched_clock(void); > > unsigned long native_calculate_cpu_khz(void); > > > > extern int timer_ack; > > @@ -18,35 +17,6 @@ extern int recalibrate_cpu_khz(void); > > #define calculate_cpu_khz() native_calculate_cpu_khz() > > #endif > > > > -/* Accellerators for sched_clock() > > - * convert from cycles(64bits) => nanoseconds (64bits) > > - * basic equation: > > - * ns = cycles / (freq / ns_per_sec) > > - * ns = cycles * (ns_per_sec / freq) > > - * ns = cycles * (10^9 / (cpu_khz * 10^3)) > > - * ns = cycles * (10^6 / cpu_khz) > > - * > > - * Then we use scaling math (suggested by [EMAIL PROTECTED]) to get: > > - * ns = cycles * (10^6 * SC / cpu_khz) / SC > > - * ns = cycles * cyc2ns_scale / SC > > - * > > - * And since SC is a constant power of two, we can convert the div > > - * into a shift. > > - * > > - * We can use khz divisor instead of mhz to keep a better percision, since > > - * cyc2ns_scale is limited to 10^6 * 2^10, which fits in 32 bits. > > - * ([EMAIL PROTECTED]) > > - * > > - * [EMAIL PROTECTED] "math is hard, lets go shopping!" > > - */ > > -extern unsigned long cyc2ns_scale __read_mostly; > > - > > -#define CYC2NS_SCALE_FACTOR 10 /* 2^10, carefully chosen */ > > - > > -static inline unsigned long long cycles_2_ns(unsigned long long cyc) > > -{ > > - return (cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR; > > -} > > - > > +u64 cycles_2_ns(u64 cyc); > > > > #endif > > Index: linux/include/asm-i386/tsc.h > > =================================================================== > > --- linux.orig/include/asm-i386/tsc.h > > +++ linux/include/asm-i386/tsc.h > > @@ -63,6 +63,7 @@ extern void tsc_init(void); > > extern void mark_tsc_unstable(char *reason); > > extern int unsynchronized_tsc(void); > > extern void init_tsc_clocksource(void); > > +extern unsigned long long tsc_sched_clock(void); > > > > /* > > * Boot-time check whether the TSCs are synchronized across > > - > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to [EMAIL PROTECTED] > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- Mathieu Desnoyers Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/