Re: [v2 0/9] Early boot time stamps for x86

2017-03-25 Thread Pasha Tatashin

Hi Thomas,

Thank you very much for a very insightful feedback. I will address your 
comments, and if I have any questions, I will ask them before sending 
out the next patchset.


A few replies below:


First of all, this "solution" is only valid for a very restricted set of
systems and breaks others in very subtle ways.

FYI, the architecture name is x86, which handles 32bit/64bit, various
vendors and hypervisors. It's not 'arch/mymachine'.


The reason why I only targeted newer processors and 64-bit platforms was 
because these are the machines that keep growing in size: more memory, 
cpu count etc. Which, means these are the machine where we can get 
scalability problem in the future.




So this needs to be integrated into the existing TSC/CPU calibration
mechanism which works across all supported systems. And that means, that
early TSC calibration can't be done before x86_init.oem.arch_setup() and
hypervisor_init_platform() have been invoked.

By that time, which is still pretty early in the boot process:

  - boot_cpu_data has been preinitialized

  - calibration functions setup has been done


This sounds good. I will sanity check that we do not spend excessive 
amount of time before x86_init.oem.arch_setup() is done on some larger 
machines, and will start early clock sometime after this call is finished.



Putting the early TSC initialization after hypervisor_init_platform() makes
this available for _ALL_ systems/platforms as infrastructure and allows the
platforms to add the required bits of support via a new set of function
pointers in struct x86_platform_ops, which btw. is a proper form of
abstraction.


Ok


The time between x86_64_start_kernel() and that point is in the low single
digit millisecond range or less than a millisecond and therefor completely
irrelevant. The first timestamps before that point will be 0 as they used
to be.

You can argue in circles about that, it's simply not debatable.


Sure, I tried perfection by having timestamps available with the first 
log message, but your proposed practicality makes sense.



But that's only the sched clock part of the problem. If this early
sched_clock() is available, then this needs to be fed into the setup of
timekeeping as well, otherwise dmesg tells 50 seconds into boot and clock
monotonic/boottime say 5, which would be confusing at best. That has not be
solved in the first step, but definitely before something like this is
going to be merged.


Thank you for catching this, I will make sure boottime/dmesg times match.

Thank you,
Pasha


Re: [v2 0/9] Early boot time stamps for x86

2017-03-25 Thread Pasha Tatashin

Hi Thomas,

Thank you very much for a very insightful feedback. I will address your 
comments, and if I have any questions, I will ask them before sending 
out the next patchset.


A few replies below:


First of all, this "solution" is only valid for a very restricted set of
systems and breaks others in very subtle ways.

FYI, the architecture name is x86, which handles 32bit/64bit, various
vendors and hypervisors. It's not 'arch/mymachine'.


The reason why I only targeted newer processors and 64-bit platforms was 
because these are the machines that keep growing in size: more memory, 
cpu count etc. Which, means these are the machine where we can get 
scalability problem in the future.




So this needs to be integrated into the existing TSC/CPU calibration
mechanism which works across all supported systems. And that means, that
early TSC calibration can't be done before x86_init.oem.arch_setup() and
hypervisor_init_platform() have been invoked.

By that time, which is still pretty early in the boot process:

  - boot_cpu_data has been preinitialized

  - calibration functions setup has been done


This sounds good. I will sanity check that we do not spend excessive 
amount of time before x86_init.oem.arch_setup() is done on some larger 
machines, and will start early clock sometime after this call is finished.



Putting the early TSC initialization after hypervisor_init_platform() makes
this available for _ALL_ systems/platforms as infrastructure and allows the
platforms to add the required bits of support via a new set of function
pointers in struct x86_platform_ops, which btw. is a proper form of
abstraction.


Ok


The time between x86_64_start_kernel() and that point is in the low single
digit millisecond range or less than a millisecond and therefor completely
irrelevant. The first timestamps before that point will be 0 as they used
to be.

You can argue in circles about that, it's simply not debatable.


Sure, I tried perfection by having timestamps available with the first 
log message, but your proposed practicality makes sense.



But that's only the sched clock part of the problem. If this early
sched_clock() is available, then this needs to be fed into the setup of
timekeeping as well, otherwise dmesg tells 50 seconds into boot and clock
monotonic/boottime say 5, which would be confusing at best. That has not be
solved in the first step, but definitely before something like this is
going to be merged.


Thank you for catching this, I will make sure boottime/dmesg times match.

Thank you,
Pasha


Re: [v2 0/9] Early boot time stamps for x86

2017-03-25 Thread Thomas Gleixner
On Sat, 25 Mar 2017, Pasha Tatashin wrote:
> The second versions was actually meant as a reply to your e-mail: the code
> differences were minimal: the main differences were in the cover letter. You
> mentioned that it is not necessary to have early boot time stamps, and  I
> wanted to show examples how this data is useful to track scalability bugs and
> avoid future regressions.

I did not say that early timestamps are not useful. I said they fall not
into the category of things which are absolutely necessary in the earliest
boot process. Can you spot the difference?

If you want to show me something to prove a point, then why don't you reply
to my mail as anybody else does who wants a to start a conversation or
discussion?

Ignoring my remarks and then sending another hastily cobbled together patch
set is not in any way a form of conversation/discussion or follow up to my
mail.

> Anyway, you asked for some time to think about this problem,

I certainly did not ask for some time. ...

I told you that I need to find a free time slot to look into that in
detail. I told you as well, that some of your decisions (earliest boot
code, code duplication etc.) are not acceptable.  The fact that I told you
that I need to find a free time slot also means that I'm going to tell you
in more detail why and what needs to be done.

> ... I won't send any replies to this thread for the next two weeks.

Not replying to mail for a fixed amount of time is just silly. What are you
going to do after that? Resend another half baken patch series which
ignores every review comment?

The review/feedback mechanism does not work that way. If you get feedback,
then you either accept it and act accordingly or you discuss it as a
response to that feedback. When that discussion is settled, then it's time
to rework stuff and send out a new version after making sure that
everything is addressed.

> So, please consider this solution. The feature is well abstracted, does
> not harm the performance of the fast path, and if necessary it can also
> be made optional with something like: CONFIG_HAVE_UNSTABLE_EARLY_CLOCK

Well abstracted? I have no idea what your understanding of abstractions is,
but it's definitely different from mine. This is not abstracted, it's glued
into the code in the 'works for me' mode. It's a proof of concept if at
all. And no, this won't have a CONFIG switch to clutter the code even more.

First of all, this "solution" is only valid for a very restricted set of
systems and breaks others in very subtle ways.

FYI, the architecture name is x86, which handles 32bit/64bit, various
vendors and hypervisors. It's not 'arch/mymachine'.

So this needs to be integrated into the existing TSC/CPU calibration
mechanism which works across all supported systems. And that means, that
early TSC calibration can't be done before x86_init.oem.arch_setup() and
hypervisor_init_platform() have been invoked.

By that time, which is still pretty early in the boot process:

  - boot_cpu_data has been preinitialized

  - calibration functions setup has been done

So this won't ignore decisions made by the various platform quirks and does
not require pointless copies of existing code and absurd hacks to make it
work. Also at this point the most fragile parts of the boot process have
been handled.

Putting the early TSC initialization after hypervisor_init_platform() makes
this available for _ALL_ systems/platforms as infrastructure and allows the
platforms to add the required bits of support via a new set of function
pointers in struct x86_platform_ops, which btw. is a proper form of
abstraction. For some of them it's just switching the calibration function
to the early one, for others this won't be possible ever.

The time between x86_64_start_kernel() and that point is in the low single
digit millisecond range or less than a millisecond and therefor completely
irrelevant. The first timestamps before that point will be 0 as they used
to be.

You can argue in circles about that, it's simply not debatable.

Doing that allows to simply split the existing TSC init into an early and
late part and that late part is going to switch over from early sched clock
to the regular one w/o any _fini() calls or other absurdities.

The same applies to the sched core clock code. This can be integrated w/o
all that extra early/fini() hackery cleanly. All it takes is a weak
sched_clock_early() function which returns 0 to start with. You probably
can figure out how to overload it.

But that's only the sched clock part of the problem. If this early
sched_clock() is available, then this needs to be fed into the setup of
timekeeping as well, otherwise dmesg tells 50 seconds into boot and clock
monotonic/boottime say 5, which would be confusing at best. That has not be
solved in the first step, but definitely before something like this is
going to be merged.

Thanks,

tglx


Re: [v2 0/9] Early boot time stamps for x86

2017-03-25 Thread Thomas Gleixner
On Sat, 25 Mar 2017, Pasha Tatashin wrote:
> The second versions was actually meant as a reply to your e-mail: the code
> differences were minimal: the main differences were in the cover letter. You
> mentioned that it is not necessary to have early boot time stamps, and  I
> wanted to show examples how this data is useful to track scalability bugs and
> avoid future regressions.

I did not say that early timestamps are not useful. I said they fall not
into the category of things which are absolutely necessary in the earliest
boot process. Can you spot the difference?

If you want to show me something to prove a point, then why don't you reply
to my mail as anybody else does who wants a to start a conversation or
discussion?

Ignoring my remarks and then sending another hastily cobbled together patch
set is not in any way a form of conversation/discussion or follow up to my
mail.

> Anyway, you asked for some time to think about this problem,

I certainly did not ask for some time. ...

I told you that I need to find a free time slot to look into that in
detail. I told you as well, that some of your decisions (earliest boot
code, code duplication etc.) are not acceptable.  The fact that I told you
that I need to find a free time slot also means that I'm going to tell you
in more detail why and what needs to be done.

> ... I won't send any replies to this thread for the next two weeks.

Not replying to mail for a fixed amount of time is just silly. What are you
going to do after that? Resend another half baken patch series which
ignores every review comment?

The review/feedback mechanism does not work that way. If you get feedback,
then you either accept it and act accordingly or you discuss it as a
response to that feedback. When that discussion is settled, then it's time
to rework stuff and send out a new version after making sure that
everything is addressed.

> So, please consider this solution. The feature is well abstracted, does
> not harm the performance of the fast path, and if necessary it can also
> be made optional with something like: CONFIG_HAVE_UNSTABLE_EARLY_CLOCK

Well abstracted? I have no idea what your understanding of abstractions is,
but it's definitely different from mine. This is not abstracted, it's glued
into the code in the 'works for me' mode. It's a proof of concept if at
all. And no, this won't have a CONFIG switch to clutter the code even more.

First of all, this "solution" is only valid for a very restricted set of
systems and breaks others in very subtle ways.

FYI, the architecture name is x86, which handles 32bit/64bit, various
vendors and hypervisors. It's not 'arch/mymachine'.

So this needs to be integrated into the existing TSC/CPU calibration
mechanism which works across all supported systems. And that means, that
early TSC calibration can't be done before x86_init.oem.arch_setup() and
hypervisor_init_platform() have been invoked.

By that time, which is still pretty early in the boot process:

  - boot_cpu_data has been preinitialized

  - calibration functions setup has been done

So this won't ignore decisions made by the various platform quirks and does
not require pointless copies of existing code and absurd hacks to make it
work. Also at this point the most fragile parts of the boot process have
been handled.

Putting the early TSC initialization after hypervisor_init_platform() makes
this available for _ALL_ systems/platforms as infrastructure and allows the
platforms to add the required bits of support via a new set of function
pointers in struct x86_platform_ops, which btw. is a proper form of
abstraction. For some of them it's just switching the calibration function
to the early one, for others this won't be possible ever.

The time between x86_64_start_kernel() and that point is in the low single
digit millisecond range or less than a millisecond and therefor completely
irrelevant. The first timestamps before that point will be 0 as they used
to be.

You can argue in circles about that, it's simply not debatable.

Doing that allows to simply split the existing TSC init into an early and
late part and that late part is going to switch over from early sched clock
to the regular one w/o any _fini() calls or other absurdities.

The same applies to the sched core clock code. This can be integrated w/o
all that extra early/fini() hackery cleanly. All it takes is a weak
sched_clock_early() function which returns 0 to start with. You probably
can figure out how to overload it.

But that's only the sched clock part of the problem. If this early
sched_clock() is available, then this needs to be fed into the setup of
timekeeping as well, otherwise dmesg tells 50 seconds into boot and clock
monotonic/boottime say 5, which would be confusing at best. That has not be
solved in the first step, but definitely before something like this is
going to be merged.

Thanks,

tglx


Re: [v2 0/9] Early boot time stamps for x86

2017-03-25 Thread Pasha Tatashin

Hi Thomas,

The second versions was actually meant as a reply to your e-mail: the 
code differences were minimal: the main differences were in the cover 
letter. You mentioned that it is not necessary to have early boot time 
stamps, and  I wanted to show examples how this data is useful to track 
scalability bugs and avoid future regressions.


Anyway, you asked for some time to think about this problem, I won't 
send any replies to this thread for the next two weeks. So, please 
consider this solution. The feature is well abstracted, does not harm 
the performance of the fast path, and if necessary it can also be made 
optional with something like:

CONFIG_HAVE_UNSTABLE_EARLY_CLOCK

Thank you,
Pasha

On 03/25/2017 06:25 AM, Thomas Gleixner wrote:

On Fri, 24 Mar 2017, Pavel Tatashin wrote:


changelog
-
v1 - v2
In patch "x86/tsc: tsc early":
- added tsc_adjusted_early()
- fixed 32-bit compile error use do_div()


Did you actually read my last reply on V1 of this?

I made it entirely clear that the way this is done, i.e. hacking it into
the earliest boost stage is not going to happen.

Further I asked you to hold off until I found some time to look into this
in detail.

So what's the point of ignoring what I said and resending the whole lot
with some more hackery applied?

I don't care about you wasting your time, but I very much care about my
time.

Thanks,

tglx









Re: [v2 0/9] Early boot time stamps for x86

2017-03-25 Thread Pasha Tatashin

Hi Thomas,

The second versions was actually meant as a reply to your e-mail: the 
code differences were minimal: the main differences were in the cover 
letter. You mentioned that it is not necessary to have early boot time 
stamps, and  I wanted to show examples how this data is useful to track 
scalability bugs and avoid future regressions.


Anyway, you asked for some time to think about this problem, I won't 
send any replies to this thread for the next two weeks. So, please 
consider this solution. The feature is well abstracted, does not harm 
the performance of the fast path, and if necessary it can also be made 
optional with something like:

CONFIG_HAVE_UNSTABLE_EARLY_CLOCK

Thank you,
Pasha

On 03/25/2017 06:25 AM, Thomas Gleixner wrote:

On Fri, 24 Mar 2017, Pavel Tatashin wrote:


changelog
-
v1 - v2
In patch "x86/tsc: tsc early":
- added tsc_adjusted_early()
- fixed 32-bit compile error use do_div()


Did you actually read my last reply on V1 of this?

I made it entirely clear that the way this is done, i.e. hacking it into
the earliest boost stage is not going to happen.

Further I asked you to hold off until I found some time to look into this
in detail.

So what's the point of ignoring what I said and resending the whole lot
with some more hackery applied?

I don't care about you wasting your time, but I very much care about my
time.

Thanks,

tglx









Re: [v2 0/9] Early boot time stamps for x86

2017-03-25 Thread Thomas Gleixner
On Fri, 24 Mar 2017, Pavel Tatashin wrote:

> changelog
> -
> v1 - v2
>   In patch "x86/tsc: tsc early":
>   - added tsc_adjusted_early()
>   - fixed 32-bit compile error use do_div()

Did you actually read my last reply on V1 of this?

I made it entirely clear that the way this is done, i.e. hacking it into
the earliest boost stage is not going to happen.

Further I asked you to hold off until I found some time to look into this
in detail.

So what's the point of ignoring what I said and resending the whole lot
with some more hackery applied?

I don't care about you wasting your time, but I very much care about my
time.

Thanks,

tglx









Re: [v2 0/9] Early boot time stamps for x86

2017-03-25 Thread Thomas Gleixner
On Fri, 24 Mar 2017, Pavel Tatashin wrote:

> changelog
> -
> v1 - v2
>   In patch "x86/tsc: tsc early":
>   - added tsc_adjusted_early()
>   - fixed 32-bit compile error use do_div()

Did you actually read my last reply on V1 of this?

I made it entirely clear that the way this is done, i.e. hacking it into
the earliest boost stage is not going to happen.

Further I asked you to hold off until I found some time to look into this
in detail.

So what's the point of ignoring what I said and resending the whole lot
with some more hackery applied?

I don't care about you wasting your time, but I very much care about my
time.

Thanks,

tglx









[v2 0/9] Early boot time stamps for x86

2017-03-24 Thread Pavel Tatashin
changelog
-
v1 - v2
In patch "x86/tsc: tsc early":
- added tsc_adjusted_early()
- fixed 32-bit compile error use do_div()

Adding early boot time stamps support for x86 machines.
SPARC patch see here:
http://www.spinics.net/lists/sparclinux/msg17372.html

sample output
-
Before:
https://hastebin.com/zofiqazuze.scala

After:
https://hastebin.com/otayoliruc.scala

example why early time is crucial
-
Here is boot log on x86 machine with 8 E7-8895 CPUs, and 1T of memory:
https://hastebin.com/ebidosafic.pas

We want to improve boot time, by deferring vmemmap initialization to when
the secondary CPUs are online:
https://hastebin.com/migajadiju.pas

We see in the log that struct pages are initialized in parallel:
[   19.446609] node 0 initialised, 31980145 pages in 530ms
...

But, we have no idea if we saved overall boot time or not by enabling this
feature, because the total boot time as shown currently has not changed
much, and is actually increased from: 18.99s to 19.52s

Now, with this feature we would see right away that the time is actually
decreased:
Before: https://hastebin.com/ivobehahud.pas
After:  https://hastebin.com/uvifasohon.pas

Boot time reduced from 40.68s to 31.41s, we saved 30% by enabling deferred
page initialization feature, great.

However, now, we have early boot time stamps, and we can analyze even
further, and see that we still spend a lot of time in this interval:
[0.009071] kexec_core: crashkernel: memory value expected
[   11.255710] Zone ranges:

And, this is where this fix comes handy:
https://lkml.org/lkml/2017/3/23/864

The final boot time on this machine is this:
https://hastebin.com/anodiqaguj.pas

So, 20.7 seconds vs. 40.68s before, and we can show it with the early boot
time stamps, and avoid future regressions by having this data always
available to us.

Pavel Tatashin (9):
  sched/clock: broken stable to unstable transfer
  sched/clock: interface to allow timestamps early in boot
  x86/cpu: determining x86 vendor early
  x86/tsc: early MSR-based CPU/TSC frequency discovery
  x86/tsc: disable early messages from quick_pit_calibrate
  x86/tsc: use cpuid to determine TSC frequency
  x86/tsc: use cpuid to determine CPU frequency
  x86/tsc: tsc early
  x86/tsc: use tsc early

 arch/x86/include/asm/processor.h |   1 +
 arch/x86/include/asm/tsc.h   |   5 ++
 arch/x86/kernel/cpu/common.c |  36 
 arch/x86/kernel/head64.c |   1 +
 arch/x86/kernel/time.c   |   1 +
 arch/x86/kernel/tsc.c| 186 ++-
 arch/x86/kernel/tsc_msr.c|  38 +---
 include/linux/sched/clock.h  |   4 +
 kernel/sched/clock.c |  70 ++-
 9 files changed, 304 insertions(+), 38 deletions(-)

-- 
1.8.3.1



[v2 0/9] Early boot time stamps for x86

2017-03-24 Thread Pavel Tatashin
changelog
-
v1 - v2
In patch "x86/tsc: tsc early":
- added tsc_adjusted_early()
- fixed 32-bit compile error use do_div()

Adding early boot time stamps support for x86 machines.
SPARC patch see here:
http://www.spinics.net/lists/sparclinux/msg17372.html

sample output
-
Before:
https://hastebin.com/zofiqazuze.scala

After:
https://hastebin.com/otayoliruc.scala

example why early time is crucial
-
Here is boot log on x86 machine with 8 E7-8895 CPUs, and 1T of memory:
https://hastebin.com/ebidosafic.pas

We want to improve boot time, by deferring vmemmap initialization to when
the secondary CPUs are online:
https://hastebin.com/migajadiju.pas

We see in the log that struct pages are initialized in parallel:
[   19.446609] node 0 initialised, 31980145 pages in 530ms
...

But, we have no idea if we saved overall boot time or not by enabling this
feature, because the total boot time as shown currently has not changed
much, and is actually increased from: 18.99s to 19.52s

Now, with this feature we would see right away that the time is actually
decreased:
Before: https://hastebin.com/ivobehahud.pas
After:  https://hastebin.com/uvifasohon.pas

Boot time reduced from 40.68s to 31.41s, we saved 30% by enabling deferred
page initialization feature, great.

However, now, we have early boot time stamps, and we can analyze even
further, and see that we still spend a lot of time in this interval:
[0.009071] kexec_core: crashkernel: memory value expected
[   11.255710] Zone ranges:

And, this is where this fix comes handy:
https://lkml.org/lkml/2017/3/23/864

The final boot time on this machine is this:
https://hastebin.com/anodiqaguj.pas

So, 20.7 seconds vs. 40.68s before, and we can show it with the early boot
time stamps, and avoid future regressions by having this data always
available to us.

Pavel Tatashin (9):
  sched/clock: broken stable to unstable transfer
  sched/clock: interface to allow timestamps early in boot
  x86/cpu: determining x86 vendor early
  x86/tsc: early MSR-based CPU/TSC frequency discovery
  x86/tsc: disable early messages from quick_pit_calibrate
  x86/tsc: use cpuid to determine TSC frequency
  x86/tsc: use cpuid to determine CPU frequency
  x86/tsc: tsc early
  x86/tsc: use tsc early

 arch/x86/include/asm/processor.h |   1 +
 arch/x86/include/asm/tsc.h   |   5 ++
 arch/x86/kernel/cpu/common.c |  36 
 arch/x86/kernel/head64.c |   1 +
 arch/x86/kernel/time.c   |   1 +
 arch/x86/kernel/tsc.c| 186 ++-
 arch/x86/kernel/tsc_msr.c|  38 +---
 include/linux/sched/clock.h  |   4 +
 kernel/sched/clock.c |  70 ++-
 9 files changed, 304 insertions(+), 38 deletions(-)

-- 
1.8.3.1