On 10/31/07, Morris, Kevin J. (LNG-DAY) <[EMAIL PROTECTED]> wrote:

> Cannot afford to run something on Linux?
>  - How does ESALPS gather the Linux process-level data outside of
> running an agent/application ON Linux?

The big difference is whether you use an active agent or a passive agent.
ESALPS uses snmpd, which only uses resources when you retrieve data. And
because ESALPS also has the VM usage data per virtual machine, there
is no need to ask for per-process data when VM already knows the
virtual machine was idle. The other difference is where you archive
your data. When each Linux server is keeping its own archive,
maintaining the archive (and retrieving data from it) takes resources
on Linux and may also impact whatever you were trying to measure.

> The data that Linux logged would be wrong?
>  - I assume you are referring to the inaccuracies of the "tick based CPU
> time accounting"?
>    - I had the understanding that RHEL5 and SLES10 resolved this issue
> with the new "virtual CPU time accounting"?

The "tick based" accounting is only part of it. Usage is reported in
units of 10 ms which is a lot on modern CPUs. This means that many
tools will miss fractions of 10 ms on each observation, and get a very
low capture ratio. The Linux kernel does keep more accurate usage, and
ESALPS uses the process hierarchy to catch the missing fractions, and
achieves a capture ratio in the high 90's normally.
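
To make the capture ratio point concrete, here is a rough sketch in
Python. It is only an illustration of the arithmetic, not how ESALPS
or any particular tool works: it assumes per-process usage is simply
truncated to whole 10 ms units, and the function name and sample
numbers are made up.

```python
TICK_MS = 10  # reporting granularity discussed above


def capture_ratio(true_ms_per_process, tick_ms=TICK_MS):
    """Fraction of the real CPU time left after truncating each
    process's usage to whole ticks (an assumed, simplified model)."""
    total = sum(true_ms_per_process)
    if total == 0:
        return 1.0
    reported = sum((ms // tick_ms) * tick_ms for ms in true_ms_per_process)
    return reported / total


# Hypothetical interval: several short bursts and one busier process.
sample = [3, 7, 9, 4, 12, 95]
print(f"capture ratio: {capture_ratio(sample):.2f}")
```

With mostly short bursts the ratio drops quickly, because every burst
under 10 ms is reported as zero.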

The other part is the virtualization effect, where the Linux kernel
believes it uses 100% of the CPU, but because of shared resources on
z/VM it only gets 10% (so all readings are an order of magnitude off).
The latest Linux kernels have changed this to report 10% - so what's
the conclusion when you see Linux report 10% usage; is it short on CPU
or not ;-).
ESALPS has both numbers and reports both, which means you have the
true numbers for accounting and the Linux numbers to understand the
performance implications inside Linux.
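
A small sketch of why both numbers are needed, again in Python. The
function names, the 5% slack and the idea of comparing "wanted"
against "received" are assumptions for illustration only, not the way
ESALPS computes anything.

```python
def real_cpu_pct(linux_busy_pct, granted_fraction):
    """Scale what Linux believes it used by the share of a real CPU
    the virtual machine actually received (assumed simple model)."""
    return linux_busy_pct * granted_fraction


def looks_cpu_starved(wanted_pct, received_pct, slack_pct=5.0):
    """Hypothetical check: the guest wanted clearly more CPU than it got."""
    return wanted_pct - received_pct > slack_pct


# Older kernel view: Linux thinks 100%, but the guest got a tenth of a CPU.
print(real_cpu_pct(100.0, 0.10))

# Two guests that both show "10%" on a newer kernel:
print(looks_cpu_starved(wanted_pct=100.0, received_pct=10.0))  # wanted far more
print(looks_cpu_starved(wanted_pct=10.0, received_pct=10.0))   # nearly idle
```

The 10% figure alone cannot tell these two guests apart; you need the
Linux view and the z/VM view side by side.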

>  - The memory, disk, #page faults, etc. are still accurate. Correct?

Memory as reported by "free" for example is correct for what Linux
believes. But which part of that is resident in z/VM is something
ESALPS gets from z/VM monitor data. And you need to include swap usage
in that as well.

Linux internally does report disk rates, but in a shared environment
you cannot translate rates into I/O response times (which is what
impacts your application). The response time measurements inside the
Linux device driver are subject to virtual machine dispatch and likely
to make you bark up the wrong tree. ESALPS gets I/O metrics from the
subchannel measurement data and combines that with the VM data to
allocate the usage to virtual machines.

Rob
--
Rob van der Heij
Velocity Software, Inc
http://velocitysoftware.com/

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
