Bug#603229: Further information

Frede Feuerstein Sat, 27 Nov 2010 21:03:19 -0800

Hi!

> The error message about 'domain->cpu_power' does not refer to power
> management, but to the scheduler's estimation of the processing power of
> each group of processor threads.
> 
> The scheduler is trying to group the processor threads by:
> 
> - NUMA node (NODE; sharing a connection to RAM)
> - Package (CPU; sharing some caches)
> - Core (MC; sharing execution units)


So let's start here: on this machine, NUMA node and package are
identical: CPU0/CPU1 form one group and CPU2/CPU3 the other.
As with all Socket 940 Opterons, the cores are logically complete CPUs,
i.e. they do not share execution units.
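
For what it is worth, the grouping described above could be cross-checked
on a kernel that boots (e.g. 2.6.30-2) with something like the sketch
below. It assumes that the kernel exposes the usual sysfs files
cpu/cpuN/topology/physical_package_id and node/nodeN/cpulist under
/sys/devices/system; the function names are made up for illustration and
this is not an official diagnostic tool.

```python
import glob
import os
import re
from collections import defaultdict


def parse_cpulist(text):
    """Expand a sysfs CPU list such as '0-1,4' into [0, 1, 4]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def cpus_by_package(sysfs_root="/sys/devices/system"):
    """Map each physical package id to the sorted logical CPUs in it."""
    packages = defaultdict(list)
    for cpu_dir in glob.glob(os.path.join(sysfs_root, "cpu", "cpu[0-9]*")):
        match = re.search(r"cpu(\d+)$", cpu_dir)
        if not match:
            continue
        path = os.path.join(cpu_dir, "topology", "physical_package_id")
        try:
            with open(path) as f:
                pkg = int(f.read().strip())
        except (OSError, ValueError):
            continue  # assumption: skip CPUs without topology information
        packages[pkg].append(int(match.group(1)))
    return {pkg: sorted(cpus) for pkg, cpus in packages.items()}


def cpus_by_node(sysfs_root="/sys/devices/system"):
    """Map each NUMA node id to the logical CPUs listed in its cpulist."""
    nodes = {}
    for node_dir in glob.glob(os.path.join(sysfs_root, "node", "node[0-9]*")):
        match = re.search(r"node(\d+)$", node_dir)
        if not match:
            continue
        try:
            with open(os.path.join(node_dir, "cpulist")) as f:
                nodes[int(match.group(1))] = parse_cpulist(f.read())
        except (OSError, ValueError):
            continue
    return nodes


if __name__ == "__main__":
    print("packages:", sorted(cpus_by_package().items()))
    print("nodes:   ", sorted(cpus_by_node().items()))
```

If node and package really are identical on this box, both mappings
should list the same two groups of CPUs.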

> so that it can make good decisions about where a task should run when it
> is ready to do so.
> 
> > But whereas 2.6.32-5 afterwards crashes with a divide error,
> > 2.6.30-2 starts up normally:
> [...]
> > I suppose that it is the divide error in [0.852154], we have to deal
> > with.
> [...]
> 
> The division by zero appears to be a result of getting bad information
> from the firmware about the groups of processors.

Well, technically a division error is always the result of bad data fed
to that division. I rather meant that this is the point from which to
backtrace the error.
Although the BIOS of the w2100z is known to have some problems, it
reports the CPUs correctly, and it is the latest version (R01-B5-S1).
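
To illustrate what I mean by bad data fed to the division, here is a toy
model, not the kernel's actual code. It assumes that a group's load is
scaled by a constant and divided by the group's cpu_power, so a group
that ends up with a cpu_power of zero triggers the error. The constant
and both function names are assumptions made for this sketch.

```python
# Assumed nominal "power" of a single CPU in this toy model.
SCHED_LOAD_SCALE = 1024


def group_avg_load(group_load, cpu_power):
    """Scale a group's load by its cpu_power; fails if cpu_power is 0."""
    return (group_load * SCHED_LOAD_SCALE) // cpu_power


def checked_group_avg_load(group_load, cpu_power):
    """Same calculation, but report a zero cpu_power instead of crashing."""
    if cpu_power == 0:
        return None  # the group was set up with no processing power
    return group_avg_load(group_load, cpu_power)


if __name__ == "__main__":
    print(group_avg_load(2048, 2 * SCHED_LOAD_SCALE))
    print(checked_group_avg_load(2048, 0))
```

The interesting question is therefore not the division itself, but where
a group could have been left with a cpu_power of zero.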

>   I realise that this
> same bad information did not previously result in a crash, but I (and
> the upstream developers) need to know what that information is before we
> can understand how this can be avoided.

Is there any way to gather more information? Tell me and I shall do
it.

Tilo





