I mistakenly included only David on the original patch email, so I have
now added the other people listed by get_maintainer.pl. I believe this
patch fixes a rather significant problem (a potential kernel panic),
so some feedback would be appreciated.

Thanks,
Matt 

On Thu, 2015-04-30 at 11:48 +1200, Matt Bennett wrote:
> During development work on a 3.16-based kernel, it was found that a
> number of builds would panic during the kernel init process, more
> specifically in 'delayed_fput()'. The panic showed the kernel trying
> to access a memory address of '0xb7fdc00' while traversing the
> 'delayed_fput_list' structure. Comparing this memory address to the
> value of the pointer used on builds that did not panic confirmed
> that the pointer on crashing builds must have been corrupted at some
> stage earlier in the init process.
> 
> By traversing the list earlier and earlier in the code it was found
> that 'plat_mem_setup()' was responsible for corrupting the list.
> Specifically the line:
> 
>     memory = cvmx_bootmem_phy_alloc(mem_alloc_size,
>                       __pa_symbol(&__init_end), -1,
>                       0x100000,
>                       CVMX_BOOTMEM_FLAG_NO_LOCKING);
> 
> Which would eventually call:
> 
>     cvmx_bootmem_phy_set_size(new_ent_addr,
>               cvmx_bootmem_phy_get_size
>               (ent_addr) -
>               (desired_min_addr -
>                       ent_addr));
> 
> Where 'new_ent_addr' = 0x4800000 (the address of 'delayed_fput_list')
> and the second argument (the size) = 0xb7fdc00 (the address causing
> the kernel panic). The job of this part of 'plat_mem_setup()' is to
> allocate chunks of memory for the kernel to use. The size of each
> chunk is written at the start of that chunk, so the value 0xb7fdc00
> is written to memory at 0x4800000. The kernel then panics when it
> goes back to access 'delayed_fput_list' later in the initialisation
> process.
> 
> On builds that were not crashing it was found that the compiler had
> placed 'delayed_fput_list' at 0x4800008, meaning it wasn't corrupted
> (but something else in memory was overwritten).
> 
> As can be seen in the first function call above, the code begins
> allocating chunks of memory from the symbol '__init_end'. However,
> the MIPS linker script (vmlinux.lds.S) defines the .bss section to
> begin after '__init_end'. Memory inside the .bss section is
> therefore allocated for the kernel to use (System.map shows
> 'delayed_fput_list' and other kernel structures to be in .bss).
> 
> To stop the kernel panic (and the corruption of the .bss section),
> allocation should instead start from the symbol '_end'.
> 
> Signed-off-by: Matt Bennett <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
>  arch/mips/cavium-octeon/setup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/mips/cavium-octeon/setup.c b/arch/mips/cavium-octeon/setup.c
> index 7e4367b..f632f14 100644
> --- a/arch/mips/cavium-octeon/setup.c
> +++ b/arch/mips/cavium-octeon/setup.c
> @@ -1008,7 +1008,7 @@ void __init plat_mem_setup(void)
>       while ((boot_mem_map.nr_map < BOOT_MEM_MAP_MAX)
>               && (total < MAX_MEMORY)) {
>               memory = cvmx_bootmem_phy_alloc(mem_alloc_size,
> -                                             __pa_symbol(&__init_end), -1,
> +                                             __pa_symbol(&_end), -1,
>                                               0x100000,
>                                               CVMX_BOOTMEM_FLAG_NO_LOCKING);
>               if (memory >= 0) {
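
For anyone following along, the corruption described in the commit
message can be modelled in user space. What follows is only a minimal
sketch under stated assumptions, not kernel code: 'toy_ram',
'toy_set_size()', 'toy_first_after_alloc()' and 'struct toy_list_head'
are hypothetical names, the list head is reduced to a single 64-bit
pointer, and the block header is simplified to "size stored at the
start of the block" as the commit message describes it. The real
bootmem header layout is not reproduced. Only the two addresses and
the size value come from the report.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BASE 0x4800000ULL /* address from the report, used only as a label */
#define TOY_SIZE 0xb7fdc00ULL /* size value from the report */
#define TOY_WORDS 4           /* simulated RAM: 0x4800000..0x480001f */

/* Hypothetical one-pointer list head; first == 0 means "empty list". */
struct toy_list_head {
    uint64_t first;
};

/* Simulated memory standing in for the start of .bss. */
static uint64_t toy_ram[TOY_WORDS];

/* Simplified model of the header write: the block size is stored
 * in the first 64-bit word of the block. */
static void toy_set_size(uint64_t block_addr, uint64_t size)
{
    uint64_t index = (block_addr - TOY_BASE) / sizeof(uint64_t);

    assert(block_addr >= TOY_BASE && index < TOY_WORDS);
    toy_ram[index] = size;
}

/* Place an empty list head at TOY_BASE + list_offset, carve a block
 * starting at TOY_BASE, and return what the list head holds afterwards. */
static uint64_t toy_first_after_alloc(uint64_t list_offset)
{
    struct toy_list_head head;

    memset(toy_ram, 0, sizeof(toy_ram));      /* .bss starts zeroed */
    toy_set_size(TOY_BASE, TOY_SIZE);         /* allocator writes its header */
    memcpy(&head, &toy_ram[list_offset / sizeof(uint64_t)], sizeof(head));
    return head.first;
}
```

In this model, toy_first_after_alloc(0) returns 0xb7fdc00, matching the
crashing builds where the list sat at 0x4800000, while
toy_first_after_alloc(8) returns 0, matching the builds where the list
sat at 0x4800008 and survived.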
