On Thu, Nov 16, 2006 at 11:54:20AM +0900, Magnus Damm wrote:
> 
> >I had few thoughts
> >
> >- You mentioned that in previous mail that you changed your compile address
> >  to 2MB and still problem was not resolved. That means problem should not
> >  be related to hardcoding the address to 2MB.
> 
> Right. After bisecting and finding the problematic commit my first
> idea was to comment out the PT_LOAD program header like this:
> 
> --- 0001/kexec/arch/x86_64/crashdump-x86_64.c
> +++ work/kexec/arch/x86_64/crashdump-x86_64.c   2006-11-15
> 17:55:18.000000000 +0900
> @@ -664,6 +664,7 @@ static int prepare_crash_memory_elf64_he
> #endif
>        }
> 
> +#if 0
>        /* Setup an PT_LOAD type program header for the region where
>         * Kernel is mapped.
>         */
> @@ -684,6 +685,8 @@ static int prepare_crash_memory_elf64_he
>                        phdr->p_vaddr, phdr->p_filesz, phdr->p_memsz);
> #endif
> 
> +#endif
> +
>        /* Setup PT_LOAD type program header for every system RAM chunk.
>         * A seprate program header for Backup Region*/
>        for (i = 0; i < CRASH_MAX_MEMORY_RANGES; i++) {
> 
> This happened to solve my problem. I retested using 1 MB physical
> start address and that seemed to work well too.
> 
> I suspect that the hardcoded PT_LOAD program header conflicted with
> the kernel command line argument [EMAIL PROTECTED] that I happen to
> use. That's my theory.
> 

I don't understand how can wrong vaddr fail transfer of vmcore. Even if I
hardcode, vaddr 2MB, physical address does not change (which I obtained from
/proc/iomem). Can you please try the attached patch. Hopefully it should solve
the issue for you.

> >  Anyway if kernel is compiled for 2MB and user space is hardcoding it to 
> >  2MB
> >  then there is no incompatibility. Secondly, even if incompatibility was
> >  there I would have expected a trouble while opening the dump using gdb
> >  instead of while copying the vmcore.
> 
> But are you hardcoding it correctly? I must confess that I don't
> understand why certain values are hardcoded, especially
> KERNEL_TEXT_SIZE. Why is it set to 40M?
> 

KERNEL_TEXT_SIZE is a kernel defined virtual address region where kernel
can reside. So x86_64 has fixed a virtual address range of 40MB for the
kernel. (Documentation/x86_64/mm.txt).

Now, if you compile your kernel for 1MB physical location, it gets shifted
in vaddr range by 1MB. But it can not cross kernel defined vaddr region
and has to move somewhere between this 40MB range. That's the reason that
on x86_64 you can't comile a kernel for 64MB physical address and boot.

Before my changes, we used to map first 40MB in kernel text region and
rest at linearly mapped region. But that's not a very good way of doing
things. I think we should always create a separate header for kernel.
(Same physical memory has been mapped at two virtual address regions hence
we need to create two headers to reflect that).

If we can determine it using /proc/kcore then it is fine otherwise we 
should simply hardcode first 40MB mappeed in kernel text region.
 
          phdr->p_vaddr = mstart + __START_KERNEL_map;

This will always make sure that gdb is not broken.

> >- I think this patch will break compatibility with gdb as we are not 
> >mapping
> >  kernel in virtual address space. All the physical chunks are being mapped
> >  to linearly mapped region in virtual address space and none is being 
> >  mapped
> >  to kernel text virtual address region.
> 
> Yes, gdb will not work with the vmcore from kernels that doesn't have
> parsable /proc/kcore. But isn't that better than hardcoding and not
> being sure that we will be able to transfer the /proc/vmcore file?
> 

I don't think that harcoding should affect transfer of vmcore. Can you
please dig little deeper to find out real cause of the problem. May
be readelf output of /proc/vmcore will also help.

Secondly, my attached patch should solve the problem now.

> My patch makes older kernels behave as they do in the latest
> kexec-tools git tree - no PT_LOAD program header for the kernel. And
> this is exactly how i386 works at the moment if I'm not mistaken.
> 
> >  I think there are two options to maintain backward compatibility with
> >  older kernels.
> >
> >a) we map 1MB to 40MB physical region in kernel text virtual address space
> >   instead of kernel linearly mapped region.
> >
> >b) we relax the elf note check in kexec-tools so that older kernels are
> >   not incompatible at all and /proc/kcore is readable. What I mean is that
> >   kexec-tools check for presense of a NULL being present in elf note name
> >   string. Instead of returning error there, we can print a warning that
> >   a null character is not present and continue processing kcore. This way
> >   older kernels will not require any special processing and ELF header
> >   creation logic will also be very neat.
> >
> >IMHO, we should go by  the second option. What do you think?
> 
> Yes, the second option sounds fine. But I see that as an enhancement
> for older kernels that can go on top of my patch.
> 

I don't think that your patch is required at all. My patch should resolve
the issue now. Your patch breaks compatibility with gdb which we don't want.

Again, I think that goals of bare metal linux and xen are very different
when it comes to kdump. So please create separate files or funcitons like
xen_prepare_64bit_elf_headers() to enforce the policies which are more
suitable for virtualized env.

> And until someone cooks up such an enhancement I think the best is to
> not provide a kernel PT_LOAD program header on old kernels. That
> should keep kexec-tools-testing on x86_64 on the same level as
> kexec-tools for old kernels which I think is the best way to deal with
> backwards compatibility.
> 

I have cooked up the patch now. Please try it out. Hopefully it solves
the issue for you.

Thanks
Vivek



o In kernels 2.6.18 and older, /proc/kcore elf note name length does not
  take into the picture the null character size. Kexec-tools very religiously
  checks for it and fails parsing /proc/kcore in older kernels. In newer
  kernels (2.6.19-rc) this problem has been fixed.

o This is not a very serious condition and instead of failing we can simply
  warn user about this condition and continue processing. This ensures
  backward compatibility with older kernels. 

o Also drop the code which was hardcoding the kernel start addr to 2MB. It
  broke with older kernels where kernel has been compiled for a different
  physical addr using CONFIG_PHYSICAL_START.

Signed-off-by: Vivek Goyal <[EMAIL PROTECTED]>
---

 kexec/arch/x86_64/crashdump-x86_64.c |   23 ++---------------------
 kexec/kexec-elf.c                    |   10 +++++++---
 2 files changed, 9 insertions(+), 24 deletions(-)

diff -puN 
kexec/kexec-elf.c~kexec-tools-x86_64-warn-on-elf-note-not-being-null-terminated 
kexec/kexec-elf.c
--- 
horms-kexec-tools/kexec/kexec-elf.c~kexec-tools-x86_64-warn-on-elf-note-not-being-null-terminated
   2006-11-15 23:50:45.000000000 -0500
+++ horms-kexec-tools-root/kexec/kexec-elf.c    2006-11-16 01:51:37.000000000 
-0500
@@ -715,8 +715,13 @@ static int build_mem_notes(const char *b
                note_size += (hdr.n_descsz + 3) & ~3;
 
                if ((hdr.n_namesz != 0) && (name[hdr.n_namesz -1] != '\0')) {
-                       fprintf(stderr, "Note name is not null termiated\n");
-                       return -1;
+                       /* If note name string is not null terminated, just
+                        * warn user about it and continue processing. This
+                        * allows us to parse /proc/kcore on older kernels
+                        * where /proc/kcore elf notes were not null
+                        * terminated. It has been fixed in 2.6.19.
+                        */
+                       fprintf(stderr, "Warning: Elf Note name is not null 
terminated\n");
                }
                ehdr->e_note[i].n_type = hdr.n_type;
                ehdr->e_note[i].n_name = (char *)name;
@@ -727,7 +732,6 @@ static int build_mem_notes(const char *b
        return 0;
 }
 
-
 void free_elf_info(struct mem_ehdr *ehdr)
 {
        free(ehdr->e_phdr);
diff -puN 
kexec/arch/x86_64/crashdump-x86_64.c~kexec-tools-x86_64-warn-on-elf-note-not-being-null-terminated
 kexec/arch/x86_64/crashdump-x86_64.c
--- 
horms-kexec-tools/kexec/arch/x86_64/crashdump-x86_64.c~kexec-tools-x86_64-warn-on-elf-note-not-being-null-terminated
        2006-11-16 00:02:39.000000000 -0500
+++ horms-kexec-tools-root/kexec/arch/x86_64/crashdump-x86_64.c 2006-11-16 
01:30:55.000000000 -0500
@@ -40,7 +40,7 @@
 /* Forward Declaration. */
 static int exclude_crash_reserve_region(int *nr_ranges);
 
-#define KERN_VADDR_ALIGN       0x200000        /* 2MB */
+#define KERN_VADDR_ALIGN       0x100000        /* 1MB */
 
 /* Read kernel physical load addr from /proc/iomem (Kernel Code) and
  * store in kexec_info */
@@ -84,24 +84,6 @@ static int get_kernel_paddr(struct kexec
        return -1;
 }
 
-/* Hardcoding kernel virtual address and size. While writting
- * this patch vanilla kernel is compiled for addr 2MB. Anybody
- * using kernel older than that which was compiled for 1MB
- * physical addr, use older version of kexec-tools. This function
- * is there just for backward compatibility reasons and we should
- * get rid of it at some point of time.
- */
-
-static int hardcode_kernel_vaddr_size(struct kexec_info *info)
-{
-       fprintf(stderr, "Warning: Hardcoding kernel virtual addr and size\n");
-       info->kern_vaddr_start = __START_KERNEL_map + 0x200000;
-       info->kern_size = KERNEL_TEXT_SIZE - 0x200000;
-       fprintf(stderr, "Warning: virtual addr = 0x%lx size = 0x%lx\n",
-               info->kern_vaddr_start, info->kern_size);
-       return 0;
-}
-
 /* Retrieve info regarding virtual address kernel has been compiled for and
  * size of the kernel from /proc/kcore. Current /proc/kcore parsing from
  * from kexec-tools fails because of malformed elf notes. A kernel patch has
@@ -137,8 +119,7 @@ static int get_kernel_vaddr_and_size(str
        result = build_elf_core_info(buf, size, &ehdr, elf_flags);
        if (result < 0) {
                fprintf(stderr, "ELF core (kcore) parse failed\n");
-               hardcode_kernel_vaddr_size(info);
-               return 0;
+               return -1;
        }
 
        /* Traverse through the Elf headers and find the region where
_
_______________________________________________
fastboot mailing list
[email protected]
https://lists.osdl.org/mailman/listinfo/fastboot

Reply via email to