Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
On Thu, Jun 30, 2016 at 5:56 AM, Logan Gunthorpewrote: > > > On 29/06/16 08:55 PM, Rafael J. Wysocki wrote: > >> The only thing that comes to mind at this point is that TLBs should be >> flushed >> after page tables changes, so please apply the appended and let me know >> if you see this panic any more with it. > > > > Ok, I'll build a new kernel tomorrow. Well, please wait for a while. I'm looking at the panic log right now and the picture is a bit clearer now. > But keep in mind the panic is pretty > rare as I've only seen it once so far after a couple dozen or so hibernates. > So it may be hard to get a concrete yes or no on whether it fixes the issue. Right. > I've got a script to run a bunch of hibernates in a row. I usually only run > it for a handful of iterations, but I'll try running it for much longer with > this patch and let you know in a couple days. As I said, please wait a bit, there may be updates in the meantime. :-) >From looking at your panic log, the exception happened in swsusp_arch_resume(), which probably covers restore_image() too, because that is likely to go into swsusp_arch_resume() in line. Next, the address in RIP (a) clearly is a page start and (b) is relative to the identity mapping, so it most likely is the address from relocated_restore_code. Moreover, the RCX value is the same address and the values in the other registers also match exactly the situation before the jump to relocated_restore_code. Thus the exception was triggered by that jump beyond doubt. Now, if you look above the Oops line, it becomes quite clear what happened. Namely, dump_pagetable() (arch/x86/mm/fault.c, line 524 in 4.6) prints the PGD, the PUD, the PMD and the PTE in that order, unless the lower levels (PTE, PMD) are not present. In your panic log, only the PGD and PUD are present, meaning that this is a 1G page and sure enough it has the NX bit set. This case was clearly overlooked by relocate_restore_code(), so updated patch will follow. Thanks, Rafael
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
On Thu, Jun 30, 2016 at 5:56 AM, Logan Gunthorpe wrote: > > > On 29/06/16 08:55 PM, Rafael J. Wysocki wrote: > >> The only thing that comes to mind at this point is that TLBs should be >> flushed >> after page tables changes, so please apply the appended and let me know >> if you see this panic any more with it. > > > > Ok, I'll build a new kernel tomorrow. Well, please wait for a while. I'm looking at the panic log right now and the picture is a bit clearer now. > But keep in mind the panic is pretty > rare as I've only seen it once so far after a couple dozen or so hibernates. > So it may be hard to get a concrete yes or no on whether it fixes the issue. Right. > I've got a script to run a bunch of hibernates in a row. I usually only run > it for a handful of iterations, but I'll try running it for much longer with > this patch and let you know in a couple days. As I said, please wait a bit, there may be updates in the meantime. :-) >From looking at your panic log, the exception happened in swsusp_arch_resume(), which probably covers restore_image() too, because that is likely to go into swsusp_arch_resume() in line. Next, the address in RIP (a) clearly is a page start and (b) is relative to the identity mapping, so it most likely is the address from relocated_restore_code. Moreover, the RCX value is the same address and the values in the other registers also match exactly the situation before the jump to relocated_restore_code. Thus the exception was triggered by that jump beyond doubt. Now, if you look above the Oops line, it becomes quite clear what happened. Namely, dump_pagetable() (arch/x86/mm/fault.c, line 524 in 4.6) prints the PGD, the PUD, the PMD and the PTE in that order, unless the lower levels (PTE, PMD) are not present. In your panic log, only the PGD and PUD are present, meaning that this is a 1G page and sure enough it has the NX bit set. This case was clearly overlooked by relocate_restore_code(), so updated patch will follow. Thanks, Rafael
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
On Thu, Jun 30, 2016 at 11:45 AM, Borislav Petkovwrote: > On Thu, Jun 30, 2016 at 04:20:43AM +0200, Rafael J. Wysocki wrote: >> That's not what Boris was seeing at least. > > Well, I had it a couple of times during testing patches. This is all > from the logs: > > [ 65.121109] PM: Basic memory bitmaps freed > [ 65.125991] Restarting tasks ... > [ 65.129342] kernel tried to execute NX-protected page - exploit attempt? > (uid: 0) > [ 65.129585] done. > [ 65.141314] BUG: unable to handle kernel paging request at 88042b957e40 I mean the failure mode, not the particular exception type. :-) You always saw it in a user space task after kernel resume: > [ 65.141340] Call Trace: > [ 65.141344] [] ? getname_flags+0x5e/0x1b0 > [ 65.141346] [] ? cp_new_stat+0x10f/0x120 > [ 65.141348] [] ? ktime_get_ts64+0x4a/0xf0 > [ 65.141353] [] ? poll_select_copy_remaining+0xe7/0x130 > [ 65.141355] [] exit_to_usermode_loop+0x8a/0xb0 > [ 65.141356] [] syscall_return_slowpath+0x5b/0x70 > [ 65.141358] [] entry_SYSCALL_64_fastpath+0xa5/0xa7 [cut] > [ 381.850792] Call Trace: > [ 381.850795] [] ? getname_flags+0x5e/0x1b0 > [ 381.850797] [] ? cp_new_stat+0x10f/0x120 > [ 381.850799] [] ? ktime_get_ts64+0x4a/0xf0 > [ 381.850800] [] ? poll_select_copy_remaining+0xe7/0x130 > [ 381.850802] [] exit_to_usermode_loop+0x8a/0xb0 > [ 381.850804] [] syscall_return_slowpath+0x5b/0x70 > [ 381.850806] [] entry_SYSCALL_64_fastpath+0xa5/0xa7 [cut] > [ 49.022675] Call Trace: > [ 49.022680] [] ? getname_flags+0x5e/0x1b0 > [ 49.022683] [] ? cp_new_stat+0x10f/0x120 > [ 49.022686] [] ? ktime_get_ts64+0x4a/0xf0 > [ 49.022689] [] ? poll_select_copy_remaining+0xe7/0x130 > [ 49.022692] [] exit_to_usermode_loop+0x8a/0xb0 > [ 49.022695] [] syscall_return_slowpath+0x5b/0x70 > [ 49.022698] [] entry_SYSCALL_64_fastpath+0xa5/0xa7 [cut] > [ 39.636905] Call Trace: > [ 39.636908] [] ? getname_flags+0x5e/0x1b0 > [ 39.636910] [] ? cp_new_stat+0x10f/0x120 > [ 39.636912] [] ? ktime_get_ts64+0x4a/0xf0 > [ 39.636917] [] ? poll_select_copy_remaining+0xe7/0x130 > [ 39.636919] [] exit_to_usermode_loop+0x8a/0xb0 > [ 39.636921] [] syscall_return_slowpath+0x5b/0x70 > [ 39.636922] [] entry_SYSCALL_64_fastpath+0xa5/0xa7 which is a clear indication of image corruption during restore. In the Logan's case this happens in swsusp_arch_resume() proper and the address in RIP is relative to the identity mapping, so the only place it can happen is the jump to relocated_restore_code. That's because before that jump the addresses in RIP are relative to the kernel text mapping and after it we immediately switch over to the temporary page tables which are all executable. So that is the only place AFAICS. Also in your case the failure was 100% reproducible, while in the Logan's case it has happened once so far (so generally it happens once in a blue moon). In summary, I'm sure that this is a different issue. Thanks, Rafael
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
On Thu, Jun 30, 2016 at 11:45 AM, Borislav Petkov wrote: > On Thu, Jun 30, 2016 at 04:20:43AM +0200, Rafael J. Wysocki wrote: >> That's not what Boris was seeing at least. > > Well, I had it a couple of times during testing patches. This is all > from the logs: > > [ 65.121109] PM: Basic memory bitmaps freed > [ 65.125991] Restarting tasks ... > [ 65.129342] kernel tried to execute NX-protected page - exploit attempt? > (uid: 0) > [ 65.129585] done. > [ 65.141314] BUG: unable to handle kernel paging request at 88042b957e40 I mean the failure mode, not the particular exception type. :-) You always saw it in a user space task after kernel resume: > [ 65.141340] Call Trace: > [ 65.141344] [] ? getname_flags+0x5e/0x1b0 > [ 65.141346] [] ? cp_new_stat+0x10f/0x120 > [ 65.141348] [] ? ktime_get_ts64+0x4a/0xf0 > [ 65.141353] [] ? poll_select_copy_remaining+0xe7/0x130 > [ 65.141355] [] exit_to_usermode_loop+0x8a/0xb0 > [ 65.141356] [] syscall_return_slowpath+0x5b/0x70 > [ 65.141358] [] entry_SYSCALL_64_fastpath+0xa5/0xa7 [cut] > [ 381.850792] Call Trace: > [ 381.850795] [] ? getname_flags+0x5e/0x1b0 > [ 381.850797] [] ? cp_new_stat+0x10f/0x120 > [ 381.850799] [] ? ktime_get_ts64+0x4a/0xf0 > [ 381.850800] [] ? poll_select_copy_remaining+0xe7/0x130 > [ 381.850802] [] exit_to_usermode_loop+0x8a/0xb0 > [ 381.850804] [] syscall_return_slowpath+0x5b/0x70 > [ 381.850806] [] entry_SYSCALL_64_fastpath+0xa5/0xa7 [cut] > [ 49.022675] Call Trace: > [ 49.022680] [] ? getname_flags+0x5e/0x1b0 > [ 49.022683] [] ? cp_new_stat+0x10f/0x120 > [ 49.022686] [] ? ktime_get_ts64+0x4a/0xf0 > [ 49.022689] [] ? poll_select_copy_remaining+0xe7/0x130 > [ 49.022692] [] exit_to_usermode_loop+0x8a/0xb0 > [ 49.022695] [] syscall_return_slowpath+0x5b/0x70 > [ 49.022698] [] entry_SYSCALL_64_fastpath+0xa5/0xa7 [cut] > [ 39.636905] Call Trace: > [ 39.636908] [] ? getname_flags+0x5e/0x1b0 > [ 39.636910] [] ? cp_new_stat+0x10f/0x120 > [ 39.636912] [] ? ktime_get_ts64+0x4a/0xf0 > [ 39.636917] [] ? poll_select_copy_remaining+0xe7/0x130 > [ 39.636919] [] exit_to_usermode_loop+0x8a/0xb0 > [ 39.636921] [] syscall_return_slowpath+0x5b/0x70 > [ 39.636922] [] entry_SYSCALL_64_fastpath+0xa5/0xa7 which is a clear indication of image corruption during restore. In the Logan's case this happens in swsusp_arch_resume() proper and the address in RIP is relative to the identity mapping, so the only place it can happen is the jump to relocated_restore_code. That's because before that jump the addresses in RIP are relative to the kernel text mapping and after it we immediately switch over to the temporary page tables which are all executable. So that is the only place AFAICS. Also in your case the failure was 100% reproducible, while in the Logan's case it has happened once so far (so generally it happens once in a blue moon). In summary, I'm sure that this is a different issue. Thanks, Rafael
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
On Thu, Jun 30, 2016 at 04:20:43AM +0200, Rafael J. Wysocki wrote: > That's not what Boris was seeing at least. Well, I had it a couple of times during testing patches. This is all from the logs: [ 65.121109] PM: Basic memory bitmaps freed [ 65.125991] Restarting tasks ... [ 65.129342] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 65.129585] done. [ 65.141314] BUG: unable to handle kernel paging request at 88042b957e40 [ 65.141316] IP: [] 0x88042b957e40 [ 65.141318] PGD 2067067 PUD 206a067 PMD 80042b8001e3 [ 65.141319] Oops: 0011 [#1] PREEMPT SMP [ 65.141327] Modules linked in: binfmt_misc ipv6 vfat fat amd64_edac_mod edac_mce_amd fuse dm_crypt dm_mod amdkfd kvm_amd kvm amd_iommu_v2 irqbypass crc32_pclmul radeon aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd fam15h_power k10temp acpi_cpufreq [ 65.141328] CPU: 6 PID: 1 Comm: init Not tainted 4.7.0-rc3+ #4 [ 65.141329] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013 [ 65.141329] task: 88042b958000 ti: 88042b954000 task.ti: 88042b954000 [ 65.141331] RIP: 0010:[] [] 0x88042b957e40 [ 65.141331] RSP: 0018:88042b957e00 EFLAGS: 00010282 [ 65.141332] RAX: RBX: 88042b957f58 RCX: [ 65.141333] RDX: 0001 RSI: 81063b59 RDI: 8168898c [ 65.141333] RBP: 88042b957ef0 R08: R09: 0002 [ 65.141334] R10: R11: 0001 R12: 88042b954000 [ 65.141334] R13: 88042b954000 R14: 88042b957f58 R15: 88042b958000 [ 65.141335] FS: 7fad32173800() GS:88043dd8() knlGS: [ 65.141336] CS: 0010 DS: ES: CR0: 80050033 [ 65.141336] CR2: 88042b957e40 CR3: 0004298e6000 CR4: 000406e0 [ 65.141336] Stack: [ 65.141338] 880037b81000 880037b81000 81181e1e [ 65.141339] ff9c0002 880429e8c600 811782bf 0011 [ 65.141340] 049c 0001 1180 [ 65.141340] Call Trace: [ 65.141344] [] ? getname_flags+0x5e/0x1b0 [ 65.141346] [] ? cp_new_stat+0x10f/0x120 [ 65.141348] [] ? ktime_get_ts64+0x4a/0xf0 [ 65.141353] [] ? poll_select_copy_remaining+0xe7/0x130 [ 65.141355] [] exit_to_usermode_loop+0x8a/0xb0 [ 65.141356] [] syscall_return_slowpath+0x5b/0x70 [ 65.141358] [] entry_SYSCALL_64_fastpath+0xa5/0xa7 [ 65.141374] Code: 00 00 00 1e 1e 18 81 ff ff ff ff 02 00 00 00 9c ff ff ff 00 c6 e8 29 04 88 ff ff bf 82 17 81 ff ff ff ff 11 00 00 00 00 00 00 00 <9c> 04 00 00 00 00 00 00 01 00 00 00 00 00 00 00 80 11 00 00 00 [ 65.141375] RIP [] 0x88042b957e40 [ 65.141376] RSP [ 65.141376] CR2: 88042b957e40 [ 65.141378] ---[ end trace 5dc71ecf8d888ee6 ]--- [ 65.141509] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 [ 65.141509] [ 65.149191] Kernel Offset: disabled [ 65.449314] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 ... [ 381.835297] Restarting tasks ... [ 381.838620] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 381.838689] done. [ 381.850763] BUG: unable to handle kernel paging request at 88042b957e40 [ 381.850765] IP: [] 0x88042b957e40 [ 381.850766] PGD 2065067 PUD 2068067 PMD 80042b8001e3 [ 381.850767] Oops: 0011 [#1] PREEMPT SMP [ 381.850778] Modules linked in: binfmt_misc ipv6 vfat fat amd64_edac_mod edac_mce_amd fuse dm_crypt dm_mod amdkfd kvm_amd kvm amd_iommu_v2 radeon irqbypass crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd k10temp fam15h_power acpi_cpufreq [ 381.850779] CPU: 3 PID: 1 Comm: init Not tainted 4.7.0-rc3+ #1 [ 381.850780] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013 [ 381.850781] task: 88042b958000 ti: 88042b954000 task.ti: 88042b954000 [ 381.850782] RIP: 0010:[] [] 0x88042b957e40 [ 381.850783] RSP: 0018:88042b957e00 EFLAGS: 00010282 [ 381.850783] RAX: RBX: 88042b957f58 RCX: [ 381.850784] RDX: 0001 RSI: 81062a2d RDI: 81687d8c [ 381.850784] RBP: 88042b957ef0 R08: R09: 0002 [ 381.850785] R10: R11: 0001 R12: 88042b954000 [ 381.850785] R13: 88042b954000 R14: 88042b957f58 R15: 88042b958000 [ 381.850786] FS: 7f1143649800() GS:88043dcc() knlGS: [ 381.850787] CS: 0010 DS: ES: CR0: 80050033 [ 381.850787] CR2: 88042b957e40 CR3: 0004298af000 CR4: 000406e0 [ 381.850788] Stack: [ 381.850789] 88042b1ed000 88042b1ed000 8117f8ae [ 381.850790] ff9c0002
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
On Thu, Jun 30, 2016 at 04:20:43AM +0200, Rafael J. Wysocki wrote: > That's not what Boris was seeing at least. Well, I had it a couple of times during testing patches. This is all from the logs: [ 65.121109] PM: Basic memory bitmaps freed [ 65.125991] Restarting tasks ... [ 65.129342] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 65.129585] done. [ 65.141314] BUG: unable to handle kernel paging request at 88042b957e40 [ 65.141316] IP: [] 0x88042b957e40 [ 65.141318] PGD 2067067 PUD 206a067 PMD 80042b8001e3 [ 65.141319] Oops: 0011 [#1] PREEMPT SMP [ 65.141327] Modules linked in: binfmt_misc ipv6 vfat fat amd64_edac_mod edac_mce_amd fuse dm_crypt dm_mod amdkfd kvm_amd kvm amd_iommu_v2 irqbypass crc32_pclmul radeon aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd fam15h_power k10temp acpi_cpufreq [ 65.141328] CPU: 6 PID: 1 Comm: init Not tainted 4.7.0-rc3+ #4 [ 65.141329] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013 [ 65.141329] task: 88042b958000 ti: 88042b954000 task.ti: 88042b954000 [ 65.141331] RIP: 0010:[] [] 0x88042b957e40 [ 65.141331] RSP: 0018:88042b957e00 EFLAGS: 00010282 [ 65.141332] RAX: RBX: 88042b957f58 RCX: [ 65.141333] RDX: 0001 RSI: 81063b59 RDI: 8168898c [ 65.141333] RBP: 88042b957ef0 R08: R09: 0002 [ 65.141334] R10: R11: 0001 R12: 88042b954000 [ 65.141334] R13: 88042b954000 R14: 88042b957f58 R15: 88042b958000 [ 65.141335] FS: 7fad32173800() GS:88043dd8() knlGS: [ 65.141336] CS: 0010 DS: ES: CR0: 80050033 [ 65.141336] CR2: 88042b957e40 CR3: 0004298e6000 CR4: 000406e0 [ 65.141336] Stack: [ 65.141338] 880037b81000 880037b81000 81181e1e [ 65.141339] ff9c0002 880429e8c600 811782bf 0011 [ 65.141340] 049c 0001 1180 [ 65.141340] Call Trace: [ 65.141344] [] ? getname_flags+0x5e/0x1b0 [ 65.141346] [] ? cp_new_stat+0x10f/0x120 [ 65.141348] [] ? ktime_get_ts64+0x4a/0xf0 [ 65.141353] [] ? poll_select_copy_remaining+0xe7/0x130 [ 65.141355] [] exit_to_usermode_loop+0x8a/0xb0 [ 65.141356] [] syscall_return_slowpath+0x5b/0x70 [ 65.141358] [] entry_SYSCALL_64_fastpath+0xa5/0xa7 [ 65.141374] Code: 00 00 00 1e 1e 18 81 ff ff ff ff 02 00 00 00 9c ff ff ff 00 c6 e8 29 04 88 ff ff bf 82 17 81 ff ff ff ff 11 00 00 00 00 00 00 00 <9c> 04 00 00 00 00 00 00 01 00 00 00 00 00 00 00 80 11 00 00 00 [ 65.141375] RIP [] 0x88042b957e40 [ 65.141376] RSP [ 65.141376] CR2: 88042b957e40 [ 65.141378] ---[ end trace 5dc71ecf8d888ee6 ]--- [ 65.141509] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 [ 65.141509] [ 65.149191] Kernel Offset: disabled [ 65.449314] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009 ... [ 381.835297] Restarting tasks ... [ 381.838620] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 381.838689] done. [ 381.850763] BUG: unable to handle kernel paging request at 88042b957e40 [ 381.850765] IP: [] 0x88042b957e40 [ 381.850766] PGD 2065067 PUD 2068067 PMD 80042b8001e3 [ 381.850767] Oops: 0011 [#1] PREEMPT SMP [ 381.850778] Modules linked in: binfmt_misc ipv6 vfat fat amd64_edac_mod edac_mce_amd fuse dm_crypt dm_mod amdkfd kvm_amd kvm amd_iommu_v2 radeon irqbypass crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd k10temp fam15h_power acpi_cpufreq [ 381.850779] CPU: 3 PID: 1 Comm: init Not tainted 4.7.0-rc3+ #1 [ 381.850780] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013 [ 381.850781] task: 88042b958000 ti: 88042b954000 task.ti: 88042b954000 [ 381.850782] RIP: 0010:[] [] 0x88042b957e40 [ 381.850783] RSP: 0018:88042b957e00 EFLAGS: 00010282 [ 381.850783] RAX: RBX: 88042b957f58 RCX: [ 381.850784] RDX: 0001 RSI: 81062a2d RDI: 81687d8c [ 381.850784] RBP: 88042b957ef0 R08: R09: 0002 [ 381.850785] R10: R11: 0001 R12: 88042b954000 [ 381.850785] R13: 88042b954000 R14: 88042b957f58 R15: 88042b958000 [ 381.850786] FS: 7f1143649800() GS:88043dcc() knlGS: [ 381.850787] CS: 0010 DS: ES: CR0: 80050033 [ 381.850787] CR2: 88042b957e40 CR3: 0004298af000 CR4: 000406e0 [ 381.850788] Stack: [ 381.850789] 88042b1ed000 88042b1ed000 8117f8ae [ 381.850790] ff9c0002
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
On 29/06/16 08:55 PM, Rafael J. Wysocki wrote: The only thing that comes to mind at this point is that TLBs should be flushed after page tables changes, so please apply the appended and let me know if you see this panic any more with it. Ok, I'll build a new kernel tomorrow. But keep in mind the panic is pretty rare as I've only seen it once so far after a couple dozen or so hibernates. So it may be hard to get a concrete yes or no on whether it fixes the issue. I've got a script to run a bunch of hibernates in a row. I usually only run it for a handful of iterations, but I'll try running it for much longer with this patch and let you know in a couple days. Logan Thanks, Rafael --- arch/x86/power/hibernate_64.c | 92 +- arch/x86/power/hibernate_asm_64.S | 55 +- 2 files changed, 104 insertions(+), 43 deletions(-) Index: linux-pm/arch/x86/power/hibernate_64.c === --- linux-pm.orig/arch/x86/power/hibernate_64.c +++ linux-pm/arch/x86/power/hibernate_64.c @@ -19,6 +19,7 @@ #include #include #include +#include /* Defined in hibernate_asm_64.S */ extern asmlinkage __visible int restore_image(void); @@ -28,6 +29,7 @@ extern asmlinkage __visible int restore_ * kernel's text (this value is passed in the image header). */ unsigned long restore_jump_address __visible; +unsigned long jump_address_phys; /* * Value of the cr3 register from before the hibernation (this value is passed @@ -37,7 +39,43 @@ unsigned long restore_cr3 __visible; pgd_t *temp_level4_pgt __visible; -void *relocated_restore_code __visible; +unsigned long relocated_restore_code __visible; + +static int set_up_temporary_text_mapping(void) +{ + pmd_t *pmd; + pud_t *pud; + + /* +* The new mapping only has to cover the page containing the image +* kernel's entry point (jump_address_phys), because the switch over to +* it is carried out by relocated code running from a page allocated +* specifically for this purpose and covered by the identity mapping, so +* the temporary kernel text mapping is only needed for the final jump. +* Moreover, in that mapping the virtual address of the image kernel's +* entry point must be the same as its virtual address in the image +* kernel (restore_jump_address), so the image kernel's +* restore_registers() code doesn't find itself in a different area of +* the virtual address space after switching over to the original page +* tables used by the image kernel. +*/ + pud = (pud_t *)get_safe_page(GFP_ATOMIC); + if (!pud) + return -ENOMEM; + + pmd = (pmd_t *)get_safe_page(GFP_ATOMIC); + if (!pmd) + return -ENOMEM; + + set_pmd(pmd + pmd_index(restore_jump_address), + __pmd((jump_address_phys & PMD_MASK) | __PAGE_KERNEL_LARGE_EXEC)); + set_pud(pud + pud_index(restore_jump_address), + __pud(__pa(pmd) | _KERNPG_TABLE)); + set_pgd(temp_level4_pgt + pgd_index(restore_jump_address), + __pgd(__pa(pud) | _KERNPG_TABLE)); + + return 0; +} static void *alloc_pgt_page(void *context) { @@ -59,9 +97,10 @@ static int set_up_temporary_mappings(voi if (!temp_level4_pgt) return -ENOMEM; - /* It is safe to reuse the original kernel mapping */ - set_pgd(temp_level4_pgt + pgd_index(__START_KERNEL_map), - init_level4_pgt[pgd_index(__START_KERNEL_map)]); + /* Prepare a temporary mapping for the kernel text */ + result = set_up_temporary_text_mapping(); + if (result) + return result; /* Set up the direct mapping from scratch */ for (i = 0; i < nr_pfn_mapped; i++) { @@ -78,19 +117,45 @@ static int set_up_temporary_mappings(voi return 0; } +static int relocate_restore_code(void) +{ + pgd_t *pgd; + pmd_t *pmd; + + relocated_restore_code = get_safe_page(GFP_ATOMIC); + if (!relocated_restore_code) + return -ENOMEM; + + memcpy((void *)relocated_restore_code, _restore_code, PAGE_SIZE); + + /* Make the page containing the relocated code executable */ + pgd = (pgd_t *)__va(read_cr3()) + pgd_index(relocated_restore_code); + pmd = pmd_offset(pud_offset(pgd, relocated_restore_code), +relocated_restore_code); + if (pmd_large(*pmd)) { + set_pmd(pmd, __pmd(pmd_val(*pmd) & ~_PAGE_NX)); + } else { + pte_t *pte = pte_offset_kernel(pmd, relocated_restore_code); + + set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_NX)); + } + flush_tlb_all(); + + return 0; +} + int swsusp_arch_resume(void) { int error; /* We have got enough memory and from now on we cannot recover */ - if ((error =
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
On 29/06/16 08:55 PM, Rafael J. Wysocki wrote: The only thing that comes to mind at this point is that TLBs should be flushed after page tables changes, so please apply the appended and let me know if you see this panic any more with it. Ok, I'll build a new kernel tomorrow. But keep in mind the panic is pretty rare as I've only seen it once so far after a couple dozen or so hibernates. So it may be hard to get a concrete yes or no on whether it fixes the issue. I've got a script to run a bunch of hibernates in a row. I usually only run it for a handful of iterations, but I'll try running it for much longer with this patch and let you know in a couple days. Logan Thanks, Rafael --- arch/x86/power/hibernate_64.c | 92 +- arch/x86/power/hibernate_asm_64.S | 55 +- 2 files changed, 104 insertions(+), 43 deletions(-) Index: linux-pm/arch/x86/power/hibernate_64.c === --- linux-pm.orig/arch/x86/power/hibernate_64.c +++ linux-pm/arch/x86/power/hibernate_64.c @@ -19,6 +19,7 @@ #include #include #include +#include /* Defined in hibernate_asm_64.S */ extern asmlinkage __visible int restore_image(void); @@ -28,6 +29,7 @@ extern asmlinkage __visible int restore_ * kernel's text (this value is passed in the image header). */ unsigned long restore_jump_address __visible; +unsigned long jump_address_phys; /* * Value of the cr3 register from before the hibernation (this value is passed @@ -37,7 +39,43 @@ unsigned long restore_cr3 __visible; pgd_t *temp_level4_pgt __visible; -void *relocated_restore_code __visible; +unsigned long relocated_restore_code __visible; + +static int set_up_temporary_text_mapping(void) +{ + pmd_t *pmd; + pud_t *pud; + + /* +* The new mapping only has to cover the page containing the image +* kernel's entry point (jump_address_phys), because the switch over to +* it is carried out by relocated code running from a page allocated +* specifically for this purpose and covered by the identity mapping, so +* the temporary kernel text mapping is only needed for the final jump. +* Moreover, in that mapping the virtual address of the image kernel's +* entry point must be the same as its virtual address in the image +* kernel (restore_jump_address), so the image kernel's +* restore_registers() code doesn't find itself in a different area of +* the virtual address space after switching over to the original page +* tables used by the image kernel. +*/ + pud = (pud_t *)get_safe_page(GFP_ATOMIC); + if (!pud) + return -ENOMEM; + + pmd = (pmd_t *)get_safe_page(GFP_ATOMIC); + if (!pmd) + return -ENOMEM; + + set_pmd(pmd + pmd_index(restore_jump_address), + __pmd((jump_address_phys & PMD_MASK) | __PAGE_KERNEL_LARGE_EXEC)); + set_pud(pud + pud_index(restore_jump_address), + __pud(__pa(pmd) | _KERNPG_TABLE)); + set_pgd(temp_level4_pgt + pgd_index(restore_jump_address), + __pgd(__pa(pud) | _KERNPG_TABLE)); + + return 0; +} static void *alloc_pgt_page(void *context) { @@ -59,9 +97,10 @@ static int set_up_temporary_mappings(voi if (!temp_level4_pgt) return -ENOMEM; - /* It is safe to reuse the original kernel mapping */ - set_pgd(temp_level4_pgt + pgd_index(__START_KERNEL_map), - init_level4_pgt[pgd_index(__START_KERNEL_map)]); + /* Prepare a temporary mapping for the kernel text */ + result = set_up_temporary_text_mapping(); + if (result) + return result; /* Set up the direct mapping from scratch */ for (i = 0; i < nr_pfn_mapped; i++) { @@ -78,19 +117,45 @@ static int set_up_temporary_mappings(voi return 0; } +static int relocate_restore_code(void) +{ + pgd_t *pgd; + pmd_t *pmd; + + relocated_restore_code = get_safe_page(GFP_ATOMIC); + if (!relocated_restore_code) + return -ENOMEM; + + memcpy((void *)relocated_restore_code, _restore_code, PAGE_SIZE); + + /* Make the page containing the relocated code executable */ + pgd = (pgd_t *)__va(read_cr3()) + pgd_index(relocated_restore_code); + pmd = pmd_offset(pud_offset(pgd, relocated_restore_code), +relocated_restore_code); + if (pmd_large(*pmd)) { + set_pmd(pmd, __pmd(pmd_val(*pmd) & ~_PAGE_NX)); + } else { + pte_t *pte = pte_offset_kernel(pmd, relocated_restore_code); + + set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_NX)); + } + flush_tlb_all(); + + return 0; +} + int swsusp_arch_resume(void) { int error; /* We have got enough memory and from now on we cannot recover */ - if ((error =
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
On Thursday, June 30, 2016 04:20:43 AM Rafael J. Wysocki wrote: > On Wednesday, June 29, 2016 07:52:18 PM Logan Gunthorpe wrote: > > Hey Raf, > > > > Sorry to report that although the patch works the majority of the time, > > I just got a suspicious kernel panic during resume. > > > > It said: > > > > "kernel tried to execute NX protected page - exploit attempt? (uid: 0)" > > > > You can find a photo of the panic here: > > > > http://staff.deltatee.com/~logang/panic.jpg > > Thanks for the report! > > That's not what Boris was seeing at least. > > It looks like clearing the NX bit for relocated_restore_code in > relocate_restore_code() didn't work for some reason. > > I don't see why it may not work ATM, I need to have a fresh look at that > tomorrow. > > I had hoped to be able to fix this bug for 4.7, but it looks like it will > miss the mark after all. Oh well. The only thing that comes to mind at this point is that TLBs should be flushed after page tables changes, so please apply the appended and let me know if you see this panic any more with it. Thanks, Rafael --- arch/x86/power/hibernate_64.c | 92 +- arch/x86/power/hibernate_asm_64.S | 55 +- 2 files changed, 104 insertions(+), 43 deletions(-) Index: linux-pm/arch/x86/power/hibernate_64.c === --- linux-pm.orig/arch/x86/power/hibernate_64.c +++ linux-pm/arch/x86/power/hibernate_64.c @@ -19,6 +19,7 @@ #include #include #include +#include /* Defined in hibernate_asm_64.S */ extern asmlinkage __visible int restore_image(void); @@ -28,6 +29,7 @@ extern asmlinkage __visible int restore_ * kernel's text (this value is passed in the image header). */ unsigned long restore_jump_address __visible; +unsigned long jump_address_phys; /* * Value of the cr3 register from before the hibernation (this value is passed @@ -37,7 +39,43 @@ unsigned long restore_cr3 __visible; pgd_t *temp_level4_pgt __visible; -void *relocated_restore_code __visible; +unsigned long relocated_restore_code __visible; + +static int set_up_temporary_text_mapping(void) +{ + pmd_t *pmd; + pud_t *pud; + + /* +* The new mapping only has to cover the page containing the image +* kernel's entry point (jump_address_phys), because the switch over to +* it is carried out by relocated code running from a page allocated +* specifically for this purpose and covered by the identity mapping, so +* the temporary kernel text mapping is only needed for the final jump. +* Moreover, in that mapping the virtual address of the image kernel's +* entry point must be the same as its virtual address in the image +* kernel (restore_jump_address), so the image kernel's +* restore_registers() code doesn't find itself in a different area of +* the virtual address space after switching over to the original page +* tables used by the image kernel. +*/ + pud = (pud_t *)get_safe_page(GFP_ATOMIC); + if (!pud) + return -ENOMEM; + + pmd = (pmd_t *)get_safe_page(GFP_ATOMIC); + if (!pmd) + return -ENOMEM; + + set_pmd(pmd + pmd_index(restore_jump_address), + __pmd((jump_address_phys & PMD_MASK) | __PAGE_KERNEL_LARGE_EXEC)); + set_pud(pud + pud_index(restore_jump_address), + __pud(__pa(pmd) | _KERNPG_TABLE)); + set_pgd(temp_level4_pgt + pgd_index(restore_jump_address), + __pgd(__pa(pud) | _KERNPG_TABLE)); + + return 0; +} static void *alloc_pgt_page(void *context) { @@ -59,9 +97,10 @@ static int set_up_temporary_mappings(voi if (!temp_level4_pgt) return -ENOMEM; - /* It is safe to reuse the original kernel mapping */ - set_pgd(temp_level4_pgt + pgd_index(__START_KERNEL_map), - init_level4_pgt[pgd_index(__START_KERNEL_map)]); + /* Prepare a temporary mapping for the kernel text */ + result = set_up_temporary_text_mapping(); + if (result) + return result; /* Set up the direct mapping from scratch */ for (i = 0; i < nr_pfn_mapped; i++) { @@ -78,19 +117,45 @@ static int set_up_temporary_mappings(voi return 0; } +static int relocate_restore_code(void) +{ + pgd_t *pgd; + pmd_t *pmd; + + relocated_restore_code = get_safe_page(GFP_ATOMIC); + if (!relocated_restore_code) + return -ENOMEM; + + memcpy((void *)relocated_restore_code, _restore_code, PAGE_SIZE); + + /* Make the page containing the relocated code executable */ + pgd = (pgd_t *)__va(read_cr3()) + pgd_index(relocated_restore_code); + pmd = pmd_offset(pud_offset(pgd, relocated_restore_code), +relocated_restore_code); + if (pmd_large(*pmd)) { + set_pmd(pmd,
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
On Thursday, June 30, 2016 04:20:43 AM Rafael J. Wysocki wrote: > On Wednesday, June 29, 2016 07:52:18 PM Logan Gunthorpe wrote: > > Hey Raf, > > > > Sorry to report that although the patch works the majority of the time, > > I just got a suspicious kernel panic during resume. > > > > It said: > > > > "kernel tried to execute NX protected page - exploit attempt? (uid: 0)" > > > > You can find a photo of the panic here: > > > > http://staff.deltatee.com/~logang/panic.jpg > > Thanks for the report! > > That's not what Boris was seeing at least. > > It looks like clearing the NX bit for relocated_restore_code in > relocate_restore_code() didn't work for some reason. > > I don't see why it may not work ATM, I need to have a fresh look at that > tomorrow. > > I had hoped to be able to fix this bug for 4.7, but it looks like it will > miss the mark after all. Oh well. The only thing that comes to mind at this point is that TLBs should be flushed after page tables changes, so please apply the appended and let me know if you see this panic any more with it. Thanks, Rafael --- arch/x86/power/hibernate_64.c | 92 +- arch/x86/power/hibernate_asm_64.S | 55 +- 2 files changed, 104 insertions(+), 43 deletions(-) Index: linux-pm/arch/x86/power/hibernate_64.c === --- linux-pm.orig/arch/x86/power/hibernate_64.c +++ linux-pm/arch/x86/power/hibernate_64.c @@ -19,6 +19,7 @@ #include #include #include +#include /* Defined in hibernate_asm_64.S */ extern asmlinkage __visible int restore_image(void); @@ -28,6 +29,7 @@ extern asmlinkage __visible int restore_ * kernel's text (this value is passed in the image header). */ unsigned long restore_jump_address __visible; +unsigned long jump_address_phys; /* * Value of the cr3 register from before the hibernation (this value is passed @@ -37,7 +39,43 @@ unsigned long restore_cr3 __visible; pgd_t *temp_level4_pgt __visible; -void *relocated_restore_code __visible; +unsigned long relocated_restore_code __visible; + +static int set_up_temporary_text_mapping(void) +{ + pmd_t *pmd; + pud_t *pud; + + /* +* The new mapping only has to cover the page containing the image +* kernel's entry point (jump_address_phys), because the switch over to +* it is carried out by relocated code running from a page allocated +* specifically for this purpose and covered by the identity mapping, so +* the temporary kernel text mapping is only needed for the final jump. +* Moreover, in that mapping the virtual address of the image kernel's +* entry point must be the same as its virtual address in the image +* kernel (restore_jump_address), so the image kernel's +* restore_registers() code doesn't find itself in a different area of +* the virtual address space after switching over to the original page +* tables used by the image kernel. +*/ + pud = (pud_t *)get_safe_page(GFP_ATOMIC); + if (!pud) + return -ENOMEM; + + pmd = (pmd_t *)get_safe_page(GFP_ATOMIC); + if (!pmd) + return -ENOMEM; + + set_pmd(pmd + pmd_index(restore_jump_address), + __pmd((jump_address_phys & PMD_MASK) | __PAGE_KERNEL_LARGE_EXEC)); + set_pud(pud + pud_index(restore_jump_address), + __pud(__pa(pmd) | _KERNPG_TABLE)); + set_pgd(temp_level4_pgt + pgd_index(restore_jump_address), + __pgd(__pa(pud) | _KERNPG_TABLE)); + + return 0; +} static void *alloc_pgt_page(void *context) { @@ -59,9 +97,10 @@ static int set_up_temporary_mappings(voi if (!temp_level4_pgt) return -ENOMEM; - /* It is safe to reuse the original kernel mapping */ - set_pgd(temp_level4_pgt + pgd_index(__START_KERNEL_map), - init_level4_pgt[pgd_index(__START_KERNEL_map)]); + /* Prepare a temporary mapping for the kernel text */ + result = set_up_temporary_text_mapping(); + if (result) + return result; /* Set up the direct mapping from scratch */ for (i = 0; i < nr_pfn_mapped; i++) { @@ -78,19 +117,45 @@ static int set_up_temporary_mappings(voi return 0; } +static int relocate_restore_code(void) +{ + pgd_t *pgd; + pmd_t *pmd; + + relocated_restore_code = get_safe_page(GFP_ATOMIC); + if (!relocated_restore_code) + return -ENOMEM; + + memcpy((void *)relocated_restore_code, _restore_code, PAGE_SIZE); + + /* Make the page containing the relocated code executable */ + pgd = (pgd_t *)__va(read_cr3()) + pgd_index(relocated_restore_code); + pmd = pmd_offset(pud_offset(pgd, relocated_restore_code), +relocated_restore_code); + if (pmd_large(*pmd)) { + set_pmd(pmd,
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
On Wednesday, June 29, 2016 07:52:18 PM Logan Gunthorpe wrote: > Hey Raf, > > Sorry to report that although the patch works the majority of the time, > I just got a suspicious kernel panic during resume. > > It said: > > "kernel tried to execute NX protected page - exploit attempt? (uid: 0)" > > You can find a photo of the panic here: > > http://staff.deltatee.com/~logang/panic.jpg Thanks for the report! That's not what Boris was seeing at least. It looks like clearing the NX bit for relocated_restore_code in relocate_restore_code() didn't work for some reason. I don't see why it may not work ATM, I need to have a fresh look at that tomorrow. I had hoped to be able to fix this bug for 4.7, but it looks like it will miss the mark after all. Oh well. Thanks, Rafael
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
On Wednesday, June 29, 2016 07:52:18 PM Logan Gunthorpe wrote: > Hey Raf, > > Sorry to report that although the patch works the majority of the time, > I just got a suspicious kernel panic during resume. > > It said: > > "kernel tried to execute NX protected page - exploit attempt? (uid: 0)" > > You can find a photo of the panic here: > > http://staff.deltatee.com/~logang/panic.jpg Thanks for the report! That's not what Boris was seeing at least. It looks like clearing the NX bit for relocated_restore_code in relocate_restore_code() didn't work for some reason. I don't see why it may not work ATM, I need to have a fresh look at that tomorrow. I had hoped to be able to fix this bug for 4.7, but it looks like it will miss the mark after all. Oh well. Thanks, Rafael
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
Hey Raf, Sorry to report that although the patch works the majority of the time, I just got a suspicious kernel panic during resume. It said: "kernel tried to execute NX protected page - exploit attempt? (uid: 0)" You can find a photo of the panic here: http://staff.deltatee.com/~logang/panic.jpg Logan On 29/06/16 08:48 AM, Kees Cook wrote: On Mon, Jun 27, 2016 at 4:33 PM, Logan Gunthorpewrote: Hey, This version is working for me as well. Thanks. Logan On 27/06/16 08:24 AM, Rafael J. Wysocki wrote: On Tuesday, June 21, 2016 11:04:41 AM Kees Cook wrote: On Mon, Jun 20, 2016 at 9:35 PM, Logan Gunthorpe wrote: Hey Rafael, This patch appears to be working on my laptop. Thanks. Same for me: resume still works with KASLR in my tests too. Unfortunately, Boris still sees post-resume memory corruption with the patch you have tested, but that turns out to be a result of some super-weird corruption of a pointer on the stack which leads to a store at a wrong address (and there's no way it can be related to the rest of the patch). We have verified that it can be avoided by rearranging set_up_temporary_text_mapping() to use fewer local variables and the appended patch contains this modification. I also went on and changed relocate_restore_code() slightly in a similar fashion, but all of those changes should not affect the behavior (unless there's something insane going on behind the curtains, that is). Kees, Logan, Boris, please try this one and let me know if it works for you. Tested-by: Kees Cook I've done a few KASLR boots, and everything continues to look good to me. Thanks! -Kees Thanks, Rafael --- From: Rafael J. Wysocki Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption during image restoration Logan Gunthorpe reports that hibernation stopped working reliably for him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata). That turns out to be a consequence of a long-standing issue with the 64-bit image restoration code on x86, which is that the temporary page tables set up by it to avoid page tables corruption when the last bits of the image kernel's memory contents are copied into their original page frames re-use the boot kernel's text mapping, but that mapping may very well get corrupted just like any other part of the page tables. Of course, if that happens, the final jump to the image kernel's entry point will go to nowhere. The exact reason why commit ab76f7b4ab23 matters here is that it sometimes causes a PMD of a large page to be split into PTEs that are allocated dynamically and get corrupted during image restoration as described above. To fix that issue note that the code copying the last bits of the image kernel's memory contents to the page frames occupied by them previoulsy doesn't use the kernel text mapping, because it runs from a special page covered by the identity mapping set up for that code from scratch. Hence, the kernel text mapping is only needed before that code starts to run and then it will only be used just for the final jump to the image kernel's entry point. Accordingly, the temporary page tables set up in swsusp_arch_resume() on x86-64 need to contain the kernel text mapping too. That mapping is only going to be used for the final jump to the image kernel, so it only needs to cover the image kernel's entry point, because the first thing the image kernel does after getting control back is to switch over to its own original page tables. Moreover, the virtual address of the image kernel's entry point in that mapping has to be the same as the one mapped by the image kernel's page tables. With that in mind, modify the x86-64's arch_hibernation_header_save() and arch_hibernation_header_restore() routines to pass the physical address of the image kernel's entry point (in addition to its virtual address) to the boot kernel (a small piece of assembly code involved in passing the entry point's virtual address to the image kernel is not necessary any more after that, so drop it). Update RESTORE_MAGIC too to reflect the image header format change. Next, in set_up_temporary_mappings(), use the physical and virtual addresses of the image kernel's entry point passed in the image header to set up a minimum kernel text mapping (using memory pages that won't be overwritten by the image kernel's memory contents) that will map those addresses to each other as appropriate. This makes the concern about the possible corruption of the original boot kernel text mapping go away and if the the minimum kernel text mapping used for the final jump marks the image kernel's entry point memory as executable, the jump to it is guaraneed to succeed. Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata) Link: http://marc.info/?l=linux-pm=146372852823760=2 Reported-by: Logan Gunthorpe
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
Hey Raf, Sorry to report that although the patch works the majority of the time, I just got a suspicious kernel panic during resume. It said: "kernel tried to execute NX protected page - exploit attempt? (uid: 0)" You can find a photo of the panic here: http://staff.deltatee.com/~logang/panic.jpg Logan On 29/06/16 08:48 AM, Kees Cook wrote: On Mon, Jun 27, 2016 at 4:33 PM, Logan Gunthorpe wrote: Hey, This version is working for me as well. Thanks. Logan On 27/06/16 08:24 AM, Rafael J. Wysocki wrote: On Tuesday, June 21, 2016 11:04:41 AM Kees Cook wrote: On Mon, Jun 20, 2016 at 9:35 PM, Logan Gunthorpe wrote: Hey Rafael, This patch appears to be working on my laptop. Thanks. Same for me: resume still works with KASLR in my tests too. Unfortunately, Boris still sees post-resume memory corruption with the patch you have tested, but that turns out to be a result of some super-weird corruption of a pointer on the stack which leads to a store at a wrong address (and there's no way it can be related to the rest of the patch). We have verified that it can be avoided by rearranging set_up_temporary_text_mapping() to use fewer local variables and the appended patch contains this modification. I also went on and changed relocate_restore_code() slightly in a similar fashion, but all of those changes should not affect the behavior (unless there's something insane going on behind the curtains, that is). Kees, Logan, Boris, please try this one and let me know if it works for you. Tested-by: Kees Cook I've done a few KASLR boots, and everything continues to look good to me. Thanks! -Kees Thanks, Rafael --- From: Rafael J. Wysocki Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption during image restoration Logan Gunthorpe reports that hibernation stopped working reliably for him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata). That turns out to be a consequence of a long-standing issue with the 64-bit image restoration code on x86, which is that the temporary page tables set up by it to avoid page tables corruption when the last bits of the image kernel's memory contents are copied into their original page frames re-use the boot kernel's text mapping, but that mapping may very well get corrupted just like any other part of the page tables. Of course, if that happens, the final jump to the image kernel's entry point will go to nowhere. The exact reason why commit ab76f7b4ab23 matters here is that it sometimes causes a PMD of a large page to be split into PTEs that are allocated dynamically and get corrupted during image restoration as described above. To fix that issue note that the code copying the last bits of the image kernel's memory contents to the page frames occupied by them previoulsy doesn't use the kernel text mapping, because it runs from a special page covered by the identity mapping set up for that code from scratch. Hence, the kernel text mapping is only needed before that code starts to run and then it will only be used just for the final jump to the image kernel's entry point. Accordingly, the temporary page tables set up in swsusp_arch_resume() on x86-64 need to contain the kernel text mapping too. That mapping is only going to be used for the final jump to the image kernel, so it only needs to cover the image kernel's entry point, because the first thing the image kernel does after getting control back is to switch over to its own original page tables. Moreover, the virtual address of the image kernel's entry point in that mapping has to be the same as the one mapped by the image kernel's page tables. With that in mind, modify the x86-64's arch_hibernation_header_save() and arch_hibernation_header_restore() routines to pass the physical address of the image kernel's entry point (in addition to its virtual address) to the boot kernel (a small piece of assembly code involved in passing the entry point's virtual address to the image kernel is not necessary any more after that, so drop it). Update RESTORE_MAGIC too to reflect the image header format change. Next, in set_up_temporary_mappings(), use the physical and virtual addresses of the image kernel's entry point passed in the image header to set up a minimum kernel text mapping (using memory pages that won't be overwritten by the image kernel's memory contents) that will map those addresses to each other as appropriate. This makes the concern about the possible corruption of the original boot kernel text mapping go away and if the the minimum kernel text mapping used for the final jump marks the image kernel's entry point memory as executable, the jump to it is guaraneed to succeed. Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata) Link: http://marc.info/?l=linux-pm=146372852823760=2 Reported-by: Logan Gunthorpe Signed-off-by: Rafael J. Wysocki --- arch/x86/power/hibernate_64.c | 90
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
On Mon, Jun 27, 2016 at 4:33 PM, Logan Gunthorpewrote: > Hey, > > This version is working for me as well. Thanks. > > Logan > > On 27/06/16 08:24 AM, Rafael J. Wysocki wrote: >> >> On Tuesday, June 21, 2016 11:04:41 AM Kees Cook wrote: >>> >>> On Mon, Jun 20, 2016 at 9:35 PM, Logan Gunthorpe >>> wrote: Hey Rafael, This patch appears to be working on my laptop. Thanks. >>> >>> >>> Same for me: resume still works with KASLR in my tests too. >> >> >> Unfortunately, Boris still sees post-resume memory corruption with the >> patch >> you have tested, but that turns out to be a result of some super-weird >> corruption of a pointer on the stack which leads to a store at a wrong >> address >> (and there's no way it can be related to the rest of the patch). >> >> We have verified that it can be avoided by rearranging >> set_up_temporary_text_mapping() >> to use fewer local variables and the appended patch contains this >> modification. >> >> I also went on and changed relocate_restore_code() slightly in a similar >> fashion, >> but all of those changes should not affect the behavior (unless there's >> something >> insane going on behind the curtains, that is). >> >> Kees, Logan, Boris, please try this one and let me know if it works for >> you. Tested-by: Kees Cook I've done a few KASLR boots, and everything continues to look good to me. Thanks! -Kees >> >> Thanks, >> Rafael >> >> >> --- >> From: Rafael J. Wysocki >> Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption >> during image restoration >> >> Logan Gunthorpe reports that hibernation stopped working reliably for >> him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table >> and rodata). >> >> That turns out to be a consequence of a long-standing issue with the >> 64-bit image restoration code on x86, which is that the temporary >> page tables set up by it to avoid page tables corruption when the >> last bits of the image kernel's memory contents are copied into >> their original page frames re-use the boot kernel's text mapping, >> but that mapping may very well get corrupted just like any other >> part of the page tables. Of course, if that happens, the final >> jump to the image kernel's entry point will go to nowhere. >> >> The exact reason why commit ab76f7b4ab23 matters here is that it >> sometimes causes a PMD of a large page to be split into PTEs >> that are allocated dynamically and get corrupted during image >> restoration as described above. >> >> To fix that issue note that the code copying the last bits of the >> image kernel's memory contents to the page frames occupied by them >> previoulsy doesn't use the kernel text mapping, because it runs from >> a special page covered by the identity mapping set up for that code >> from scratch. Hence, the kernel text mapping is only needed before >> that code starts to run and then it will only be used just for the >> final jump to the image kernel's entry point. >> >> Accordingly, the temporary page tables set up in swsusp_arch_resume() >> on x86-64 need to contain the kernel text mapping too. That mapping >> is only going to be used for the final jump to the image kernel, so >> it only needs to cover the image kernel's entry point, because the >> first thing the image kernel does after getting control back is to >> switch over to its own original page tables. Moreover, the virtual >> address of the image kernel's entry point in that mapping has to be >> the same as the one mapped by the image kernel's page tables. >> >> With that in mind, modify the x86-64's arch_hibernation_header_save() >> and arch_hibernation_header_restore() routines to pass the physical >> address of the image kernel's entry point (in addition to its virtual >> address) to the boot kernel (a small piece of assembly code involved >> in passing the entry point's virtual address to the image kernel is >> not necessary any more after that, so drop it). Update RESTORE_MAGIC >> too to reflect the image header format change. >> >> Next, in set_up_temporary_mappings(), use the physical and virtual >> addresses of the image kernel's entry point passed in the image >> header to set up a minimum kernel text mapping (using memory pages >> that won't be overwritten by the image kernel's memory contents) that >> will map those addresses to each other as appropriate. >> >> This makes the concern about the possible corruption of the original >> boot kernel text mapping go away and if the the minimum kernel text >> mapping used for the final jump marks the image kernel's entry point >> memory as executable, the jump to it is guaraneed to succeed. >> >> Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata) >> Link: http://marc.info/?l=linux-pm=146372852823760=2 >> Reported-by: Logan Gunthorpe >> Signed-off-by: Rafael J. Wysocki
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
On Mon, Jun 27, 2016 at 4:33 PM, Logan Gunthorpe wrote: > Hey, > > This version is working for me as well. Thanks. > > Logan > > On 27/06/16 08:24 AM, Rafael J. Wysocki wrote: >> >> On Tuesday, June 21, 2016 11:04:41 AM Kees Cook wrote: >>> >>> On Mon, Jun 20, 2016 at 9:35 PM, Logan Gunthorpe >>> wrote: Hey Rafael, This patch appears to be working on my laptop. Thanks. >>> >>> >>> Same for me: resume still works with KASLR in my tests too. >> >> >> Unfortunately, Boris still sees post-resume memory corruption with the >> patch >> you have tested, but that turns out to be a result of some super-weird >> corruption of a pointer on the stack which leads to a store at a wrong >> address >> (and there's no way it can be related to the rest of the patch). >> >> We have verified that it can be avoided by rearranging >> set_up_temporary_text_mapping() >> to use fewer local variables and the appended patch contains this >> modification. >> >> I also went on and changed relocate_restore_code() slightly in a similar >> fashion, >> but all of those changes should not affect the behavior (unless there's >> something >> insane going on behind the curtains, that is). >> >> Kees, Logan, Boris, please try this one and let me know if it works for >> you. Tested-by: Kees Cook I've done a few KASLR boots, and everything continues to look good to me. Thanks! -Kees >> >> Thanks, >> Rafael >> >> >> --- >> From: Rafael J. Wysocki >> Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption >> during image restoration >> >> Logan Gunthorpe reports that hibernation stopped working reliably for >> him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table >> and rodata). >> >> That turns out to be a consequence of a long-standing issue with the >> 64-bit image restoration code on x86, which is that the temporary >> page tables set up by it to avoid page tables corruption when the >> last bits of the image kernel's memory contents are copied into >> their original page frames re-use the boot kernel's text mapping, >> but that mapping may very well get corrupted just like any other >> part of the page tables. Of course, if that happens, the final >> jump to the image kernel's entry point will go to nowhere. >> >> The exact reason why commit ab76f7b4ab23 matters here is that it >> sometimes causes a PMD of a large page to be split into PTEs >> that are allocated dynamically and get corrupted during image >> restoration as described above. >> >> To fix that issue note that the code copying the last bits of the >> image kernel's memory contents to the page frames occupied by them >> previoulsy doesn't use the kernel text mapping, because it runs from >> a special page covered by the identity mapping set up for that code >> from scratch. Hence, the kernel text mapping is only needed before >> that code starts to run and then it will only be used just for the >> final jump to the image kernel's entry point. >> >> Accordingly, the temporary page tables set up in swsusp_arch_resume() >> on x86-64 need to contain the kernel text mapping too. That mapping >> is only going to be used for the final jump to the image kernel, so >> it only needs to cover the image kernel's entry point, because the >> first thing the image kernel does after getting control back is to >> switch over to its own original page tables. Moreover, the virtual >> address of the image kernel's entry point in that mapping has to be >> the same as the one mapped by the image kernel's page tables. >> >> With that in mind, modify the x86-64's arch_hibernation_header_save() >> and arch_hibernation_header_restore() routines to pass the physical >> address of the image kernel's entry point (in addition to its virtual >> address) to the boot kernel (a small piece of assembly code involved >> in passing the entry point's virtual address to the image kernel is >> not necessary any more after that, so drop it). Update RESTORE_MAGIC >> too to reflect the image header format change. >> >> Next, in set_up_temporary_mappings(), use the physical and virtual >> addresses of the image kernel's entry point passed in the image >> header to set up a minimum kernel text mapping (using memory pages >> that won't be overwritten by the image kernel's memory contents) that >> will map those addresses to each other as appropriate. >> >> This makes the concern about the possible corruption of the original >> boot kernel text mapping go away and if the the minimum kernel text >> mapping used for the final jump marks the image kernel's entry point >> memory as executable, the jump to it is guaraneed to succeed. >> >> Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata) >> Link: http://marc.info/?l=linux-pm=146372852823760=2 >> Reported-by: Logan Gunthorpe >> Signed-off-by: Rafael J. Wysocki >> --- >> arch/x86/power/hibernate_64.c | 90 >> -- >> arch/x86/power/hibernate_asm_64.S |
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
Hey, This version is working for me as well. Thanks. Logan On 27/06/16 08:24 AM, Rafael J. Wysocki wrote: On Tuesday, June 21, 2016 11:04:41 AM Kees Cook wrote: On Mon, Jun 20, 2016 at 9:35 PM, Logan Gunthorpewrote: Hey Rafael, This patch appears to be working on my laptop. Thanks. Same for me: resume still works with KASLR in my tests too. Unfortunately, Boris still sees post-resume memory corruption with the patch you have tested, but that turns out to be a result of some super-weird corruption of a pointer on the stack which leads to a store at a wrong address (and there's no way it can be related to the rest of the patch). We have verified that it can be avoided by rearranging set_up_temporary_text_mapping() to use fewer local variables and the appended patch contains this modification. I also went on and changed relocate_restore_code() slightly in a similar fashion, but all of those changes should not affect the behavior (unless there's something insane going on behind the curtains, that is). Kees, Logan, Boris, please try this one and let me know if it works for you. Thanks, Rafael --- From: Rafael J. Wysocki Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption during image restoration Logan Gunthorpe reports that hibernation stopped working reliably for him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata). That turns out to be a consequence of a long-standing issue with the 64-bit image restoration code on x86, which is that the temporary page tables set up by it to avoid page tables corruption when the last bits of the image kernel's memory contents are copied into their original page frames re-use the boot kernel's text mapping, but that mapping may very well get corrupted just like any other part of the page tables. Of course, if that happens, the final jump to the image kernel's entry point will go to nowhere. The exact reason why commit ab76f7b4ab23 matters here is that it sometimes causes a PMD of a large page to be split into PTEs that are allocated dynamically and get corrupted during image restoration as described above. To fix that issue note that the code copying the last bits of the image kernel's memory contents to the page frames occupied by them previoulsy doesn't use the kernel text mapping, because it runs from a special page covered by the identity mapping set up for that code from scratch. Hence, the kernel text mapping is only needed before that code starts to run and then it will only be used just for the final jump to the image kernel's entry point. Accordingly, the temporary page tables set up in swsusp_arch_resume() on x86-64 need to contain the kernel text mapping too. That mapping is only going to be used for the final jump to the image kernel, so it only needs to cover the image kernel's entry point, because the first thing the image kernel does after getting control back is to switch over to its own original page tables. Moreover, the virtual address of the image kernel's entry point in that mapping has to be the same as the one mapped by the image kernel's page tables. With that in mind, modify the x86-64's arch_hibernation_header_save() and arch_hibernation_header_restore() routines to pass the physical address of the image kernel's entry point (in addition to its virtual address) to the boot kernel (a small piece of assembly code involved in passing the entry point's virtual address to the image kernel is not necessary any more after that, so drop it). Update RESTORE_MAGIC too to reflect the image header format change. Next, in set_up_temporary_mappings(), use the physical and virtual addresses of the image kernel's entry point passed in the image header to set up a minimum kernel text mapping (using memory pages that won't be overwritten by the image kernel's memory contents) that will map those addresses to each other as appropriate. This makes the concern about the possible corruption of the original boot kernel text mapping go away and if the the minimum kernel text mapping used for the final jump marks the image kernel's entry point memory as executable, the jump to it is guaraneed to succeed. Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata) Link: http://marc.info/?l=linux-pm=146372852823760=2 Reported-by: Logan Gunthorpe Signed-off-by: Rafael J. Wysocki --- arch/x86/power/hibernate_64.c | 90 -- arch/x86/power/hibernate_asm_64.S | 55 ++- 2 files changed, 102 insertions(+), 43 deletions(-) Index: linux-pm/arch/x86/power/hibernate_64.c === --- linux-pm.orig/arch/x86/power/hibernate_64.c +++ linux-pm/arch/x86/power/hibernate_64.c @@ -28,6 +28,7 @@ extern asmlinkage __visible int restore_ * kernel's text (this value is passed in
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration
Hey, This version is working for me as well. Thanks. Logan On 27/06/16 08:24 AM, Rafael J. Wysocki wrote: On Tuesday, June 21, 2016 11:04:41 AM Kees Cook wrote: On Mon, Jun 20, 2016 at 9:35 PM, Logan Gunthorpe wrote: Hey Rafael, This patch appears to be working on my laptop. Thanks. Same for me: resume still works with KASLR in my tests too. Unfortunately, Boris still sees post-resume memory corruption with the patch you have tested, but that turns out to be a result of some super-weird corruption of a pointer on the stack which leads to a store at a wrong address (and there's no way it can be related to the rest of the patch). We have verified that it can be avoided by rearranging set_up_temporary_text_mapping() to use fewer local variables and the appended patch contains this modification. I also went on and changed relocate_restore_code() slightly in a similar fashion, but all of those changes should not affect the behavior (unless there's something insane going on behind the curtains, that is). Kees, Logan, Boris, please try this one and let me know if it works for you. Thanks, Rafael --- From: Rafael J. Wysocki Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption during image restoration Logan Gunthorpe reports that hibernation stopped working reliably for him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata). That turns out to be a consequence of a long-standing issue with the 64-bit image restoration code on x86, which is that the temporary page tables set up by it to avoid page tables corruption when the last bits of the image kernel's memory contents are copied into their original page frames re-use the boot kernel's text mapping, but that mapping may very well get corrupted just like any other part of the page tables. Of course, if that happens, the final jump to the image kernel's entry point will go to nowhere. The exact reason why commit ab76f7b4ab23 matters here is that it sometimes causes a PMD of a large page to be split into PTEs that are allocated dynamically and get corrupted during image restoration as described above. To fix that issue note that the code copying the last bits of the image kernel's memory contents to the page frames occupied by them previoulsy doesn't use the kernel text mapping, because it runs from a special page covered by the identity mapping set up for that code from scratch. Hence, the kernel text mapping is only needed before that code starts to run and then it will only be used just for the final jump to the image kernel's entry point. Accordingly, the temporary page tables set up in swsusp_arch_resume() on x86-64 need to contain the kernel text mapping too. That mapping is only going to be used for the final jump to the image kernel, so it only needs to cover the image kernel's entry point, because the first thing the image kernel does after getting control back is to switch over to its own original page tables. Moreover, the virtual address of the image kernel's entry point in that mapping has to be the same as the one mapped by the image kernel's page tables. With that in mind, modify the x86-64's arch_hibernation_header_save() and arch_hibernation_header_restore() routines to pass the physical address of the image kernel's entry point (in addition to its virtual address) to the boot kernel (a small piece of assembly code involved in passing the entry point's virtual address to the image kernel is not necessary any more after that, so drop it). Update RESTORE_MAGIC too to reflect the image header format change. Next, in set_up_temporary_mappings(), use the physical and virtual addresses of the image kernel's entry point passed in the image header to set up a minimum kernel text mapping (using memory pages that won't be overwritten by the image kernel's memory contents) that will map those addresses to each other as appropriate. This makes the concern about the possible corruption of the original boot kernel text mapping go away and if the the minimum kernel text mapping used for the final jump marks the image kernel's entry point memory as executable, the jump to it is guaraneed to succeed. Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata) Link: http://marc.info/?l=linux-pm=146372852823760=2 Reported-by: Logan Gunthorpe Signed-off-by: Rafael J. Wysocki --- arch/x86/power/hibernate_64.c | 90 -- arch/x86/power/hibernate_asm_64.S | 55 ++- 2 files changed, 102 insertions(+), 43 deletions(-) Index: linux-pm/arch/x86/power/hibernate_64.c === --- linux-pm.orig/arch/x86/power/hibernate_64.c +++ linux-pm/arch/x86/power/hibernate_64.c @@ -28,6 +28,7 @@ extern asmlinkage __visible int restore_ * kernel's text (this value is passed in the image header). */ unsigned long restore_jump_address __visible; +unsigned long
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration (was: Re: ktime_get_ts64() splat during resume)
On Mon, Jun 27, 2016 at 04:24:22PM +0200, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki> Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption during > image restoration > > Logan Gunthorpe reports that hibernation stopped working reliably for > him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table > and rodata). > > That turns out to be a consequence of a long-standing issue with the > 64-bit image restoration code on x86, which is that the temporary > page tables set up by it to avoid page tables corruption when the > last bits of the image kernel's memory contents are copied into > their original page frames re-use the boot kernel's text mapping, > but that mapping may very well get corrupted just like any other > part of the page tables. Of course, if that happens, the final > jump to the image kernel's entry point will go to nowhere. > > The exact reason why commit ab76f7b4ab23 matters here is that it > sometimes causes a PMD of a large page to be split into PTEs > that are allocated dynamically and get corrupted during image > restoration as described above. > > To fix that issue note that the code copying the last bits of the > image kernel's memory contents to the page frames occupied by them > previoulsy doesn't use the kernel text mapping, because it runs from > a special page covered by the identity mapping set up for that code > from scratch. Hence, the kernel text mapping is only needed before > that code starts to run and then it will only be used just for the > final jump to the image kernel's entry point. > > Accordingly, the temporary page tables set up in swsusp_arch_resume() > on x86-64 need to contain the kernel text mapping too. That mapping > is only going to be used for the final jump to the image kernel, so > it only needs to cover the image kernel's entry point, because the > first thing the image kernel does after getting control back is to > switch over to its own original page tables. Moreover, the virtual > address of the image kernel's entry point in that mapping has to be > the same as the one mapped by the image kernel's page tables. > > With that in mind, modify the x86-64's arch_hibernation_header_save() > and arch_hibernation_header_restore() routines to pass the physical > address of the image kernel's entry point (in addition to its virtual > address) to the boot kernel (a small piece of assembly code involved > in passing the entry point's virtual address to the image kernel is > not necessary any more after that, so drop it). Update RESTORE_MAGIC > too to reflect the image header format change. > > Next, in set_up_temporary_mappings(), use the physical and virtual > addresses of the image kernel's entry point passed in the image > header to set up a minimum kernel text mapping (using memory pages > that won't be overwritten by the image kernel's memory contents) that > will map those addresses to each other as appropriate. > > This makes the concern about the possible corruption of the original > boot kernel text mapping go away and if the the minimum kernel text > mapping used for the final jump marks the image kernel's entry point > memory as executable, the jump to it is guaraneed to succeed. > > Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata) > Link: http://marc.info/?l=linux-pm=146372852823760=2 > Reported-by: Logan Gunthorpe > Signed-off-by: Rafael J. Wysocki > --- > arch/x86/power/hibernate_64.c | 90 > -- > arch/x86/power/hibernate_asm_64.S | 55 ++- > 2 files changed, 102 insertions(+), 43 deletions(-) Reported-and-tested-by: Borislav Petkov -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply.
Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration (was: Re: ktime_get_ts64() splat during resume)
On Mon, Jun 27, 2016 at 04:24:22PM +0200, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki > Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption during > image restoration > > Logan Gunthorpe reports that hibernation stopped working reliably for > him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table > and rodata). > > That turns out to be a consequence of a long-standing issue with the > 64-bit image restoration code on x86, which is that the temporary > page tables set up by it to avoid page tables corruption when the > last bits of the image kernel's memory contents are copied into > their original page frames re-use the boot kernel's text mapping, > but that mapping may very well get corrupted just like any other > part of the page tables. Of course, if that happens, the final > jump to the image kernel's entry point will go to nowhere. > > The exact reason why commit ab76f7b4ab23 matters here is that it > sometimes causes a PMD of a large page to be split into PTEs > that are allocated dynamically and get corrupted during image > restoration as described above. > > To fix that issue note that the code copying the last bits of the > image kernel's memory contents to the page frames occupied by them > previoulsy doesn't use the kernel text mapping, because it runs from > a special page covered by the identity mapping set up for that code > from scratch. Hence, the kernel text mapping is only needed before > that code starts to run and then it will only be used just for the > final jump to the image kernel's entry point. > > Accordingly, the temporary page tables set up in swsusp_arch_resume() > on x86-64 need to contain the kernel text mapping too. That mapping > is only going to be used for the final jump to the image kernel, so > it only needs to cover the image kernel's entry point, because the > first thing the image kernel does after getting control back is to > switch over to its own original page tables. Moreover, the virtual > address of the image kernel's entry point in that mapping has to be > the same as the one mapped by the image kernel's page tables. > > With that in mind, modify the x86-64's arch_hibernation_header_save() > and arch_hibernation_header_restore() routines to pass the physical > address of the image kernel's entry point (in addition to its virtual > address) to the boot kernel (a small piece of assembly code involved > in passing the entry point's virtual address to the image kernel is > not necessary any more after that, so drop it). Update RESTORE_MAGIC > too to reflect the image header format change. > > Next, in set_up_temporary_mappings(), use the physical and virtual > addresses of the image kernel's entry point passed in the image > header to set up a minimum kernel text mapping (using memory pages > that won't be overwritten by the image kernel's memory contents) that > will map those addresses to each other as appropriate. > > This makes the concern about the possible corruption of the original > boot kernel text mapping go away and if the the minimum kernel text > mapping used for the final jump marks the image kernel's entry point > memory as executable, the jump to it is guaraneed to succeed. > > Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata) > Link: http://marc.info/?l=linux-pm=146372852823760=2 > Reported-by: Logan Gunthorpe > Signed-off-by: Rafael J. Wysocki > --- > arch/x86/power/hibernate_64.c | 90 > -- > arch/x86/power/hibernate_asm_64.S | 55 ++- > 2 files changed, 102 insertions(+), 43 deletions(-) Reported-and-tested-by: Borislav Petkov -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply.
[PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration (was: Re: ktime_get_ts64() splat during resume)
On Tuesday, June 21, 2016 11:04:41 AM Kees Cook wrote: > On Mon, Jun 20, 2016 at 9:35 PM, Logan Gunthorpewrote: > > Hey Rafael, > > > > This patch appears to be working on my laptop. Thanks. > > Same for me: resume still works with KASLR in my tests too. Unfortunately, Boris still sees post-resume memory corruption with the patch you have tested, but that turns out to be a result of some super-weird corruption of a pointer on the stack which leads to a store at a wrong address (and there's no way it can be related to the rest of the patch). We have verified that it can be avoided by rearranging set_up_temporary_text_mapping() to use fewer local variables and the appended patch contains this modification. I also went on and changed relocate_restore_code() slightly in a similar fashion, but all of those changes should not affect the behavior (unless there's something insane going on behind the curtains, that is). Kees, Logan, Boris, please try this one and let me know if it works for you. Thanks, Rafael --- From: Rafael J. Wysocki Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption during image restoration Logan Gunthorpe reports that hibernation stopped working reliably for him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata). That turns out to be a consequence of a long-standing issue with the 64-bit image restoration code on x86, which is that the temporary page tables set up by it to avoid page tables corruption when the last bits of the image kernel's memory contents are copied into their original page frames re-use the boot kernel's text mapping, but that mapping may very well get corrupted just like any other part of the page tables. Of course, if that happens, the final jump to the image kernel's entry point will go to nowhere. The exact reason why commit ab76f7b4ab23 matters here is that it sometimes causes a PMD of a large page to be split into PTEs that are allocated dynamically and get corrupted during image restoration as described above. To fix that issue note that the code copying the last bits of the image kernel's memory contents to the page frames occupied by them previoulsy doesn't use the kernel text mapping, because it runs from a special page covered by the identity mapping set up for that code from scratch. Hence, the kernel text mapping is only needed before that code starts to run and then it will only be used just for the final jump to the image kernel's entry point. Accordingly, the temporary page tables set up in swsusp_arch_resume() on x86-64 need to contain the kernel text mapping too. That mapping is only going to be used for the final jump to the image kernel, so it only needs to cover the image kernel's entry point, because the first thing the image kernel does after getting control back is to switch over to its own original page tables. Moreover, the virtual address of the image kernel's entry point in that mapping has to be the same as the one mapped by the image kernel's page tables. With that in mind, modify the x86-64's arch_hibernation_header_save() and arch_hibernation_header_restore() routines to pass the physical address of the image kernel's entry point (in addition to its virtual address) to the boot kernel (a small piece of assembly code involved in passing the entry point's virtual address to the image kernel is not necessary any more after that, so drop it). Update RESTORE_MAGIC too to reflect the image header format change. Next, in set_up_temporary_mappings(), use the physical and virtual addresses of the image kernel's entry point passed in the image header to set up a minimum kernel text mapping (using memory pages that won't be overwritten by the image kernel's memory contents) that will map those addresses to each other as appropriate. This makes the concern about the possible corruption of the original boot kernel text mapping go away and if the the minimum kernel text mapping used for the final jump marks the image kernel's entry point memory as executable, the jump to it is guaraneed to succeed. Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata) Link: http://marc.info/?l=linux-pm=146372852823760=2 Reported-by: Logan Gunthorpe Signed-off-by: Rafael J. Wysocki --- arch/x86/power/hibernate_64.c | 90 -- arch/x86/power/hibernate_asm_64.S | 55 ++- 2 files changed, 102 insertions(+), 43 deletions(-) Index: linux-pm/arch/x86/power/hibernate_64.c === --- linux-pm.orig/arch/x86/power/hibernate_64.c +++ linux-pm/arch/x86/power/hibernate_64.c @@ -28,6 +28,7 @@ extern asmlinkage __visible int restore_ * kernel's text (this value is passed in the image header). */ unsigned long restore_jump_address __visible; +unsigned long
[PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration (was: Re: ktime_get_ts64() splat during resume)
On Tuesday, June 21, 2016 11:04:41 AM Kees Cook wrote: > On Mon, Jun 20, 2016 at 9:35 PM, Logan Gunthorpe wrote: > > Hey Rafael, > > > > This patch appears to be working on my laptop. Thanks. > > Same for me: resume still works with KASLR in my tests too. Unfortunately, Boris still sees post-resume memory corruption with the patch you have tested, but that turns out to be a result of some super-weird corruption of a pointer on the stack which leads to a store at a wrong address (and there's no way it can be related to the rest of the patch). We have verified that it can be avoided by rearranging set_up_temporary_text_mapping() to use fewer local variables and the appended patch contains this modification. I also went on and changed relocate_restore_code() slightly in a similar fashion, but all of those changes should not affect the behavior (unless there's something insane going on behind the curtains, that is). Kees, Logan, Boris, please try this one and let me know if it works for you. Thanks, Rafael --- From: Rafael J. Wysocki Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption during image restoration Logan Gunthorpe reports that hibernation stopped working reliably for him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata). That turns out to be a consequence of a long-standing issue with the 64-bit image restoration code on x86, which is that the temporary page tables set up by it to avoid page tables corruption when the last bits of the image kernel's memory contents are copied into their original page frames re-use the boot kernel's text mapping, but that mapping may very well get corrupted just like any other part of the page tables. Of course, if that happens, the final jump to the image kernel's entry point will go to nowhere. The exact reason why commit ab76f7b4ab23 matters here is that it sometimes causes a PMD of a large page to be split into PTEs that are allocated dynamically and get corrupted during image restoration as described above. To fix that issue note that the code copying the last bits of the image kernel's memory contents to the page frames occupied by them previoulsy doesn't use the kernel text mapping, because it runs from a special page covered by the identity mapping set up for that code from scratch. Hence, the kernel text mapping is only needed before that code starts to run and then it will only be used just for the final jump to the image kernel's entry point. Accordingly, the temporary page tables set up in swsusp_arch_resume() on x86-64 need to contain the kernel text mapping too. That mapping is only going to be used for the final jump to the image kernel, so it only needs to cover the image kernel's entry point, because the first thing the image kernel does after getting control back is to switch over to its own original page tables. Moreover, the virtual address of the image kernel's entry point in that mapping has to be the same as the one mapped by the image kernel's page tables. With that in mind, modify the x86-64's arch_hibernation_header_save() and arch_hibernation_header_restore() routines to pass the physical address of the image kernel's entry point (in addition to its virtual address) to the boot kernel (a small piece of assembly code involved in passing the entry point's virtual address to the image kernel is not necessary any more after that, so drop it). Update RESTORE_MAGIC too to reflect the image header format change. Next, in set_up_temporary_mappings(), use the physical and virtual addresses of the image kernel's entry point passed in the image header to set up a minimum kernel text mapping (using memory pages that won't be overwritten by the image kernel's memory contents) that will map those addresses to each other as appropriate. This makes the concern about the possible corruption of the original boot kernel text mapping go away and if the the minimum kernel text mapping used for the final jump marks the image kernel's entry point memory as executable, the jump to it is guaraneed to succeed. Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata) Link: http://marc.info/?l=linux-pm=146372852823760=2 Reported-by: Logan Gunthorpe Signed-off-by: Rafael J. Wysocki --- arch/x86/power/hibernate_64.c | 90 -- arch/x86/power/hibernate_asm_64.S | 55 ++- 2 files changed, 102 insertions(+), 43 deletions(-) Index: linux-pm/arch/x86/power/hibernate_64.c === --- linux-pm.orig/arch/x86/power/hibernate_64.c +++ linux-pm/arch/x86/power/hibernate_64.c @@ -28,6 +28,7 @@ extern asmlinkage __visible int restore_ * kernel's text (this value is passed in the image header). */ unsigned long restore_jump_address __visible; +unsigned long jump_address_phys; /* * Value of the cr3 register from before the hibernation (this value is passed @@