Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-30 Thread Rafael J. Wysocki
On Thu, Jun 30, 2016 at 5:56 AM, Logan Gunthorpe  wrote:
>
>
> On 29/06/16 08:55 PM, Rafael J. Wysocki wrote:
>
>> The only thing that comes to mind at this point is that TLBs should be
>> flushed
>> after page tables changes, so please apply the appended and let me know
>> if you see this panic any more with it.
>
>
>
> Ok, I'll build a new kernel tomorrow.

Well, please wait for a while.

I'm looking at the panic log right now and the picture is a bit clearer now.

> But keep in mind the panic is pretty
> rare as I've only seen it once so far after a couple dozen or so hibernates.
> So it may be hard to get a concrete yes or no on whether it fixes the issue.

Right.

> I've got a script to run a bunch of hibernates in a row. I usually only run
> it for a handful of iterations, but I'll try running it for much longer with
> this patch and let you know in a couple days.

As I said, please wait a bit, there may be updates in the meantime. :-)

>From looking at your panic log, the exception happened in
swsusp_arch_resume(), which probably covers restore_image() too,
because that is likely to go into swsusp_arch_resume() in line.

Next, the address in RIP (a) clearly is a page start and (b) is
relative to the identity mapping, so it most likely is the address
from relocated_restore_code.  Moreover, the RCX value is the same
address and the values in the other registers also match exactly the
situation before the jump to relocated_restore_code.  Thus the
exception was triggered by that jump beyond doubt.

Now, if you look above the Oops line, it becomes quite clear what
happened.  Namely, dump_pagetable() (arch/x86/mm/fault.c, line 524 in
4.6) prints the PGD, the PUD, the PMD and the PTE in that order,
unless the lower levels (PTE, PMD) are not present.  In your panic
log, only the PGD and PUD are present, meaning that this is a 1G page
and sure enough it has the NX bit set.

This case was clearly overlooked by relocate_restore_code(), so
updated patch will follow.

Thanks,
Rafael


Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-30 Thread Rafael J. Wysocki
On Thu, Jun 30, 2016 at 5:56 AM, Logan Gunthorpe  wrote:
>
>
> On 29/06/16 08:55 PM, Rafael J. Wysocki wrote:
>
>> The only thing that comes to mind at this point is that TLBs should be
>> flushed
>> after page tables changes, so please apply the appended and let me know
>> if you see this panic any more with it.
>
>
>
> Ok, I'll build a new kernel tomorrow.

Well, please wait for a while.

I'm looking at the panic log right now and the picture is a bit clearer now.

> But keep in mind the panic is pretty
> rare as I've only seen it once so far after a couple dozen or so hibernates.
> So it may be hard to get a concrete yes or no on whether it fixes the issue.

Right.

> I've got a script to run a bunch of hibernates in a row. I usually only run
> it for a handful of iterations, but I'll try running it for much longer with
> this patch and let you know in a couple days.

As I said, please wait a bit, there may be updates in the meantime. :-)

>From looking at your panic log, the exception happened in
swsusp_arch_resume(), which probably covers restore_image() too,
because that is likely to go into swsusp_arch_resume() in line.

Next, the address in RIP (a) clearly is a page start and (b) is
relative to the identity mapping, so it most likely is the address
from relocated_restore_code.  Moreover, the RCX value is the same
address and the values in the other registers also match exactly the
situation before the jump to relocated_restore_code.  Thus the
exception was triggered by that jump beyond doubt.

Now, if you look above the Oops line, it becomes quite clear what
happened.  Namely, dump_pagetable() (arch/x86/mm/fault.c, line 524 in
4.6) prints the PGD, the PUD, the PMD and the PTE in that order,
unless the lower levels (PTE, PMD) are not present.  In your panic
log, only the PGD and PUD are present, meaning that this is a 1G page
and sure enough it has the NX bit set.

This case was clearly overlooked by relocate_restore_code(), so
updated patch will follow.

Thanks,
Rafael


Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-30 Thread Rafael J. Wysocki
On Thu, Jun 30, 2016 at 11:45 AM, Borislav Petkov  wrote:
> On Thu, Jun 30, 2016 at 04:20:43AM +0200, Rafael J. Wysocki wrote:
>> That's not what Boris was seeing at least.
>
> Well, I had it a couple of times during testing patches. This is all
> from the logs:
>
> [   65.121109] PM: Basic memory bitmaps freed
> [   65.125991] Restarting tasks ...
> [   65.129342] kernel tried to execute NX-protected page - exploit attempt? 
> (uid: 0)
> [   65.129585] done.
> [   65.141314] BUG: unable to handle kernel paging request at 88042b957e40

I mean the failure mode, not the particular exception type. :-)

You always saw it in a user space task after kernel resume:

> [   65.141340] Call Trace:
> [   65.141344]  [] ? getname_flags+0x5e/0x1b0
> [   65.141346]  [] ? cp_new_stat+0x10f/0x120
> [   65.141348]  [] ? ktime_get_ts64+0x4a/0xf0
> [   65.141353]  [] ? poll_select_copy_remaining+0xe7/0x130
> [   65.141355]  [] exit_to_usermode_loop+0x8a/0xb0
> [   65.141356]  [] syscall_return_slowpath+0x5b/0x70
> [   65.141358]  [] entry_SYSCALL_64_fastpath+0xa5/0xa7

[cut]

> [  381.850792] Call Trace:
> [  381.850795]  [] ? getname_flags+0x5e/0x1b0
> [  381.850797]  [] ? cp_new_stat+0x10f/0x120
> [  381.850799]  [] ? ktime_get_ts64+0x4a/0xf0
> [  381.850800]  [] ? poll_select_copy_remaining+0xe7/0x130
> [  381.850802]  [] exit_to_usermode_loop+0x8a/0xb0
> [  381.850804]  [] syscall_return_slowpath+0x5b/0x70
> [  381.850806]  [] entry_SYSCALL_64_fastpath+0xa5/0xa7

[cut]

> [   49.022675] Call Trace:
> [   49.022680]  [] ? getname_flags+0x5e/0x1b0
> [   49.022683]  [] ? cp_new_stat+0x10f/0x120
> [   49.022686]  [] ? ktime_get_ts64+0x4a/0xf0
> [   49.022689]  [] ? poll_select_copy_remaining+0xe7/0x130
> [   49.022692]  [] exit_to_usermode_loop+0x8a/0xb0
> [   49.022695]  [] syscall_return_slowpath+0x5b/0x70
> [   49.022698]  [] entry_SYSCALL_64_fastpath+0xa5/0xa7

[cut]

> [   39.636905] Call Trace:
> [   39.636908]  [] ? getname_flags+0x5e/0x1b0
> [   39.636910]  [] ? cp_new_stat+0x10f/0x120
> [   39.636912]  [] ? ktime_get_ts64+0x4a/0xf0
> [   39.636917]  [] ? poll_select_copy_remaining+0xe7/0x130
> [   39.636919]  [] exit_to_usermode_loop+0x8a/0xb0
> [   39.636921]  [] syscall_return_slowpath+0x5b/0x70
> [   39.636922]  [] entry_SYSCALL_64_fastpath+0xa5/0xa7

which is a clear indication of image corruption during restore.

In the Logan's case this happens in swsusp_arch_resume() proper and
the address in RIP is relative to the identity mapping, so the only
place it can happen is the jump to relocated_restore_code.  That's
because before that jump the addresses in RIP are relative to the
kernel text mapping and after it we immediately switch over to the
temporary page tables which are all executable.  So that is the only
place AFAICS.

Also in your case the failure was 100% reproducible, while in the
Logan's case it has happened once so far (so generally it happens once
in a blue moon).

In summary, I'm sure that this is a different issue.

Thanks,
Rafael


Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-30 Thread Rafael J. Wysocki
On Thu, Jun 30, 2016 at 11:45 AM, Borislav Petkov  wrote:
> On Thu, Jun 30, 2016 at 04:20:43AM +0200, Rafael J. Wysocki wrote:
>> That's not what Boris was seeing at least.
>
> Well, I had it a couple of times during testing patches. This is all
> from the logs:
>
> [   65.121109] PM: Basic memory bitmaps freed
> [   65.125991] Restarting tasks ...
> [   65.129342] kernel tried to execute NX-protected page - exploit attempt? 
> (uid: 0)
> [   65.129585] done.
> [   65.141314] BUG: unable to handle kernel paging request at 88042b957e40

I mean the failure mode, not the particular exception type. :-)

You always saw it in a user space task after kernel resume:

> [   65.141340] Call Trace:
> [   65.141344]  [] ? getname_flags+0x5e/0x1b0
> [   65.141346]  [] ? cp_new_stat+0x10f/0x120
> [   65.141348]  [] ? ktime_get_ts64+0x4a/0xf0
> [   65.141353]  [] ? poll_select_copy_remaining+0xe7/0x130
> [   65.141355]  [] exit_to_usermode_loop+0x8a/0xb0
> [   65.141356]  [] syscall_return_slowpath+0x5b/0x70
> [   65.141358]  [] entry_SYSCALL_64_fastpath+0xa5/0xa7

[cut]

> [  381.850792] Call Trace:
> [  381.850795]  [] ? getname_flags+0x5e/0x1b0
> [  381.850797]  [] ? cp_new_stat+0x10f/0x120
> [  381.850799]  [] ? ktime_get_ts64+0x4a/0xf0
> [  381.850800]  [] ? poll_select_copy_remaining+0xe7/0x130
> [  381.850802]  [] exit_to_usermode_loop+0x8a/0xb0
> [  381.850804]  [] syscall_return_slowpath+0x5b/0x70
> [  381.850806]  [] entry_SYSCALL_64_fastpath+0xa5/0xa7

[cut]

> [   49.022675] Call Trace:
> [   49.022680]  [] ? getname_flags+0x5e/0x1b0
> [   49.022683]  [] ? cp_new_stat+0x10f/0x120
> [   49.022686]  [] ? ktime_get_ts64+0x4a/0xf0
> [   49.022689]  [] ? poll_select_copy_remaining+0xe7/0x130
> [   49.022692]  [] exit_to_usermode_loop+0x8a/0xb0
> [   49.022695]  [] syscall_return_slowpath+0x5b/0x70
> [   49.022698]  [] entry_SYSCALL_64_fastpath+0xa5/0xa7

[cut]

> [   39.636905] Call Trace:
> [   39.636908]  [] ? getname_flags+0x5e/0x1b0
> [   39.636910]  [] ? cp_new_stat+0x10f/0x120
> [   39.636912]  [] ? ktime_get_ts64+0x4a/0xf0
> [   39.636917]  [] ? poll_select_copy_remaining+0xe7/0x130
> [   39.636919]  [] exit_to_usermode_loop+0x8a/0xb0
> [   39.636921]  [] syscall_return_slowpath+0x5b/0x70
> [   39.636922]  [] entry_SYSCALL_64_fastpath+0xa5/0xa7

which is a clear indication of image corruption during restore.

In the Logan's case this happens in swsusp_arch_resume() proper and
the address in RIP is relative to the identity mapping, so the only
place it can happen is the jump to relocated_restore_code.  That's
because before that jump the addresses in RIP are relative to the
kernel text mapping and after it we immediately switch over to the
temporary page tables which are all executable.  So that is the only
place AFAICS.

Also in your case the failure was 100% reproducible, while in the
Logan's case it has happened once so far (so generally it happens once
in a blue moon).

In summary, I'm sure that this is a different issue.

Thanks,
Rafael


Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-30 Thread Borislav Petkov
On Thu, Jun 30, 2016 at 04:20:43AM +0200, Rafael J. Wysocki wrote:
> That's not what Boris was seeing at least.

Well, I had it a couple of times during testing patches. This is all
from the logs:

[   65.121109] PM: Basic memory bitmaps freed
[   65.125991] Restarting tasks ... 
[   65.129342] kernel tried to execute NX-protected page - exploit attempt? 
(uid: 0)
[   65.129585] done.
[   65.141314] BUG: unable to handle kernel paging request at 88042b957e40
[   65.141316] IP: [] 0x88042b957e40
[   65.141318] PGD 2067067 PUD 206a067 PMD 80042b8001e3 
[   65.141319] Oops: 0011 [#1] PREEMPT SMP
[   65.141327] Modules linked in: binfmt_misc ipv6 vfat fat amd64_edac_mod 
edac_mce_amd fuse dm_crypt dm_mod amdkfd kvm_amd kvm amd_iommu_v2 irqbypass 
crc32_pclmul radeon aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper 
cryptd fam15h_power k10temp acpi_cpufreq
[   65.141328] CPU: 6 PID: 1 Comm: init Not tainted 4.7.0-rc3+ #4
[   65.141329] Hardware name: To be filled by O.E.M. To be filled by 
O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013
[   65.141329] task: 88042b958000 ti: 88042b954000 task.ti: 
88042b954000
[   65.141331] RIP: 0010:[]  [] 
0x88042b957e40
[   65.141331] RSP: 0018:88042b957e00  EFLAGS: 00010282
[   65.141332] RAX:  RBX: 88042b957f58 RCX: 
[   65.141333] RDX: 0001 RSI: 81063b59 RDI: 8168898c
[   65.141333] RBP: 88042b957ef0 R08:  R09: 0002
[   65.141334] R10:  R11: 0001 R12: 88042b954000
[   65.141334] R13: 88042b954000 R14: 88042b957f58 R15: 88042b958000
[   65.141335] FS:  7fad32173800() GS:88043dd8() 
knlGS:
[   65.141336] CS:  0010 DS:  ES:  CR0: 80050033
[   65.141336] CR2: 88042b957e40 CR3: 0004298e6000 CR4: 000406e0
[   65.141336] Stack:
[   65.141338]  880037b81000 880037b81000  
81181e1e
[   65.141339]  ff9c0002 880429e8c600 811782bf 
0011
[   65.141340]  049c 0001 1180 

[   65.141340] Call Trace:
[   65.141344]  [] ? getname_flags+0x5e/0x1b0
[   65.141346]  [] ? cp_new_stat+0x10f/0x120
[   65.141348]  [] ? ktime_get_ts64+0x4a/0xf0
[   65.141353]  [] ? poll_select_copy_remaining+0xe7/0x130
[   65.141355]  [] exit_to_usermode_loop+0x8a/0xb0
[   65.141356]  [] syscall_return_slowpath+0x5b/0x70
[   65.141358]  [] entry_SYSCALL_64_fastpath+0xa5/0xa7
[   65.141374] Code: 00 00 00 1e 1e 18 81 ff ff ff ff 02 00 00 00 9c ff ff ff 
00 c6 e8 29 04 88 ff ff bf 82 17 81 ff ff ff ff 11 00 00 00 00 00 00 00 <9c> 04 
00 00 00 00 00 00 01 00 00 00 00 00 00 00 80 11 00 00 00 
[   65.141375] RIP  [] 0x88042b957e40
[   65.141376]  RSP 
[   65.141376] CR2: 88042b957e40
[   65.141378] ---[ end trace 5dc71ecf8d888ee6 ]---
[   65.141509] Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x0009
[   65.141509] 
[   65.149191] Kernel Offset: disabled
[   65.449314] ---[ end Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x0009

...

[  381.835297] Restarting tasks ... 
[  381.838620] kernel tried to execute NX-protected page - exploit attempt? 
(uid: 0)
[  381.838689] done.
[  381.850763] BUG: unable to handle kernel paging request at 88042b957e40
[  381.850765] IP: [] 0x88042b957e40
[  381.850766] PGD 2065067 PUD 2068067 PMD 80042b8001e3 
[  381.850767] Oops: 0011 [#1] PREEMPT SMP
[  381.850778] Modules linked in: binfmt_misc ipv6 vfat fat amd64_edac_mod 
edac_mce_amd fuse dm_crypt dm_mod amdkfd kvm_amd kvm amd_iommu_v2 radeon 
irqbypass crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper 
ablk_helper cryptd k10temp fam15h_power acpi_cpufreq
[  381.850779] CPU: 3 PID: 1 Comm: init Not tainted 4.7.0-rc3+ #1
[  381.850780] Hardware name: To be filled by O.E.M. To be filled by 
O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013
[  381.850781] task: 88042b958000 ti: 88042b954000 task.ti: 
88042b954000
[  381.850782] RIP: 0010:[]  [] 
0x88042b957e40
[  381.850783] RSP: 0018:88042b957e00  EFLAGS: 00010282
[  381.850783] RAX:  RBX: 88042b957f58 RCX: 
[  381.850784] RDX: 0001 RSI: 81062a2d RDI: 81687d8c
[  381.850784] RBP: 88042b957ef0 R08:  R09: 0002
[  381.850785] R10:  R11: 0001 R12: 88042b954000
[  381.850785] R13: 88042b954000 R14: 88042b957f58 R15: 88042b958000
[  381.850786] FS:  7f1143649800() GS:88043dcc() 
knlGS:
[  381.850787] CS:  0010 DS:  ES:  CR0: 80050033
[  381.850787] CR2: 88042b957e40 CR3: 0004298af000 CR4: 000406e0
[  381.850788] Stack:
[  381.850789]  88042b1ed000 88042b1ed000  
8117f8ae
[  381.850790]  ff9c0002 

Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-30 Thread Borislav Petkov
On Thu, Jun 30, 2016 at 04:20:43AM +0200, Rafael J. Wysocki wrote:
> That's not what Boris was seeing at least.

Well, I had it a couple of times during testing patches. This is all
from the logs:

[   65.121109] PM: Basic memory bitmaps freed
[   65.125991] Restarting tasks ... 
[   65.129342] kernel tried to execute NX-protected page - exploit attempt? 
(uid: 0)
[   65.129585] done.
[   65.141314] BUG: unable to handle kernel paging request at 88042b957e40
[   65.141316] IP: [] 0x88042b957e40
[   65.141318] PGD 2067067 PUD 206a067 PMD 80042b8001e3 
[   65.141319] Oops: 0011 [#1] PREEMPT SMP
[   65.141327] Modules linked in: binfmt_misc ipv6 vfat fat amd64_edac_mod 
edac_mce_amd fuse dm_crypt dm_mod amdkfd kvm_amd kvm amd_iommu_v2 irqbypass 
crc32_pclmul radeon aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper 
cryptd fam15h_power k10temp acpi_cpufreq
[   65.141328] CPU: 6 PID: 1 Comm: init Not tainted 4.7.0-rc3+ #4
[   65.141329] Hardware name: To be filled by O.E.M. To be filled by 
O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013
[   65.141329] task: 88042b958000 ti: 88042b954000 task.ti: 
88042b954000
[   65.141331] RIP: 0010:[]  [] 
0x88042b957e40
[   65.141331] RSP: 0018:88042b957e00  EFLAGS: 00010282
[   65.141332] RAX:  RBX: 88042b957f58 RCX: 
[   65.141333] RDX: 0001 RSI: 81063b59 RDI: 8168898c
[   65.141333] RBP: 88042b957ef0 R08:  R09: 0002
[   65.141334] R10:  R11: 0001 R12: 88042b954000
[   65.141334] R13: 88042b954000 R14: 88042b957f58 R15: 88042b958000
[   65.141335] FS:  7fad32173800() GS:88043dd8() 
knlGS:
[   65.141336] CS:  0010 DS:  ES:  CR0: 80050033
[   65.141336] CR2: 88042b957e40 CR3: 0004298e6000 CR4: 000406e0
[   65.141336] Stack:
[   65.141338]  880037b81000 880037b81000  
81181e1e
[   65.141339]  ff9c0002 880429e8c600 811782bf 
0011
[   65.141340]  049c 0001 1180 

[   65.141340] Call Trace:
[   65.141344]  [] ? getname_flags+0x5e/0x1b0
[   65.141346]  [] ? cp_new_stat+0x10f/0x120
[   65.141348]  [] ? ktime_get_ts64+0x4a/0xf0
[   65.141353]  [] ? poll_select_copy_remaining+0xe7/0x130
[   65.141355]  [] exit_to_usermode_loop+0x8a/0xb0
[   65.141356]  [] syscall_return_slowpath+0x5b/0x70
[   65.141358]  [] entry_SYSCALL_64_fastpath+0xa5/0xa7
[   65.141374] Code: 00 00 00 1e 1e 18 81 ff ff ff ff 02 00 00 00 9c ff ff ff 
00 c6 e8 29 04 88 ff ff bf 82 17 81 ff ff ff ff 11 00 00 00 00 00 00 00 <9c> 04 
00 00 00 00 00 00 01 00 00 00 00 00 00 00 80 11 00 00 00 
[   65.141375] RIP  [] 0x88042b957e40
[   65.141376]  RSP 
[   65.141376] CR2: 88042b957e40
[   65.141378] ---[ end trace 5dc71ecf8d888ee6 ]---
[   65.141509] Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x0009
[   65.141509] 
[   65.149191] Kernel Offset: disabled
[   65.449314] ---[ end Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x0009

...

[  381.835297] Restarting tasks ... 
[  381.838620] kernel tried to execute NX-protected page - exploit attempt? 
(uid: 0)
[  381.838689] done.
[  381.850763] BUG: unable to handle kernel paging request at 88042b957e40
[  381.850765] IP: [] 0x88042b957e40
[  381.850766] PGD 2065067 PUD 2068067 PMD 80042b8001e3 
[  381.850767] Oops: 0011 [#1] PREEMPT SMP
[  381.850778] Modules linked in: binfmt_misc ipv6 vfat fat amd64_edac_mod 
edac_mce_amd fuse dm_crypt dm_mod amdkfd kvm_amd kvm amd_iommu_v2 radeon 
irqbypass crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper 
ablk_helper cryptd k10temp fam15h_power acpi_cpufreq
[  381.850779] CPU: 3 PID: 1 Comm: init Not tainted 4.7.0-rc3+ #1
[  381.850780] Hardware name: To be filled by O.E.M. To be filled by 
O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013
[  381.850781] task: 88042b958000 ti: 88042b954000 task.ti: 
88042b954000
[  381.850782] RIP: 0010:[]  [] 
0x88042b957e40
[  381.850783] RSP: 0018:88042b957e00  EFLAGS: 00010282
[  381.850783] RAX:  RBX: 88042b957f58 RCX: 
[  381.850784] RDX: 0001 RSI: 81062a2d RDI: 81687d8c
[  381.850784] RBP: 88042b957ef0 R08:  R09: 0002
[  381.850785] R10:  R11: 0001 R12: 88042b954000
[  381.850785] R13: 88042b954000 R14: 88042b957f58 R15: 88042b958000
[  381.850786] FS:  7f1143649800() GS:88043dcc() 
knlGS:
[  381.850787] CS:  0010 DS:  ES:  CR0: 80050033
[  381.850787] CR2: 88042b957e40 CR3: 0004298af000 CR4: 000406e0
[  381.850788] Stack:
[  381.850789]  88042b1ed000 88042b1ed000  
8117f8ae
[  381.850790]  ff9c0002 

Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-29 Thread Logan Gunthorpe



On 29/06/16 08:55 PM, Rafael J. Wysocki wrote:


The only thing that comes to mind at this point is that TLBs should be flushed
after page tables changes, so please apply the appended and let me know
if you see this panic any more with it.



Ok, I'll build a new kernel tomorrow. But keep in mind the panic is 
pretty rare as I've only seen it once so far after a couple dozen or so 
hibernates. So it may be hard to get a concrete yes or no on whether it 
fixes the issue.


I've got a script to run a bunch of hibernates in a row. I usually only 
run it for a handful of iterations, but I'll try running it for much 
longer with this patch and let you know in a couple days.


Logan



Thanks,
Rafael


---
 arch/x86/power/hibernate_64.c |   92 +-
 arch/x86/power/hibernate_asm_64.S |   55 +-
 2 files changed, 104 insertions(+), 43 deletions(-)

Index: linux-pm/arch/x86/power/hibernate_64.c
===
--- linux-pm.orig/arch/x86/power/hibernate_64.c
+++ linux-pm/arch/x86/power/hibernate_64.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 

 /* Defined in hibernate_asm_64.S */
 extern asmlinkage __visible int restore_image(void);
@@ -28,6 +29,7 @@ extern asmlinkage __visible int restore_
  * kernel's text (this value is passed in the image header).
  */
 unsigned long restore_jump_address __visible;
+unsigned long jump_address_phys;

 /*
  * Value of the cr3 register from before the hibernation (this value is passed
@@ -37,7 +39,43 @@ unsigned long restore_cr3 __visible;

 pgd_t *temp_level4_pgt __visible;

-void *relocated_restore_code __visible;
+unsigned long relocated_restore_code __visible;
+
+static int set_up_temporary_text_mapping(void)
+{
+   pmd_t *pmd;
+   pud_t *pud;
+
+   /*
+* The new mapping only has to cover the page containing the image
+* kernel's entry point (jump_address_phys), because the switch over to
+* it is carried out by relocated code running from a page allocated
+* specifically for this purpose and covered by the identity mapping, so
+* the temporary kernel text mapping is only needed for the final jump.
+* Moreover, in that mapping the virtual address of the image kernel's
+* entry point must be the same as its virtual address in the image
+* kernel (restore_jump_address), so the image kernel's
+* restore_registers() code doesn't find itself in a different area of
+* the virtual address space after switching over to the original page
+* tables used by the image kernel.
+*/
+   pud = (pud_t *)get_safe_page(GFP_ATOMIC);
+   if (!pud)
+   return -ENOMEM;
+
+   pmd = (pmd_t *)get_safe_page(GFP_ATOMIC);
+   if (!pmd)
+   return -ENOMEM;
+
+   set_pmd(pmd + pmd_index(restore_jump_address),
+   __pmd((jump_address_phys & PMD_MASK) | 
__PAGE_KERNEL_LARGE_EXEC));
+   set_pud(pud + pud_index(restore_jump_address),
+   __pud(__pa(pmd) | _KERNPG_TABLE));
+   set_pgd(temp_level4_pgt + pgd_index(restore_jump_address),
+   __pgd(__pa(pud) | _KERNPG_TABLE));
+
+   return 0;
+}

 static void *alloc_pgt_page(void *context)
 {
@@ -59,9 +97,10 @@ static int set_up_temporary_mappings(voi
if (!temp_level4_pgt)
return -ENOMEM;

-   /* It is safe to reuse the original kernel mapping */
-   set_pgd(temp_level4_pgt + pgd_index(__START_KERNEL_map),
-   init_level4_pgt[pgd_index(__START_KERNEL_map)]);
+   /* Prepare a temporary mapping for the kernel text */
+   result = set_up_temporary_text_mapping();
+   if (result)
+   return result;

/* Set up the direct mapping from scratch */
for (i = 0; i < nr_pfn_mapped; i++) {
@@ -78,19 +117,45 @@ static int set_up_temporary_mappings(voi
return 0;
 }

+static int relocate_restore_code(void)
+{
+   pgd_t *pgd;
+   pmd_t *pmd;
+
+   relocated_restore_code = get_safe_page(GFP_ATOMIC);
+   if (!relocated_restore_code)
+   return -ENOMEM;
+
+   memcpy((void *)relocated_restore_code, _restore_code, PAGE_SIZE);
+
+   /* Make the page containing the relocated code executable */
+   pgd = (pgd_t *)__va(read_cr3()) + pgd_index(relocated_restore_code);
+   pmd = pmd_offset(pud_offset(pgd, relocated_restore_code),
+relocated_restore_code);
+   if (pmd_large(*pmd)) {
+   set_pmd(pmd, __pmd(pmd_val(*pmd) & ~_PAGE_NX));
+   } else {
+   pte_t *pte = pte_offset_kernel(pmd, relocated_restore_code);
+
+   set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_NX));
+   }
+   flush_tlb_all();
+
+   return 0;
+}
+
 int swsusp_arch_resume(void)
 {
int error;

/* We have got enough memory and from now on we cannot recover */
-   if ((error = 

Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-29 Thread Logan Gunthorpe



On 29/06/16 08:55 PM, Rafael J. Wysocki wrote:


The only thing that comes to mind at this point is that TLBs should be flushed
after page tables changes, so please apply the appended and let me know
if you see this panic any more with it.



Ok, I'll build a new kernel tomorrow. But keep in mind the panic is 
pretty rare as I've only seen it once so far after a couple dozen or so 
hibernates. So it may be hard to get a concrete yes or no on whether it 
fixes the issue.


I've got a script to run a bunch of hibernates in a row. I usually only 
run it for a handful of iterations, but I'll try running it for much 
longer with this patch and let you know in a couple days.


Logan



Thanks,
Rafael


---
 arch/x86/power/hibernate_64.c |   92 +-
 arch/x86/power/hibernate_asm_64.S |   55 +-
 2 files changed, 104 insertions(+), 43 deletions(-)

Index: linux-pm/arch/x86/power/hibernate_64.c
===
--- linux-pm.orig/arch/x86/power/hibernate_64.c
+++ linux-pm/arch/x86/power/hibernate_64.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 

 /* Defined in hibernate_asm_64.S */
 extern asmlinkage __visible int restore_image(void);
@@ -28,6 +29,7 @@ extern asmlinkage __visible int restore_
  * kernel's text (this value is passed in the image header).
  */
 unsigned long restore_jump_address __visible;
+unsigned long jump_address_phys;

 /*
  * Value of the cr3 register from before the hibernation (this value is passed
@@ -37,7 +39,43 @@ unsigned long restore_cr3 __visible;

 pgd_t *temp_level4_pgt __visible;

-void *relocated_restore_code __visible;
+unsigned long relocated_restore_code __visible;
+
+static int set_up_temporary_text_mapping(void)
+{
+   pmd_t *pmd;
+   pud_t *pud;
+
+   /*
+* The new mapping only has to cover the page containing the image
+* kernel's entry point (jump_address_phys), because the switch over to
+* it is carried out by relocated code running from a page allocated
+* specifically for this purpose and covered by the identity mapping, so
+* the temporary kernel text mapping is only needed for the final jump.
+* Moreover, in that mapping the virtual address of the image kernel's
+* entry point must be the same as its virtual address in the image
+* kernel (restore_jump_address), so the image kernel's
+* restore_registers() code doesn't find itself in a different area of
+* the virtual address space after switching over to the original page
+* tables used by the image kernel.
+*/
+   pud = (pud_t *)get_safe_page(GFP_ATOMIC);
+   if (!pud)
+   return -ENOMEM;
+
+   pmd = (pmd_t *)get_safe_page(GFP_ATOMIC);
+   if (!pmd)
+   return -ENOMEM;
+
+   set_pmd(pmd + pmd_index(restore_jump_address),
+   __pmd((jump_address_phys & PMD_MASK) | 
__PAGE_KERNEL_LARGE_EXEC));
+   set_pud(pud + pud_index(restore_jump_address),
+   __pud(__pa(pmd) | _KERNPG_TABLE));
+   set_pgd(temp_level4_pgt + pgd_index(restore_jump_address),
+   __pgd(__pa(pud) | _KERNPG_TABLE));
+
+   return 0;
+}

 static void *alloc_pgt_page(void *context)
 {
@@ -59,9 +97,10 @@ static int set_up_temporary_mappings(voi
if (!temp_level4_pgt)
return -ENOMEM;

-   /* It is safe to reuse the original kernel mapping */
-   set_pgd(temp_level4_pgt + pgd_index(__START_KERNEL_map),
-   init_level4_pgt[pgd_index(__START_KERNEL_map)]);
+   /* Prepare a temporary mapping for the kernel text */
+   result = set_up_temporary_text_mapping();
+   if (result)
+   return result;

/* Set up the direct mapping from scratch */
for (i = 0; i < nr_pfn_mapped; i++) {
@@ -78,19 +117,45 @@ static int set_up_temporary_mappings(voi
return 0;
 }

+static int relocate_restore_code(void)
+{
+   pgd_t *pgd;
+   pmd_t *pmd;
+
+   relocated_restore_code = get_safe_page(GFP_ATOMIC);
+   if (!relocated_restore_code)
+   return -ENOMEM;
+
+   memcpy((void *)relocated_restore_code, _restore_code, PAGE_SIZE);
+
+   /* Make the page containing the relocated code executable */
+   pgd = (pgd_t *)__va(read_cr3()) + pgd_index(relocated_restore_code);
+   pmd = pmd_offset(pud_offset(pgd, relocated_restore_code),
+relocated_restore_code);
+   if (pmd_large(*pmd)) {
+   set_pmd(pmd, __pmd(pmd_val(*pmd) & ~_PAGE_NX));
+   } else {
+   pte_t *pte = pte_offset_kernel(pmd, relocated_restore_code);
+
+   set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_NX));
+   }
+   flush_tlb_all();
+
+   return 0;
+}
+
 int swsusp_arch_resume(void)
 {
int error;

/* We have got enough memory and from now on we cannot recover */
-   if ((error = 

Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-29 Thread Rafael J. Wysocki
On Thursday, June 30, 2016 04:20:43 AM Rafael J. Wysocki wrote:
> On Wednesday, June 29, 2016 07:52:18 PM Logan Gunthorpe wrote:
> > Hey Raf,
> > 
> > Sorry to report that although the patch works the majority of the time, 
> > I just got a suspicious kernel panic during resume.
> > 
> > It said:
> > 
> > "kernel tried to execute NX protected page - exploit attempt? (uid: 0)"
> > 
> > You can find a photo of the panic here:
> > 
> > http://staff.deltatee.com/~logang/panic.jpg
> 
> Thanks for the report!
> 
> That's not what Boris was seeing at least.
> 
> It looks like clearing the NX bit for relocated_restore_code in
> relocate_restore_code() didn't work for some reason.
> 
> I don't see why it may not work ATM, I need to have a fresh look at that
> tomorrow.
> 
> I had hoped to be able to fix this bug for 4.7, but it looks like it will
> miss the mark after all.  Oh well.

The only thing that comes to mind at this point is that TLBs should be flushed
after page tables changes, so please apply the appended and let me know
if you see this panic any more with it.

Thanks,
Rafael


---
 arch/x86/power/hibernate_64.c |   92 +-
 arch/x86/power/hibernate_asm_64.S |   55 +-
 2 files changed, 104 insertions(+), 43 deletions(-)

Index: linux-pm/arch/x86/power/hibernate_64.c
===
--- linux-pm.orig/arch/x86/power/hibernate_64.c
+++ linux-pm/arch/x86/power/hibernate_64.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Defined in hibernate_asm_64.S */
 extern asmlinkage __visible int restore_image(void);
@@ -28,6 +29,7 @@ extern asmlinkage __visible int restore_
  * kernel's text (this value is passed in the image header).
  */
 unsigned long restore_jump_address __visible;
+unsigned long jump_address_phys;
 
 /*
  * Value of the cr3 register from before the hibernation (this value is passed
@@ -37,7 +39,43 @@ unsigned long restore_cr3 __visible;
 
 pgd_t *temp_level4_pgt __visible;
 
-void *relocated_restore_code __visible;
+unsigned long relocated_restore_code __visible;
+
+static int set_up_temporary_text_mapping(void)
+{
+   pmd_t *pmd;
+   pud_t *pud;
+
+   /*
+* The new mapping only has to cover the page containing the image
+* kernel's entry point (jump_address_phys), because the switch over to
+* it is carried out by relocated code running from a page allocated
+* specifically for this purpose and covered by the identity mapping, so
+* the temporary kernel text mapping is only needed for the final jump.
+* Moreover, in that mapping the virtual address of the image kernel's
+* entry point must be the same as its virtual address in the image
+* kernel (restore_jump_address), so the image kernel's
+* restore_registers() code doesn't find itself in a different area of
+* the virtual address space after switching over to the original page
+* tables used by the image kernel.
+*/
+   pud = (pud_t *)get_safe_page(GFP_ATOMIC);
+   if (!pud)
+   return -ENOMEM;
+
+   pmd = (pmd_t *)get_safe_page(GFP_ATOMIC);
+   if (!pmd)
+   return -ENOMEM;
+
+   set_pmd(pmd + pmd_index(restore_jump_address),
+   __pmd((jump_address_phys & PMD_MASK) | 
__PAGE_KERNEL_LARGE_EXEC));
+   set_pud(pud + pud_index(restore_jump_address),
+   __pud(__pa(pmd) | _KERNPG_TABLE));
+   set_pgd(temp_level4_pgt + pgd_index(restore_jump_address),
+   __pgd(__pa(pud) | _KERNPG_TABLE));
+
+   return 0;
+}
 
 static void *alloc_pgt_page(void *context)
 {
@@ -59,9 +97,10 @@ static int set_up_temporary_mappings(voi
if (!temp_level4_pgt)
return -ENOMEM;
 
-   /* It is safe to reuse the original kernel mapping */
-   set_pgd(temp_level4_pgt + pgd_index(__START_KERNEL_map),
-   init_level4_pgt[pgd_index(__START_KERNEL_map)]);
+   /* Prepare a temporary mapping for the kernel text */
+   result = set_up_temporary_text_mapping();
+   if (result)
+   return result;
 
/* Set up the direct mapping from scratch */
for (i = 0; i < nr_pfn_mapped; i++) {
@@ -78,19 +117,45 @@ static int set_up_temporary_mappings(voi
return 0;
 }
 
+static int relocate_restore_code(void)
+{
+   pgd_t *pgd;
+   pmd_t *pmd;
+
+   relocated_restore_code = get_safe_page(GFP_ATOMIC);
+   if (!relocated_restore_code)
+   return -ENOMEM;
+
+   memcpy((void *)relocated_restore_code, _restore_code, PAGE_SIZE);
+
+   /* Make the page containing the relocated code executable */
+   pgd = (pgd_t *)__va(read_cr3()) + pgd_index(relocated_restore_code);
+   pmd = pmd_offset(pud_offset(pgd, relocated_restore_code),
+relocated_restore_code);
+   if (pmd_large(*pmd)) {
+   set_pmd(pmd, 

Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-29 Thread Rafael J. Wysocki
On Thursday, June 30, 2016 04:20:43 AM Rafael J. Wysocki wrote:
> On Wednesday, June 29, 2016 07:52:18 PM Logan Gunthorpe wrote:
> > Hey Raf,
> > 
> > Sorry to report that although the patch works the majority of the time, 
> > I just got a suspicious kernel panic during resume.
> > 
> > It said:
> > 
> > "kernel tried to execute NX protected page - exploit attempt? (uid: 0)"
> > 
> > You can find a photo of the panic here:
> > 
> > http://staff.deltatee.com/~logang/panic.jpg
> 
> Thanks for the report!
> 
> That's not what Boris was seeing at least.
> 
> It looks like clearing the NX bit for relocated_restore_code in
> relocate_restore_code() didn't work for some reason.
> 
> I don't see why it may not work ATM, I need to have a fresh look at that
> tomorrow.
> 
> I had hoped to be able to fix this bug for 4.7, but it looks like it will
> miss the mark after all.  Oh well.

The only thing that comes to mind at this point is that TLBs should be flushed
after page tables changes, so please apply the appended and let me know
if you see this panic any more with it.

Thanks,
Rafael


---
 arch/x86/power/hibernate_64.c |   92 +-
 arch/x86/power/hibernate_asm_64.S |   55 +-
 2 files changed, 104 insertions(+), 43 deletions(-)

Index: linux-pm/arch/x86/power/hibernate_64.c
===
--- linux-pm.orig/arch/x86/power/hibernate_64.c
+++ linux-pm/arch/x86/power/hibernate_64.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Defined in hibernate_asm_64.S */
 extern asmlinkage __visible int restore_image(void);
@@ -28,6 +29,7 @@ extern asmlinkage __visible int restore_
  * kernel's text (this value is passed in the image header).
  */
 unsigned long restore_jump_address __visible;
+unsigned long jump_address_phys;
 
 /*
  * Value of the cr3 register from before the hibernation (this value is passed
@@ -37,7 +39,43 @@ unsigned long restore_cr3 __visible;
 
 pgd_t *temp_level4_pgt __visible;
 
-void *relocated_restore_code __visible;
+unsigned long relocated_restore_code __visible;
+
+static int set_up_temporary_text_mapping(void)
+{
+   pmd_t *pmd;
+   pud_t *pud;
+
+   /*
+* The new mapping only has to cover the page containing the image
+* kernel's entry point (jump_address_phys), because the switch over to
+* it is carried out by relocated code running from a page allocated
+* specifically for this purpose and covered by the identity mapping, so
+* the temporary kernel text mapping is only needed for the final jump.
+* Moreover, in that mapping the virtual address of the image kernel's
+* entry point must be the same as its virtual address in the image
+* kernel (restore_jump_address), so the image kernel's
+* restore_registers() code doesn't find itself in a different area of
+* the virtual address space after switching over to the original page
+* tables used by the image kernel.
+*/
+   pud = (pud_t *)get_safe_page(GFP_ATOMIC);
+   if (!pud)
+   return -ENOMEM;
+
+   pmd = (pmd_t *)get_safe_page(GFP_ATOMIC);
+   if (!pmd)
+   return -ENOMEM;
+
+   set_pmd(pmd + pmd_index(restore_jump_address),
+   __pmd((jump_address_phys & PMD_MASK) | 
__PAGE_KERNEL_LARGE_EXEC));
+   set_pud(pud + pud_index(restore_jump_address),
+   __pud(__pa(pmd) | _KERNPG_TABLE));
+   set_pgd(temp_level4_pgt + pgd_index(restore_jump_address),
+   __pgd(__pa(pud) | _KERNPG_TABLE));
+
+   return 0;
+}
 
 static void *alloc_pgt_page(void *context)
 {
@@ -59,9 +97,10 @@ static int set_up_temporary_mappings(voi
if (!temp_level4_pgt)
return -ENOMEM;
 
-   /* It is safe to reuse the original kernel mapping */
-   set_pgd(temp_level4_pgt + pgd_index(__START_KERNEL_map),
-   init_level4_pgt[pgd_index(__START_KERNEL_map)]);
+   /* Prepare a temporary mapping for the kernel text */
+   result = set_up_temporary_text_mapping();
+   if (result)
+   return result;
 
/* Set up the direct mapping from scratch */
for (i = 0; i < nr_pfn_mapped; i++) {
@@ -78,19 +117,45 @@ static int set_up_temporary_mappings(voi
return 0;
 }
 
+static int relocate_restore_code(void)
+{
+   pgd_t *pgd;
+   pmd_t *pmd;
+
+   relocated_restore_code = get_safe_page(GFP_ATOMIC);
+   if (!relocated_restore_code)
+   return -ENOMEM;
+
+   memcpy((void *)relocated_restore_code, _restore_code, PAGE_SIZE);
+
+   /* Make the page containing the relocated code executable */
+   pgd = (pgd_t *)__va(read_cr3()) + pgd_index(relocated_restore_code);
+   pmd = pmd_offset(pud_offset(pgd, relocated_restore_code),
+relocated_restore_code);
+   if (pmd_large(*pmd)) {
+   set_pmd(pmd, 

Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-29 Thread Rafael J. Wysocki
On Wednesday, June 29, 2016 07:52:18 PM Logan Gunthorpe wrote:
> Hey Raf,
> 
> Sorry to report that although the patch works the majority of the time, 
> I just got a suspicious kernel panic during resume.
> 
> It said:
> 
> "kernel tried to execute NX protected page - exploit attempt? (uid: 0)"
> 
> You can find a photo of the panic here:
> 
> http://staff.deltatee.com/~logang/panic.jpg

Thanks for the report!

That's not what Boris was seeing at least.

It looks like clearing the NX bit for relocated_restore_code in
relocate_restore_code() didn't work for some reason.

I don't see why it may not work ATM, I need to have a fresh look at that
tomorrow.

I had hoped to be able to fix this bug for 4.7, but it looks like it will
miss the mark after all.  Oh well.

Thanks,
Rafael



Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-29 Thread Rafael J. Wysocki
On Wednesday, June 29, 2016 07:52:18 PM Logan Gunthorpe wrote:
> Hey Raf,
> 
> Sorry to report that although the patch works the majority of the time, 
> I just got a suspicious kernel panic during resume.
> 
> It said:
> 
> "kernel tried to execute NX protected page - exploit attempt? (uid: 0)"
> 
> You can find a photo of the panic here:
> 
> http://staff.deltatee.com/~logang/panic.jpg

Thanks for the report!

That's not what Boris was seeing at least.

It looks like clearing the NX bit for relocated_restore_code in
relocate_restore_code() didn't work for some reason.

I don't see why it may not work ATM, I need to have a fresh look at that
tomorrow.

I had hoped to be able to fix this bug for 4.7, but it looks like it will
miss the mark after all.  Oh well.

Thanks,
Rafael



Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-29 Thread Logan Gunthorpe

Hey Raf,

Sorry to report that although the patch works the majority of the time, 
I just got a suspicious kernel panic during resume.


It said:

"kernel tried to execute NX protected page - exploit attempt? (uid: 0)"

You can find a photo of the panic here:

http://staff.deltatee.com/~logang/panic.jpg

Logan


On 29/06/16 08:48 AM, Kees Cook wrote:

On Mon, Jun 27, 2016 at 4:33 PM, Logan Gunthorpe  wrote:

Hey,

This version is working for me as well. Thanks.

Logan

On 27/06/16 08:24 AM, Rafael J. Wysocki wrote:


On Tuesday, June 21, 2016 11:04:41 AM Kees Cook wrote:


On Mon, Jun 20, 2016 at 9:35 PM, Logan Gunthorpe 
wrote:


Hey Rafael,

This patch appears to be working on my laptop. Thanks.



Same for me: resume still works with KASLR in my tests too.



Unfortunately, Boris still sees post-resume memory corruption with the
patch
you have tested, but that turns out to be a result of some super-weird
corruption of a pointer on the stack which leads to a store at a wrong
address
(and there's no way it can be related to the rest of the patch).

We have verified that it can be avoided by rearranging
set_up_temporary_text_mapping()
to use fewer local variables and the appended patch contains this
modification.

I also went on and changed relocate_restore_code() slightly in a similar
fashion,
but all of those changes should not affect the behavior (unless there's
something
insane going on behind the curtains, that is).

Kees, Logan, Boris, please try this one and let me know if it works for
you.


Tested-by: Kees Cook 

I've done a few KASLR boots, and everything continues to look good to
me. Thanks!

-Kees



Thanks,
Rafael


---
From: Rafael J. Wysocki 
Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption
during image restoration

Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).

That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables.  Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.

The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.

To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch.  Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.

Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too.  That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables.  Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.

With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it).  Update RESTORE_MAGIC
too to reflect the image header format change.

Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.

This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.

Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm=146372852823760=2
Reported-by: Logan Gunthorpe 

Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-29 Thread Logan Gunthorpe

Hey Raf,

Sorry to report that although the patch works the majority of the time, 
I just got a suspicious kernel panic during resume.


It said:

"kernel tried to execute NX protected page - exploit attempt? (uid: 0)"

You can find a photo of the panic here:

http://staff.deltatee.com/~logang/panic.jpg

Logan


On 29/06/16 08:48 AM, Kees Cook wrote:

On Mon, Jun 27, 2016 at 4:33 PM, Logan Gunthorpe  wrote:

Hey,

This version is working for me as well. Thanks.

Logan

On 27/06/16 08:24 AM, Rafael J. Wysocki wrote:


On Tuesday, June 21, 2016 11:04:41 AM Kees Cook wrote:


On Mon, Jun 20, 2016 at 9:35 PM, Logan Gunthorpe 
wrote:


Hey Rafael,

This patch appears to be working on my laptop. Thanks.



Same for me: resume still works with KASLR in my tests too.



Unfortunately, Boris still sees post-resume memory corruption with the
patch
you have tested, but that turns out to be a result of some super-weird
corruption of a pointer on the stack which leads to a store at a wrong
address
(and there's no way it can be related to the rest of the patch).

We have verified that it can be avoided by rearranging
set_up_temporary_text_mapping()
to use fewer local variables and the appended patch contains this
modification.

I also went on and changed relocate_restore_code() slightly in a similar
fashion,
but all of those changes should not affect the behavior (unless there's
something
insane going on behind the curtains, that is).

Kees, Logan, Boris, please try this one and let me know if it works for
you.


Tested-by: Kees Cook 

I've done a few KASLR boots, and everything continues to look good to
me. Thanks!

-Kees



Thanks,
Rafael


---
From: Rafael J. Wysocki 
Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption
during image restoration

Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).

That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables.  Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.

The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.

To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch.  Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.

Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too.  That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables.  Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.

With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it).  Update RESTORE_MAGIC
too to reflect the image header format change.

Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.

This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.

Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm=146372852823760=2
Reported-by: Logan Gunthorpe 
Signed-off-by: Rafael J. Wysocki 
---
 arch/x86/power/hibernate_64.c |   90

Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-29 Thread Kees Cook
On Mon, Jun 27, 2016 at 4:33 PM, Logan Gunthorpe  wrote:
> Hey,
>
> This version is working for me as well. Thanks.
>
> Logan
>
> On 27/06/16 08:24 AM, Rafael J. Wysocki wrote:
>>
>> On Tuesday, June 21, 2016 11:04:41 AM Kees Cook wrote:
>>>
>>> On Mon, Jun 20, 2016 at 9:35 PM, Logan Gunthorpe 
>>> wrote:

 Hey Rafael,

 This patch appears to be working on my laptop. Thanks.
>>>
>>>
>>> Same for me: resume still works with KASLR in my tests too.
>>
>>
>> Unfortunately, Boris still sees post-resume memory corruption with the
>> patch
>> you have tested, but that turns out to be a result of some super-weird
>> corruption of a pointer on the stack which leads to a store at a wrong
>> address
>> (and there's no way it can be related to the rest of the patch).
>>
>> We have verified that it can be avoided by rearranging
>> set_up_temporary_text_mapping()
>> to use fewer local variables and the appended patch contains this
>> modification.
>>
>> I also went on and changed relocate_restore_code() slightly in a similar
>> fashion,
>> but all of those changes should not affect the behavior (unless there's
>> something
>> insane going on behind the curtains, that is).
>>
>> Kees, Logan, Boris, please try this one and let me know if it works for
>> you.

Tested-by: Kees Cook 

I've done a few KASLR boots, and everything continues to look good to
me. Thanks!

-Kees

>>
>> Thanks,
>> Rafael
>>
>>
>> ---
>> From: Rafael J. Wysocki 
>> Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption
>> during image restoration
>>
>> Logan Gunthorpe reports that hibernation stopped working reliably for
>> him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
>> and rodata).
>>
>> That turns out to be a consequence of a long-standing issue with the
>> 64-bit image restoration code on x86, which is that the temporary
>> page tables set up by it to avoid page tables corruption when the
>> last bits of the image kernel's memory contents are copied into
>> their original page frames re-use the boot kernel's text mapping,
>> but that mapping may very well get corrupted just like any other
>> part of the page tables.  Of course, if that happens, the final
>> jump to the image kernel's entry point will go to nowhere.
>>
>> The exact reason why commit ab76f7b4ab23 matters here is that it
>> sometimes causes a PMD of a large page to be split into PTEs
>> that are allocated dynamically and get corrupted during image
>> restoration as described above.
>>
>> To fix that issue note that the code copying the last bits of the
>> image kernel's memory contents to the page frames occupied by them
>> previoulsy doesn't use the kernel text mapping, because it runs from
>> a special page covered by the identity mapping set up for that code
>> from scratch.  Hence, the kernel text mapping is only needed before
>> that code starts to run and then it will only be used just for the
>> final jump to the image kernel's entry point.
>>
>> Accordingly, the temporary page tables set up in swsusp_arch_resume()
>> on x86-64 need to contain the kernel text mapping too.  That mapping
>> is only going to be used for the final jump to the image kernel, so
>> it only needs to cover the image kernel's entry point, because the
>> first thing the image kernel does after getting control back is to
>> switch over to its own original page tables.  Moreover, the virtual
>> address of the image kernel's entry point in that mapping has to be
>> the same as the one mapped by the image kernel's page tables.
>>
>> With that in mind, modify the x86-64's arch_hibernation_header_save()
>> and arch_hibernation_header_restore() routines to pass the physical
>> address of the image kernel's entry point (in addition to its virtual
>> address) to the boot kernel (a small piece of assembly code involved
>> in passing the entry point's virtual address to the image kernel is
>> not necessary any more after that, so drop it).  Update RESTORE_MAGIC
>> too to reflect the image header format change.
>>
>> Next, in set_up_temporary_mappings(), use the physical and virtual
>> addresses of the image kernel's entry point passed in the image
>> header to set up a minimum kernel text mapping (using memory pages
>> that won't be overwritten by the image kernel's memory contents) that
>> will map those addresses to each other as appropriate.
>>
>> This makes the concern about the possible corruption of the original
>> boot kernel text mapping go away and if the the minimum kernel text
>> mapping used for the final jump marks the image kernel's entry point
>> memory as executable, the jump to it is guaraneed to succeed.
>>
>> Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
>> Link: http://marc.info/?l=linux-pm=146372852823760=2
>> Reported-by: Logan Gunthorpe 
>> Signed-off-by: Rafael J. Wysocki 

Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-29 Thread Kees Cook
On Mon, Jun 27, 2016 at 4:33 PM, Logan Gunthorpe  wrote:
> Hey,
>
> This version is working for me as well. Thanks.
>
> Logan
>
> On 27/06/16 08:24 AM, Rafael J. Wysocki wrote:
>>
>> On Tuesday, June 21, 2016 11:04:41 AM Kees Cook wrote:
>>>
>>> On Mon, Jun 20, 2016 at 9:35 PM, Logan Gunthorpe 
>>> wrote:

 Hey Rafael,

 This patch appears to be working on my laptop. Thanks.
>>>
>>>
>>> Same for me: resume still works with KASLR in my tests too.
>>
>>
>> Unfortunately, Boris still sees post-resume memory corruption with the
>> patch
>> you have tested, but that turns out to be a result of some super-weird
>> corruption of a pointer on the stack which leads to a store at a wrong
>> address
>> (and there's no way it can be related to the rest of the patch).
>>
>> We have verified that it can be avoided by rearranging
>> set_up_temporary_text_mapping()
>> to use fewer local variables and the appended patch contains this
>> modification.
>>
>> I also went on and changed relocate_restore_code() slightly in a similar
>> fashion,
>> but all of those changes should not affect the behavior (unless there's
>> something
>> insane going on behind the curtains, that is).
>>
>> Kees, Logan, Boris, please try this one and let me know if it works for
>> you.

Tested-by: Kees Cook 

I've done a few KASLR boots, and everything continues to look good to
me. Thanks!

-Kees

>>
>> Thanks,
>> Rafael
>>
>>
>> ---
>> From: Rafael J. Wysocki 
>> Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption
>> during image restoration
>>
>> Logan Gunthorpe reports that hibernation stopped working reliably for
>> him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
>> and rodata).
>>
>> That turns out to be a consequence of a long-standing issue with the
>> 64-bit image restoration code on x86, which is that the temporary
>> page tables set up by it to avoid page tables corruption when the
>> last bits of the image kernel's memory contents are copied into
>> their original page frames re-use the boot kernel's text mapping,
>> but that mapping may very well get corrupted just like any other
>> part of the page tables.  Of course, if that happens, the final
>> jump to the image kernel's entry point will go to nowhere.
>>
>> The exact reason why commit ab76f7b4ab23 matters here is that it
>> sometimes causes a PMD of a large page to be split into PTEs
>> that are allocated dynamically and get corrupted during image
>> restoration as described above.
>>
>> To fix that issue note that the code copying the last bits of the
>> image kernel's memory contents to the page frames occupied by them
>> previoulsy doesn't use the kernel text mapping, because it runs from
>> a special page covered by the identity mapping set up for that code
>> from scratch.  Hence, the kernel text mapping is only needed before
>> that code starts to run and then it will only be used just for the
>> final jump to the image kernel's entry point.
>>
>> Accordingly, the temporary page tables set up in swsusp_arch_resume()
>> on x86-64 need to contain the kernel text mapping too.  That mapping
>> is only going to be used for the final jump to the image kernel, so
>> it only needs to cover the image kernel's entry point, because the
>> first thing the image kernel does after getting control back is to
>> switch over to its own original page tables.  Moreover, the virtual
>> address of the image kernel's entry point in that mapping has to be
>> the same as the one mapped by the image kernel's page tables.
>>
>> With that in mind, modify the x86-64's arch_hibernation_header_save()
>> and arch_hibernation_header_restore() routines to pass the physical
>> address of the image kernel's entry point (in addition to its virtual
>> address) to the boot kernel (a small piece of assembly code involved
>> in passing the entry point's virtual address to the image kernel is
>> not necessary any more after that, so drop it).  Update RESTORE_MAGIC
>> too to reflect the image header format change.
>>
>> Next, in set_up_temporary_mappings(), use the physical and virtual
>> addresses of the image kernel's entry point passed in the image
>> header to set up a minimum kernel text mapping (using memory pages
>> that won't be overwritten by the image kernel's memory contents) that
>> will map those addresses to each other as appropriate.
>>
>> This makes the concern about the possible corruption of the original
>> boot kernel text mapping go away and if the the minimum kernel text
>> mapping used for the final jump marks the image kernel's entry point
>> memory as executable, the jump to it is guaraneed to succeed.
>>
>> Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
>> Link: http://marc.info/?l=linux-pm=146372852823760=2
>> Reported-by: Logan Gunthorpe 
>> Signed-off-by: Rafael J. Wysocki 
>> ---
>>  arch/x86/power/hibernate_64.c |   90
>> --
>>  arch/x86/power/hibernate_asm_64.S |   

Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-27 Thread Logan Gunthorpe

Hey,

This version is working for me as well. Thanks.

Logan

On 27/06/16 08:24 AM, Rafael J. Wysocki wrote:

On Tuesday, June 21, 2016 11:04:41 AM Kees Cook wrote:

On Mon, Jun 20, 2016 at 9:35 PM, Logan Gunthorpe  wrote:

Hey Rafael,

This patch appears to be working on my laptop. Thanks.


Same for me: resume still works with KASLR in my tests too.


Unfortunately, Boris still sees post-resume memory corruption with the patch
you have tested, but that turns out to be a result of some super-weird
corruption of a pointer on the stack which leads to a store at a wrong address
(and there's no way it can be related to the rest of the patch).

We have verified that it can be avoided by rearranging 
set_up_temporary_text_mapping()
to use fewer local variables and the appended patch contains this modification.

I also went on and changed relocate_restore_code() slightly in a similar 
fashion,
but all of those changes should not affect the behavior (unless there's 
something
insane going on behind the curtains, that is).

Kees, Logan, Boris, please try this one and let me know if it works for you.

Thanks,
Rafael


---
From: Rafael J. Wysocki 
Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption during 
image restoration

Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).

That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables.  Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.

The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.

To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch.  Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.

Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too.  That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables.  Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.

With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it).  Update RESTORE_MAGIC
too to reflect the image header format change.

Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.

This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.

Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm=146372852823760=2
Reported-by: Logan Gunthorpe 
Signed-off-by: Rafael J. Wysocki 
---
 arch/x86/power/hibernate_64.c |   90 --
 arch/x86/power/hibernate_asm_64.S |   55 ++-
 2 files changed, 102 insertions(+), 43 deletions(-)

Index: linux-pm/arch/x86/power/hibernate_64.c
===
--- linux-pm.orig/arch/x86/power/hibernate_64.c
+++ linux-pm/arch/x86/power/hibernate_64.c
@@ -28,6 +28,7 @@ extern asmlinkage __visible int restore_
  * kernel's text (this value is passed in 

Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration

2016-06-27 Thread Logan Gunthorpe

Hey,

This version is working for me as well. Thanks.

Logan

On 27/06/16 08:24 AM, Rafael J. Wysocki wrote:

On Tuesday, June 21, 2016 11:04:41 AM Kees Cook wrote:

On Mon, Jun 20, 2016 at 9:35 PM, Logan Gunthorpe  wrote:

Hey Rafael,

This patch appears to be working on my laptop. Thanks.


Same for me: resume still works with KASLR in my tests too.


Unfortunately, Boris still sees post-resume memory corruption with the patch
you have tested, but that turns out to be a result of some super-weird
corruption of a pointer on the stack which leads to a store at a wrong address
(and there's no way it can be related to the rest of the patch).

We have verified that it can be avoided by rearranging 
set_up_temporary_text_mapping()
to use fewer local variables and the appended patch contains this modification.

I also went on and changed relocate_restore_code() slightly in a similar 
fashion,
but all of those changes should not affect the behavior (unless there's 
something
insane going on behind the curtains, that is).

Kees, Logan, Boris, please try this one and let me know if it works for you.

Thanks,
Rafael


---
From: Rafael J. Wysocki 
Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption during 
image restoration

Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).

That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables.  Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.

The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.

To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch.  Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.

Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too.  That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables.  Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.

With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it).  Update RESTORE_MAGIC
too to reflect the image header format change.

Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.

This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.

Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm=146372852823760=2
Reported-by: Logan Gunthorpe 
Signed-off-by: Rafael J. Wysocki 
---
 arch/x86/power/hibernate_64.c |   90 --
 arch/x86/power/hibernate_asm_64.S |   55 ++-
 2 files changed, 102 insertions(+), 43 deletions(-)

Index: linux-pm/arch/x86/power/hibernate_64.c
===
--- linux-pm.orig/arch/x86/power/hibernate_64.c
+++ linux-pm/arch/x86/power/hibernate_64.c
@@ -28,6 +28,7 @@ extern asmlinkage __visible int restore_
  * kernel's text (this value is passed in the image header).
  */
 unsigned long restore_jump_address __visible;
+unsigned long 

Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration (was: Re: ktime_get_ts64() splat during resume)

2016-06-27 Thread Borislav Petkov
On Mon, Jun 27, 2016 at 04:24:22PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption during 
> image restoration
> 
> Logan Gunthorpe reports that hibernation stopped working reliably for
> him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
> and rodata).
> 
> That turns out to be a consequence of a long-standing issue with the
> 64-bit image restoration code on x86, which is that the temporary
> page tables set up by it to avoid page tables corruption when the
> last bits of the image kernel's memory contents are copied into
> their original page frames re-use the boot kernel's text mapping,
> but that mapping may very well get corrupted just like any other
> part of the page tables.  Of course, if that happens, the final
> jump to the image kernel's entry point will go to nowhere.
> 
> The exact reason why commit ab76f7b4ab23 matters here is that it
> sometimes causes a PMD of a large page to be split into PTEs
> that are allocated dynamically and get corrupted during image
> restoration as described above.
> 
> To fix that issue note that the code copying the last bits of the
> image kernel's memory contents to the page frames occupied by them
> previoulsy doesn't use the kernel text mapping, because it runs from
> a special page covered by the identity mapping set up for that code
> from scratch.  Hence, the kernel text mapping is only needed before
> that code starts to run and then it will only be used just for the
> final jump to the image kernel's entry point.
> 
> Accordingly, the temporary page tables set up in swsusp_arch_resume()
> on x86-64 need to contain the kernel text mapping too.  That mapping
> is only going to be used for the final jump to the image kernel, so
> it only needs to cover the image kernel's entry point, because the
> first thing the image kernel does after getting control back is to
> switch over to its own original page tables.  Moreover, the virtual
> address of the image kernel's entry point in that mapping has to be
> the same as the one mapped by the image kernel's page tables.
> 
> With that in mind, modify the x86-64's arch_hibernation_header_save()
> and arch_hibernation_header_restore() routines to pass the physical
> address of the image kernel's entry point (in addition to its virtual
> address) to the boot kernel (a small piece of assembly code involved
> in passing the entry point's virtual address to the image kernel is
> not necessary any more after that, so drop it).  Update RESTORE_MAGIC
> too to reflect the image header format change.
> 
> Next, in set_up_temporary_mappings(), use the physical and virtual
> addresses of the image kernel's entry point passed in the image
> header to set up a minimum kernel text mapping (using memory pages
> that won't be overwritten by the image kernel's memory contents) that
> will map those addresses to each other as appropriate.
> 
> This makes the concern about the possible corruption of the original
> boot kernel text mapping go away and if the the minimum kernel text
> mapping used for the final jump marks the image kernel's entry point
> memory as executable, the jump to it is guaraneed to succeed.
> 
> Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
> Link: http://marc.info/?l=linux-pm=146372852823760=2
> Reported-by: Logan Gunthorpe 
> Signed-off-by: Rafael J. Wysocki 
> ---
>  arch/x86/power/hibernate_64.c |   90 
> --
>  arch/x86/power/hibernate_asm_64.S |   55 ++-
>  2 files changed, 102 insertions(+), 43 deletions(-)

Reported-and-tested-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


Re: [PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration (was: Re: ktime_get_ts64() splat during resume)

2016-06-27 Thread Borislav Petkov
On Mon, Jun 27, 2016 at 04:24:22PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption during 
> image restoration
> 
> Logan Gunthorpe reports that hibernation stopped working reliably for
> him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
> and rodata).
> 
> That turns out to be a consequence of a long-standing issue with the
> 64-bit image restoration code on x86, which is that the temporary
> page tables set up by it to avoid page tables corruption when the
> last bits of the image kernel's memory contents are copied into
> their original page frames re-use the boot kernel's text mapping,
> but that mapping may very well get corrupted just like any other
> part of the page tables.  Of course, if that happens, the final
> jump to the image kernel's entry point will go to nowhere.
> 
> The exact reason why commit ab76f7b4ab23 matters here is that it
> sometimes causes a PMD of a large page to be split into PTEs
> that are allocated dynamically and get corrupted during image
> restoration as described above.
> 
> To fix that issue note that the code copying the last bits of the
> image kernel's memory contents to the page frames occupied by them
> previoulsy doesn't use the kernel text mapping, because it runs from
> a special page covered by the identity mapping set up for that code
> from scratch.  Hence, the kernel text mapping is only needed before
> that code starts to run and then it will only be used just for the
> final jump to the image kernel's entry point.
> 
> Accordingly, the temporary page tables set up in swsusp_arch_resume()
> on x86-64 need to contain the kernel text mapping too.  That mapping
> is only going to be used for the final jump to the image kernel, so
> it only needs to cover the image kernel's entry point, because the
> first thing the image kernel does after getting control back is to
> switch over to its own original page tables.  Moreover, the virtual
> address of the image kernel's entry point in that mapping has to be
> the same as the one mapped by the image kernel's page tables.
> 
> With that in mind, modify the x86-64's arch_hibernation_header_save()
> and arch_hibernation_header_restore() routines to pass the physical
> address of the image kernel's entry point (in addition to its virtual
> address) to the boot kernel (a small piece of assembly code involved
> in passing the entry point's virtual address to the image kernel is
> not necessary any more after that, so drop it).  Update RESTORE_MAGIC
> too to reflect the image header format change.
> 
> Next, in set_up_temporary_mappings(), use the physical and virtual
> addresses of the image kernel's entry point passed in the image
> header to set up a minimum kernel text mapping (using memory pages
> that won't be overwritten by the image kernel's memory contents) that
> will map those addresses to each other as appropriate.
> 
> This makes the concern about the possible corruption of the original
> boot kernel text mapping go away and if the the minimum kernel text
> mapping used for the final jump marks the image kernel's entry point
> memory as executable, the jump to it is guaraneed to succeed.
> 
> Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
> Link: http://marc.info/?l=linux-pm=146372852823760=2
> Reported-by: Logan Gunthorpe 
> Signed-off-by: Rafael J. Wysocki 
> ---
>  arch/x86/power/hibernate_64.c |   90 
> --
>  arch/x86/power/hibernate_asm_64.S |   55 ++-
>  2 files changed, 102 insertions(+), 43 deletions(-)

Reported-and-tested-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


[PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration (was: Re: ktime_get_ts64() splat during resume)

2016-06-27 Thread Rafael J. Wysocki
On Tuesday, June 21, 2016 11:04:41 AM Kees Cook wrote:
> On Mon, Jun 20, 2016 at 9:35 PM, Logan Gunthorpe  wrote:
> > Hey Rafael,
> >
> > This patch appears to be working on my laptop. Thanks.
> 
> Same for me: resume still works with KASLR in my tests too.

Unfortunately, Boris still sees post-resume memory corruption with the patch
you have tested, but that turns out to be a result of some super-weird
corruption of a pointer on the stack which leads to a store at a wrong address
(and there's no way it can be related to the rest of the patch).

We have verified that it can be avoided by rearranging 
set_up_temporary_text_mapping()
to use fewer local variables and the appended patch contains this modification.

I also went on and changed relocate_restore_code() slightly in a similar 
fashion,
but all of those changes should not affect the behavior (unless there's 
something
insane going on behind the curtains, that is).

Kees, Logan, Boris, please try this one and let me know if it works for you.

Thanks,
Rafael


---
From: Rafael J. Wysocki 
Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption during 
image restoration

Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).

That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables.  Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.

The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.

To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch.  Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.

Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too.  That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables.  Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.

With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it).  Update RESTORE_MAGIC
too to reflect the image header format change.

Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.

This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.

Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm=146372852823760=2
Reported-by: Logan Gunthorpe 
Signed-off-by: Rafael J. Wysocki 
---
 arch/x86/power/hibernate_64.c |   90 --
 arch/x86/power/hibernate_asm_64.S |   55 ++-
 2 files changed, 102 insertions(+), 43 deletions(-)

Index: linux-pm/arch/x86/power/hibernate_64.c
===
--- linux-pm.orig/arch/x86/power/hibernate_64.c
+++ linux-pm/arch/x86/power/hibernate_64.c
@@ -28,6 +28,7 @@ extern asmlinkage __visible int restore_
  * kernel's text (this value is passed in the image header).
  */
 unsigned long restore_jump_address __visible;
+unsigned long 

[PATCH v3] x86/power/64: Fix kernel text mapping corruption during image restoration (was: Re: ktime_get_ts64() splat during resume)

2016-06-27 Thread Rafael J. Wysocki
On Tuesday, June 21, 2016 11:04:41 AM Kees Cook wrote:
> On Mon, Jun 20, 2016 at 9:35 PM, Logan Gunthorpe  wrote:
> > Hey Rafael,
> >
> > This patch appears to be working on my laptop. Thanks.
> 
> Same for me: resume still works with KASLR in my tests too.

Unfortunately, Boris still sees post-resume memory corruption with the patch
you have tested, but that turns out to be a result of some super-weird
corruption of a pointer on the stack which leads to a store at a wrong address
(and there's no way it can be related to the rest of the patch).

We have verified that it can be avoided by rearranging 
set_up_temporary_text_mapping()
to use fewer local variables and the appended patch contains this modification.

I also went on and changed relocate_restore_code() slightly in a similar 
fashion,
but all of those changes should not affect the behavior (unless there's 
something
insane going on behind the curtains, that is).

Kees, Logan, Boris, please try this one and let me know if it works for you.

Thanks,
Rafael


---
From: Rafael J. Wysocki 
Subject: [PATCH v2] x86/power/64: Fix kernel text mapping corruption during 
image restoration

Logan Gunthorpe reports that hibernation stopped working reliably for
him after commit ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table
and rodata).

That turns out to be a consequence of a long-standing issue with the
64-bit image restoration code on x86, which is that the temporary
page tables set up by it to avoid page tables corruption when the
last bits of the image kernel's memory contents are copied into
their original page frames re-use the boot kernel's text mapping,
but that mapping may very well get corrupted just like any other
part of the page tables.  Of course, if that happens, the final
jump to the image kernel's entry point will go to nowhere.

The exact reason why commit ab76f7b4ab23 matters here is that it
sometimes causes a PMD of a large page to be split into PTEs
that are allocated dynamically and get corrupted during image
restoration as described above.

To fix that issue note that the code copying the last bits of the
image kernel's memory contents to the page frames occupied by them
previoulsy doesn't use the kernel text mapping, because it runs from
a special page covered by the identity mapping set up for that code
from scratch.  Hence, the kernel text mapping is only needed before
that code starts to run and then it will only be used just for the
final jump to the image kernel's entry point.

Accordingly, the temporary page tables set up in swsusp_arch_resume()
on x86-64 need to contain the kernel text mapping too.  That mapping
is only going to be used for the final jump to the image kernel, so
it only needs to cover the image kernel's entry point, because the
first thing the image kernel does after getting control back is to
switch over to its own original page tables.  Moreover, the virtual
address of the image kernel's entry point in that mapping has to be
the same as the one mapped by the image kernel's page tables.

With that in mind, modify the x86-64's arch_hibernation_header_save()
and arch_hibernation_header_restore() routines to pass the physical
address of the image kernel's entry point (in addition to its virtual
address) to the boot kernel (a small piece of assembly code involved
in passing the entry point's virtual address to the image kernel is
not necessary any more after that, so drop it).  Update RESTORE_MAGIC
too to reflect the image header format change.

Next, in set_up_temporary_mappings(), use the physical and virtual
addresses of the image kernel's entry point passed in the image
header to set up a minimum kernel text mapping (using memory pages
that won't be overwritten by the image kernel's memory contents) that
will map those addresses to each other as appropriate.

This makes the concern about the possible corruption of the original
boot kernel text mapping go away and if the the minimum kernel text
mapping used for the final jump marks the image kernel's entry point
memory as executable, the jump to it is guaraneed to succeed.

Fixes: ab76f7b4ab23 (x86/mm: Set NX on gap between __ex_table and rodata)
Link: http://marc.info/?l=linux-pm=146372852823760=2
Reported-by: Logan Gunthorpe 
Signed-off-by: Rafael J. Wysocki 
---
 arch/x86/power/hibernate_64.c |   90 --
 arch/x86/power/hibernate_asm_64.S |   55 ++-
 2 files changed, 102 insertions(+), 43 deletions(-)

Index: linux-pm/arch/x86/power/hibernate_64.c
===
--- linux-pm.orig/arch/x86/power/hibernate_64.c
+++ linux-pm/arch/x86/power/hibernate_64.c
@@ -28,6 +28,7 @@ extern asmlinkage __visible int restore_
  * kernel's text (this value is passed in the image header).
  */
 unsigned long restore_jump_address __visible;
+unsigned long jump_address_phys;
 
 /*
  * Value of the cr3 register from before the hibernation (this value is passed
@@