Re: RFR: 8330171: Lazy W^X switch implementation

2024-05-12 Thread Dean Long
On Fri, 12 Apr 2024 14:40:05 GMT, Sergey Nazarkin  wrote:

> An alternative for preemptively switching the W^X thread mode on macOS with 
> an AArch64 CPU. This implementation triggers the switch in response to the 
> SIGBUS signal if the *si_addr* belongs to the CodeCache area. With this 
> approach, it is now feasible to eliminate all WX guards and avoid potentially 
> costly operations. However, no significant improvement or degradation in 
> performance has been observed.  Additionally, considering the issue with 
> AsyncGetCallTrace, the patched JVM has been successfully operated with 
> [asgct_bottom](https://github.com/parttimenerd/asgct_bottom) and 
> [async-profiler](https://github.com/async-profiler/async-profiler). 
> 
> Additional testing:
> - [x] MacOS AArch64 server fastdebug *gtets*
> - [ ] MacOS AArch64 server fastdebug *jtreg:hotspot:tier4*
> - [ ] Benchmarking
> 
> @apangin and @parttimenerd could you please check the patch on your 
> scenarios??

I think there is a sweet-spot middle-ground between the two extremes: 
full-lazy, ideal for performance, and fine-grained execute-by-default, ideal 
for security.  I don't think we should change to full-lazy and remove all the 
guard rails at this time.  I am investigating execute-by-default, and it looks 
promising.

-

Changes requested by dlong (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18762#pullrequestreview-2051465621


Re: RFR: 8330171: Lazy W^X switch implementation

2024-04-24 Thread Stuart Monteith
On Tue, 23 Apr 2024 15:11:10 GMT, Andrew Haley  wrote:

>> What about granting `WXWrite` only if the current thread is in 
>> `_thread_in_vm`?
>> That would be more restrictive and roughly equivalent how it currently 
>> works. Likely there are some places then that should be granted `WXWrite` 
>> eagerly because they need `WXWrite` without `_thread_in_vm`. E.g. the JIT 
>> compiler threads should have `WXWrite` and never `WXExec` (I assume) which 
>> should be checked in the signal handler.
>
>> The patch doesn't protect against native agents, as this is obviously 
>> impossible. The current code doesn't do that either. For the bytecode, it 
>> doesn't prevent the attacker from abusing unsafe api to modify code cache. 
>> However unsafe functions are already considered "safe" and we proactively 
>> enable WXWrite as well as move thread to `_thread_in_vm` state (@reinrich). 
>> JITed code can't write to the cache either with or without the patch.
>> 
>> I totally get the sense of loss of security. But is this really the case?
> 
> I think it is. W^X is intended (amongst other things) to protect against the 
> use of gadgets, from buffer overflow exploits in non-java code to ROP 
> programming. At present, in order to generate code and execute it, you first 
> have to be able to make the JIT code writable, then write the code, then make 
> it executable. then jump to the code. And the exploit writer might have to do 
> some or all of this by finding gadgets. If we were to merge this patch then 
> all the attacker would have to do is write code to memory and find a way to 
> jump to it, and the automatic switch-on-segfault in this patch would do the 
> all the work the attacker needs.
> 
> It makes far more sense to tag those places that actually need to change W^X 
> access, and only switch there.
> 
> You could argue that any switching of W^X on a write to code space, then 
> switching it back on jumping (or returning) to Java code, even what we 
> already do, is effectively the same thing. Kinda, but it's not on just any 
> attempt to write to code space or any attempt to jump into code, it's at the 
> places we choose, and we can be careful to limit those places.
> 
> But surely the JDK is not the most vulnerable part of the stack anyway? I'd 
> agree with that, of course, but I don't think that's sufficient reason to 
> decide to bypass an OS security mechanism.
> 
> We are trying to reduce the size of the attack surface.

To add a little to @theRealAph 's point, we should avoid painting ourselves 
into a corner. I don't know how the platform is going to evolve, but I'd be 
nervous about fighting against the intentions of the protections.

-

PR Comment: https://git.openjdk.org/jdk/pull/18762#issuecomment-2074244082


Re: RFR: 8330171: Lazy W^X switch implementation

2024-04-23 Thread Andrew Haley
On Fri, 19 Apr 2024 07:44:17 GMT, Richard Reingruber  wrote:

>> An alternative for preemptively switching the W^X thread mode on macOS with 
>> an AArch64 CPU. This implementation triggers the switch in response to the 
>> SIGBUS signal if the *si_addr* belongs to the CodeCache area. With this 
>> approach, it is now feasible to eliminate all WX guards and avoid 
>> potentially costly operations. However, no significant improvement or 
>> degradation in performance has been observed.  Additionally, considering the 
>> issue with AsyncGetCallTrace, the patched JVM has been successfully operated 
>> with [asgct_bottom](https://github.com/parttimenerd/asgct_bottom) and 
>> [async-profiler](https://github.com/async-profiler/async-profiler). 
>> 
>> Additional testing:
>> - [x] MacOS AArch64 server fastdebug *gtets*
>> - [ ] MacOS AArch64 server fastdebug *jtreg:hotspot:tier4*
>> - [ ] Benchmarking
>> 
>> @apangin and @parttimenerd could you please check the patch on your 
>> scenarios??
>
> What about granting `WXWrite` only if the current thread is in 
> `_thread_in_vm`?
> That would be more restrictive and roughly equivalent how it currently works. 
> Likely there are some places then that should be granted `WXWrite` eagerly 
> because they need `WXWrite` without `_thread_in_vm`. E.g. the JIT compiler 
> threads should have `WXWrite` and never `WXExec` (I assume) which should be 
> checked in the signal handler.

> The patch doesn't protect against native agents, as this is obviously 
> impossible. The current code doesn't do that either. For the bytecode, it 
> doesn't prevent the attacker from abusing unsafe api to modify code cache. 
> However unsafe functions are already considered "safe" and we proactively 
> enable WXWrite as well as move thread to `_thread_in_vm` state (@reinrich). 
> JITed code can't write to the cache either with or without the patch.
> 
> I totally get the sense of loss of security. But is this really the case?

I think it is. W^X is intended (amongst other things) to protect against the 
use of gadgets, from buffer overflow exploits in non-java code to ROP 
programming. At present, in order to generate code and execute it, you first 
have to be able to make the JIT code writable, then write the code, then make 
it executable. then jump to the code. And the exploit writer might have to do 
some or all of this by finding gadgets. If we were to merge this patch then all 
the attacker would have to do is write code to memory and find a way to jump to 
it, and the automatic switch-on-segfault in this patch would do the all the 
work the attacker needs.

It makes far more sense to tag those places that actually need to change W^X 
access, and only switch there.

You could argue that any switching of W^X on a write to code space, then 
switching it back on jumping (or returning) to Java code, even what we already 
do, is effectively the same thing. Kinda, but it's not on just any attempt to 
write to code space or any attempt to jump into code, it's at the places we 
choose, and we can be careful to limit those places.

But surely the JDK is not the most vulnerable part of the stack anyway? I'd 
agree with that, of course, but I don't think that's sufficient reason to 
decide to bypass an OS security mechanism.

We are trying to reduce the size of the attack surface.

-

PR Comment: https://git.openjdk.org/jdk/pull/18762#issuecomment-2072639243


Re: RFR: 8330171: Lazy W^X switch implementation

2024-04-23 Thread Sergey Nazarkin
On Fri, 19 Apr 2024 07:44:17 GMT, Richard Reingruber  wrote:

>> An alternative for preemptively switching the W^X thread mode on macOS with 
>> an AArch64 CPU. This implementation triggers the switch in response to the 
>> SIGBUS signal if the *si_addr* belongs to the CodeCache area. With this 
>> approach, it is now feasible to eliminate all WX guards and avoid 
>> potentially costly operations. However, no significant improvement or 
>> degradation in performance has been observed.  Additionally, considering the 
>> issue with AsyncGetCallTrace, the patched JVM has been successfully operated 
>> with [asgct_bottom](https://github.com/parttimenerd/asgct_bottom) and 
>> [async-profiler](https://github.com/async-profiler/async-profiler). 
>> 
>> Additional testing:
>> - [x] MacOS AArch64 server fastdebug *gtets*
>> - [ ] MacOS AArch64 server fastdebug *jtreg:hotspot:tier4*
>> - [ ] Benchmarking
>> 
>> @apangin and @parttimenerd could you please check the patch on your 
>> scenarios??
>
> What about granting `WXWrite` only if the current thread is in 
> `_thread_in_vm`?
> That would be more restrictive and roughly equivalent how it currently works. 
> Likely there are some places then that should be granted `WXWrite` eagerly 
> because they need `WXWrite` without `_thread_in_vm`. E.g. the JIT compiler 
> threads should have `WXWrite` and never `WXExec` (I assume) which should be 
> checked in the signal handler.

The patch doesn't protect against native agents, as this is obviously 
impossible. The current code doesn't do that either.  For the bytecode, it 
doesn't prevent the attacker from abusing unsafe api to modify code cache. 
However unsafe functions are already considered "safe" and we proactively 
enable WXWrite as well as move thread to `_thread_in_vm` state (@reinrich). 
JITed code can't write to the cache either with or without the patch.  

I totally get the sense of loss of security. But is this really the case?

-

PR Comment: https://git.openjdk.org/jdk/pull/18762#issuecomment-2071988941


Re: RFR: 8330171: Lazy W^X switch implementation

2024-04-19 Thread Richard Reingruber
On Fri, 12 Apr 2024 14:40:05 GMT, Sergey Nazarkin  wrote:

> An alternative for preemptively switching the W^X thread mode on macOS with 
> an AArch64 CPU. This implementation triggers the switch in response to the 
> SIGBUS signal if the *si_addr* belongs to the CodeCache area. With this 
> approach, it is now feasible to eliminate all WX guards and avoid potentially 
> costly operations. However, no significant improvement or degradation in 
> performance has been observed.  Additionally, considering the issue with 
> AsyncGetCallTrace, the patched JVM has been successfully operated with 
> [asgct_bottom](https://github.com/parttimenerd/asgct_bottom) and 
> [async-profiler](https://github.com/async-profiler/async-profiler). 
> 
> Additional testing:
> - [x] MacOS AArch64 server fastdebug *gtets*
> - [ ] MacOS AArch64 server fastdebug *jtreg:hotspot:tier4*
> - [ ] Benchmarking
> 
> @apangin and @parttimenerd could you please check the patch on your 
> scenarios??

What about granting `WXWrite` only if the current thread is in `_thread_in_vm`?
That would be more restrictive and roughly equivalent how it currently works. 
Likely there are some places then that should be granted `WXWrite` eagerly 
because they need `WXWrite` without `_thread_in_vm`. E.g. the JIT compiler 
threads should have `WXWrite` and never `WXExec` (I assume) which should be 
checked in the signal handler.

-

PR Comment: https://git.openjdk.org/jdk/pull/18762#issuecomment-2065983074


Re: RFR: 8330171: Lazy W^X switch implementation

2024-04-15 Thread Sergey Nazarkin
On Mon, 15 Apr 2024 08:33:25 GMT, Bernhard Urban-Forster  
wrote:

> do you have numbers on how many transitions are done with your PR vs. the 
> current state when running the same program?

With just simple **java -version** it is ~180 vs ~9500 (new vs old),  for 
**java -help** ~1120 vs ~86300. For the applications the ration is about the 
same.

-

PR Comment: https://git.openjdk.org/jdk/pull/18762#issuecomment-2057280998


Re: RFR: 8330171: Lazy W^X switch implementation

2024-04-15 Thread Bernhard Urban-Forster
On Fri, 12 Apr 2024 14:40:05 GMT, Sergey Nazarkin  wrote:

> An alternative for preemptively switching the W^X thread mode on macOS with 
> an AArch64 CPU. This implementation triggers the switch in response to the 
> SIGBUS signal if the *si_addr* belongs to the CodeCache area. With this 
> approach, it is now feasible to eliminate all WX guards and avoid potentially 
> costly operations. However, no significant improvement or degradation in 
> performance has been observed.  Additionally, considering the issue with 
> AsyncGetCallTrace, the patched JVM has been successfully operated with 
> [asgct_bottom](https://github.com/parttimenerd/asgct_bottom) and 
> [async-profiler](https://github.com/async-profiler/async-profiler). 
> 
> Additional testing:
> - [x] MacOS AArch64 server fastdebug *gtets*
> - [ ] MacOS AArch64 server fastdebug *jtreg:hotspot:tier4*
> - [ ] Benchmarking
> 
> @apangin and @parttimenerd could you please check the patch on your 
> scenarios??

I agree that this PR effectively removes the whole idea behind JIT_MAP and thus 
is a bad idea security wise.  Still it has some value.

@snazarkin do you have numbers on how many transitions are done with your PR 
vs. the current state when running the same program?  That would give us a 
lower bound on the amount of transitions needed and define a goal for future 
improvements in that area.

-

PR Comment: https://git.openjdk.org/jdk/pull/18762#issuecomment-2056182560


Re: RFR: 8330171: Lazy W^X switch implementation

2024-04-13 Thread Andrew Haley
On Sat, 13 Apr 2024 18:16:21 GMT, Thomas Stuefe  wrote:

> I have one question, and I'm sorry if it has been answered before. How 
> expensive is changing the mode? Is it just a status variable in user-space 
> pthread lib? Or does it need a system call?
> 
> In other words, how fine granular can we get without incurring too high a 
> cost?

It's expensive. We've seen it cause significant slowdowns in Java->VM 
transitions.

-

PR Comment: https://git.openjdk.org/jdk/pull/18762#issuecomment-2053734174


Re: RFR: 8330171: Lazy W^X switch implementation

2024-04-13 Thread Thomas Stuefe
On Fri, 12 Apr 2024 14:40:05 GMT, Sergey Nazarkin  wrote:

> An alternative for preemptively switching the W^X thread mode on macOS with 
> an AArch64 CPU. This implementation triggers the switch in response to the 
> SIGBUS signal if the *si_addr* belongs to the CodeCache area. With this 
> approach, it is now feasible to eliminate all WX guards and avoid potentially 
> costly operations. However, no significant improvement or degradation in 
> performance has been observed.  Additionally, considering the issue with 
> AsyncGetCallTrace, the patched JVM has been successfully operated with 
> [asgct_bottom](https://github.com/parttimenerd/asgct_bottom) and 
> [async-profiler](https://github.com/async-profiler/async-profiler). 
> 
> Additional testing:
> - [x] MacOS AArch64 server fastdebug *gtets*
> - [ ] MacOS AArch64 server fastdebug *jtreg:hotspot:tier4*
> - [ ] Benchmarking
> 
> @apangin and @parttimenerd could you please check the patch on your 
> scenarios??

I have one question, and I'm sorry if it has been answered before. How 
expensive is changing the mode? Is it just a status variable in user-space 
pthread lib? Or does it need a system call? 

In other words, how fine granular can we get without incurring too high a cost?

-

PR Comment: https://git.openjdk.org/jdk/pull/18762#issuecomment-2053721713


Re: RFR: 8330171: Lazy W^X switch implementation

2024-04-12 Thread Sergey Nazarkin
On Fri, 12 Apr 2024 15:26:03 GMT, Andrew Haley  wrote:

>> Hello Sergey.
>> W^X mode was initially forced by Apple to prevent writing to executable 
>> memory, as a security feature.
>> This change just eliminates this security feature at all, doesn't it ?
>> Basically: "want to write to Executable memory ? ok, here you go"
>
>> Hello Sergey. W^X mode was initially forced by Apple to prevent writing to 
>> executable memory, as a security feature. This change just eliminates this 
>> security feature at all, doesn't it ? Basically: "want to write to 
>> Executable memory ? ok, here you go"
> 
> Yes @VladimirKempik, you are right. No, we should not do this.
> 
> Instead, when we enter the VM we could track the current state of W^X and 
> whenever we enter a block that needs to write into code space we would set W 
> if needed. When we leave the VM or when we call back into Java we would set 
> X, if needed. The cost of doing this would be small, but we'd have to find 
> all the blocks that need to write into code space. This might be more effort 
> than we want to make, though.
> 
> So where would be need to make the transitions to W? At a guess, whenever we 
> start assembling something, and in all of the methods in 
> nativeInst_aarch64.?pp, and in class Patcher. And to X, in the call stub and 
> a few other places.
> 
> That would minimize the transitions exactly to the set of places we actually 
> need.

Thanks @theRealAph, @VladimirKempik
> Instead, when we enter the VM we could track the current state of W^X and 
> whenever we enter a block that needs to write into code space we would set W 
> if needed. When we leave the VM or when we call back into Java we would set 
> X, if needed. The cost of doing this would be small, but we'd have to find 
> all the blocks that need to write into code space. This might be more effort 
> than we want to make, though.

It is the way in which it is implemented in the current code.  Unfortunately, 
it is hardly maintainable solution that suffers from issues like [1-5].   

As I understand it, your concern is that the implementation doesn't prevent 
rogue from writing to the code cache with some some unsafe api? 

1. https://bugs.openjdk.org/browse/JDK-8302736
2. https://bugs.openjdk.org/browse/JDK-8327990
3. https://bugs.openjdk.org/browse/JDK-8327036
4. https://bugs.openjdk.org/browse/JDK-8304725
5. https://bugs.openjdk.org/browse/JDK-8307549

-

PR Comment: https://git.openjdk.org/jdk/pull/18762#issuecomment-2052164890


Re: RFR: 8330171: Lazy W^X switch implementation

2024-04-12 Thread Vladimir Kempik
On Fri, 12 Apr 2024 14:40:05 GMT, Sergey Nazarkin  wrote:

> An alternative for preemptively switching the W^X thread mode on macOS with 
> an AArch64 CPU. This implementation triggers the switch in response to the 
> SIGBUS signal if the *si_addr* belongs to the CodeCache area. With this 
> approach, it is now feasible to eliminate all WX guards and avoid potentially 
> costly operations. However, no significant improvement or degradation in 
> performance has been observed.  Additionally, considering the issue with 
> AsyncGetCallTrace, the patched JVM has been successfully operated with 
> [asgct_bottom](https://github.com/parttimenerd/asgct_bottom) and 
> [async-profiler](https://github.com/async-profiler/async-profiler). 
> 
> Additional testing:
> - [x] MacOS AArch64 server fastdebug *gtets*
> - [ ] MacOS AArch64 server fastdebug *jtreg:hotspot:tier4*
> - [ ] Benchmarking
> 
> @apangin and @parttimenerd could you please check the patch on your 
> scenarios??

This can be left as an addition to existing mechanism. Disabled by default and 
can be enabled with a special (DEVELOP) option. So this can't be enabled on 
production builds, but can be useful to debug w^x issues on debug builds

-

PR Comment: https://git.openjdk.org/jdk/pull/18762#issuecomment-2051984322


Re: RFR: 8330171: Lazy W^X switch implementation

2024-04-12 Thread Andrew Haley
On Fri, 12 Apr 2024 14:50:46 GMT, Vladimir Kempik  wrote:

> Hello Sergey. W^X mode was initially forced by Apple to prevent writing to 
> executable memory, as a security feature. This change just eliminates this 
> security feature at all, doesn't it ? Basically: "want to write to Executable 
> memory ? ok, here you go"

Yes @VladimirKempik, you are right. No, we should not do this.

Instead, when we enter the VM we could track the current state of W^X and 
whenever we enter a block that needs to write into code space we would set W if 
needed. When we leave the VM or when we call back into Java we would set X, if 
needed. The cost of doing this would be small, but we'd have to find all the 
blocks that need to write into code space. This might be more effort than we 
want to make, though.

So where would be need to make the transitions to W? At a guess, whenever we 
start assembling something, and in all of the methods in 
nativeInst_aarch64.?pp, and in class Patcher. And to X, in the call stub and a 
few other places.

That would minimize the transitions exactly to the set of places we actually 
need.

-

PR Comment: https://git.openjdk.org/jdk/pull/18762#issuecomment-2051977752