> if (!((long)ip - (long)start_ip <= (long)post_commit_offset))
>   return 1;

> This introduces an issue here: if "ip" is lower than "start_ip", we
> can incorrectly think we are in a critical section, when we are in
> fact not.

This shouldn't be an issue if we used unsigned numbers. Eg if start_ip is X and 
post_commit_offset is L, then (ip - X <= L) means that if ip is less than X ip 
- X will be signed, which will become a large unsigned value. 

> or to the kernel to set it back to NULL if it finds out that it is
> preempting/delivering a signal over an instruction pointer outside
> of the current rseq_cs start_ip/post_commit_ip range (lazy clear).

I see, lazy clear makes sense. Still, if during most execution periods the user 
code enters some rseq section (likely if rseq is used for something like 
malloc) on every context switch this code will have to be run.

> Moreover, this modification would add a subtraction on the common case
> (ip - start_ip), and makes the ABI slightly uglier.

We could benchmark it but the subtraction should be similar in cost to the 
extra comparison but reducing the number of branches seems like it will help as 
well. FWIW GCC attempts to translate this kind of sequence to a subtract and 
compare: https://godbolt.org/g/5DGLvo.

I agree the ABI is uglier, but since we're mucking with every context switch I 
thought I'd point it out.

> If I understand well, you are proposing to speed up .so load time by
> means of removing relocations of pointers within rseq_cs, done by
> making those relative to the rseq_cs address.

Yeah, I think this may be overkill as optimization.

Reply via email to