pussuw commented on code in PR #9103:
URL: https://github.com/apache/nuttx/pull/9103#discussion_r1214096917


##########
arch/risc-v/src/common/riscv_macros.S:
##########
@@ -227,8 +222,15 @@
   REGLOAD      t0, REG_INT_CTX(\out)
   li           t1, MSTATUS_FS
   and          t2, t0, t1
-  li           t1, MSTATUS_FS_INIT
-  ble          t2, t1, 1f
+  li           t1, MSTATUS_FS_DIRTY
+  bne          t2, t1, 1f
+
+  /* Reset FS bit to MSTATUS_FS_CLEAN */
+  li           t1, MSTATUS_FS_CLEAN

Review Comment:
   @xiaoxiang781216 
   
   Finally got time to finalize the lazy FPU implementation and testing. With 
only nsh + a simple test app, I wrote a very simple code snippet with system 
calls enabled (to force trap entry):
   
   ```
   int main(...)
   {
     ... use floating-point arithmetic ...
   
     for (i = 0; i < n_samples; i++)
       {
         t0 = read_cycles();
         pid = getpid();          /* system call, forces trap entry */
         t1 = read_cycles();
         samples[i] = t1 - t0;    /* cycles spent in the system call */
       }
   
     ... print out avg, min, max ...
   }
   ```
   
   > Ok, let's wait your result.
   
   With lazy FPU, the system call takes (in cycles):
   avg:738, min:738, max:886
   
   Without lazy FPU, but with the error fix in this PR (save FPU if state is 
not in {OFF, INIT}, restore if state is in {DIRTY, CLEAN}):
   avg:954, min:954, max:1019
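   
   For reference, a minimal C sketch of the two save conditions being compared. The `MSTATUS_FS*` values restate the FS field encoding from the RISC-V privileged spec (bits 14:13 of mstatus); the function names are hypothetical and only illustrate the checks, they are not the actual NuttX macros:
   
   ```c
   #include <stdbool.h>
   #include <stdint.h>

   /* mstatus.FS field encoding (RISC-V privileged spec, bits 14:13) */
   #define MSTATUS_FS        (UINT32_C(3) << 13)
   #define MSTATUS_FS_OFF    (UINT32_C(0) << 13)
   #define MSTATUS_FS_INIT   (UINT32_C(1) << 13)
   #define MSTATUS_FS_CLEAN  (UINT32_C(2) << 13)
   #define MSTATUS_FS_DIRTY  (UINT32_C(3) << 13)

   /* Hypothetical helper: lazy policy, save only when FS == DIRTY */
   static bool fpu_save_needed_lazy(uint32_t status)
   {
     return (status & MSTATUS_FS) == MSTATUS_FS_DIRTY;
   }

   /* Hypothetical helper: non-lazy policy, save when FS is not in {OFF, INIT} */
   static bool fpu_save_needed_nonlazy(uint32_t status)
   {
     return (status & MSTATUS_FS) > MSTATUS_FS_INIT;
   }

   /* Hypothetical helper: restore when FS is in {CLEAN, DIRTY} */
   static bool fpu_restore_needed(uint32_t status)
   {
     return (status & MSTATUS_FS) >= MSTATUS_FS_CLEAN;
   }

   /* Hypothetical helper: after a save, reset the FS field to CLEAN */
   static uint32_t fpu_mark_clean(uint32_t status)
   {
     return (status & ~MSTATUS_FS) | MSTATUS_FS_CLEAN;
   }
   ```
   
   The only difference on the save path is the CLEAN state: the lazy check skips the save there, because the registers have not changed since the last save.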
   
   So on average lazy FPU saves more than 200 cycles (954 - 738 = 216) on 
this simple system call, ~22% (of course the benefit is smaller with more 
complex system calls).
   
   The cumulative effect on my application, which uses system calls + FPU, is 
~2.7%, so not insignificant at all.
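   
   The ~22% figure follows directly from the two averages. A small sketch of the arithmetic, with hypothetical helper names and assuming the baseline average is non-zero and not smaller than the lazy FPU average:
   
   ```c
   #include <stdint.h>

   /* Cycles saved per call: baseline average minus lazy FPU average */
   static uint32_t cycles_saved(uint32_t baseline_avg, uint32_t lazy_avg)
   {
     return baseline_avg - lazy_avg;
   }

   /* Relative saving in tenths of a percent of the baseline (truncated) */
   static uint32_t saving_permille(uint32_t baseline_avg, uint32_t lazy_avg)
   {
     return (cycles_saved(baseline_avg, lazy_avg) * 1000u) / baseline_avg;
   }
   ```
   
   With the averages measured above, `cycles_saved(954, 738)` is 216 and `saving_permille(954, 738)` is 226, i.e. about 22.6%.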



