hujun260 commented on PR #13486:
URL: https://github.com/apache/nuttx/pull/13486#issuecomment-2357375331

   > > The benefits are very significant. Before the modification, obtaining the interrupt status required three steps:
   > > 1. Obtain the CPU index
   > > 2. Access the global variable
   > > 3. Disable/enable interrupts
   > >
   > > This process involved at least 6 CPU instructions. Now it requires only a single CPU instruction.
   > 
   > 1. The irq enable/disable switch in `up_interrupt_context()` could actually be removed, as 32-bit accesses are atomic on arm32 CPU cores
   > 2. The instruction cycle timings of MCR may add more overhead, requiring **6 cycles in the worst case**
   > 
   > https://developer.arm.com/documentation/100026/0104/smr1465219161191
   > 
   > Do we have a relevant performance test? For example, how many cycles does it take to call up_set_current_regs()/up_current_regs() 10,000 times with and without this PR?
   
   First, IRQ masking cannot be removed here, for one crucial reason: we must ensure that the current task is not rescheduled after the CPU index has been read.
   Otherwise, the CPU index may no longer correspond to the CPU on which the current task is running, which leads to logic errors.
   The implementation of `this_task` follows the same principle.
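
The race described above can be illustrated with a small, deterministic host-side simulation. This is a sketch, not NuttX code: every name in it (`g_regs`, `g_task_cpu`, `maybe_migrate()`, `read_current_regs()`, `slot_matches_cpu()`) is hypothetical. It assumes that a scheduling event always fires in the window between reading the CPU index and using it, and that this event moves the task to the other CPU.

```c
#include <stdbool.h>

#define NCPUS 2

/* Hypothetical stand-ins for kernel state (not NuttX symbols). */
static void *g_regs[NCPUS];   /* per-CPU "current regs" slots            */
static int   g_task_cpu;      /* CPU the running task is currently on    */
static bool  g_irq_masked;    /* true while local interrupts are masked  */

/* Simulated scheduling point: with interrupts enabled, assume a tick
 * always reschedules the task onto the other CPU. */
static void maybe_migrate(void)
{
  if (!g_irq_masked)
    {
      g_task_cpu = (g_task_cpu + 1) % NCPUS;
    }
}

/* Stand-in for up_cpu_index(): the CPU the caller is running on now. */
static int cpu_index(void)
{
  return g_task_cpu;
}

/* Old-style access: (mask) -> read CPU index -> index the global array. */
static void *read_current_regs(bool mask_irqs)
{
  g_irq_masked = mask_irqs;   /* plays the role of up_irq_save()          */
  int cpu = cpu_index();
  maybe_migrate();            /* the window between reading and using cpu */
  void *regs = g_regs[cpu];
  g_irq_masked = false;       /* simplified up_irq_restore()              */
  return regs;
}

/* Returns true if the slot read belongs to the CPU the task ends up on. */
static bool slot_matches_cpu(bool mask_irqs)
{
  static int ctx[NCPUS];

  for (int i = 0; i < NCPUS; i++)
    {
      g_regs[i] = &ctx[i];
    }

  g_task_cpu = 0;
  void *regs = read_current_regs(mask_irqs);
  return regs == g_regs[g_task_cpu];
}
```

With masking, `slot_matches_cpu(true)` returns true because the task is still on CPU 0. Without masking, `slot_matches_cpu(false)` returns false because the task ends up on CPU 1 while still holding CPU 0's slot. This is the mismatch the reply describes.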
   
   The current implementation needs at least 3 msr/mrs instructions plus 4 ordinary instructions, so the benefit of this optimization is clear. After the optimization, only a single msr instruction is needed, with no additional overhead.
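
The two access paths can be contrasted roughly as follows. This illustration rests on assumptions and is not the PR's code. The array and helpers are hypothetical, and the choice of TPIDRPRW (the ARMv7-A PL1-only thread ID register, accessed through CP15 `c13, c0, 4`) as the per-CPU slot is an assumption made for the example. On a non-ARMv7-A host, a plain variable models the system register so the sketch compiles and runs.

```c
#include <stdint.h>

#define NCPUS 4

/* ---- Old path: per-CPU slot in a global array, indexed by CPU id ---- */

static uint32_t *g_current_regs[NCPUS];  /* hypothetical global array     */
static int g_this_cpu;                    /* host stand-in for the CPU id  */
static int g_irq_flags;                   /* host stand-in for IRQ state   */

static int  irq_save(void)     { int f = g_irq_flags; g_irq_flags = 1; return f; }
static void irq_restore(int f) { g_irq_flags = f; }
static int  cpu_index(void)    { return g_this_cpu; }

static uint32_t *current_regs_old(void)
{
  int flags = irq_save();                        /* disable interrupts   */
  uint32_t *regs = g_current_regs[cpu_index()];  /* CPU index + load     */
  irq_restore(flags);                            /* re-enable interrupts */
  return regs;
}

/* ---- New path: the pointer lives in a per-CPU system register ---- */

#if defined(__arm__) && defined(__ARM_ARCH_7A__)
/* Assumption: TPIDRPRW is used as the per-CPU slot. */
static inline uint32_t *current_regs_new(void)
{
  uint32_t *regs;
  __asm__ volatile ("mrc p15, 0, %0, c13, c0, 4" : "=r" (regs));
  return regs;
}

static inline void set_current_regs_new(uint32_t *regs)
{
  __asm__ volatile ("mcr p15, 0, %0, c13, c0, 4" : : "r" (regs));
}
#else
/* Host fallback: one variable models the calling CPU's private register. */
static uint32_t *g_fake_sysreg;
static inline uint32_t *current_regs_new(void) { return g_fake_sysreg; }
static inline void set_current_regs_new(uint32_t *regs) { g_fake_sysreg = regs; }
#endif
```

The per-CPU register is private to the CPU that reads it, so the new path has no CPU index that could go stale between being read and used. For that reason it does not need the interrupt masking that the old path relies on.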
   
   Unfortunately, we have not benchmarked this optimization on its own.
   Instead, we tested the entire message send/receive path, and each test covers several optimizations at once.
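
A microbenchmark of the kind the reviewer asked about (10,000 calls of an accessor) could be structured as follows. This is a hypothetical harness. `dummy_accessor()` stands in for whichever accessor is being measured, and timing uses POSIX `clock_gettime(CLOCK_MONOTONIC)`. On real hardware, a cycle counter would likely give finer-grained numbers.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define ITERATIONS 10000

typedef void *(*accessor_fn)(void);

/* Hypothetical accessor under test; swap in the pre- or post-PR version. */
static int g_slot;
static void *dummy_accessor(void) { return &g_slot; }

static uint64_t now_ns(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Returns the elapsed time for ITERATIONS calls of fn, in nanoseconds,
 * and stores the last returned value in *last. */
static uint64_t bench_accessor(accessor_fn fn, void **last)
{
  void *volatile sink = NULL;  /* volatile store keeps each call's result */
  uint64_t start = now_ns();

  for (int i = 0; i < ITERATIONS; i++)
    {
      sink = fn();
    }

  uint64_t end = now_ns();
  *last = sink;
  return end - start;
}
```

Running the harness once against the pre-PR accessor and once against the post-PR accessor would isolate this single optimization. The end-to-end message tests mentioned above cannot separate it from the other changes.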

