g2gps opened a new pull request, #10478:
URL: https://github.com/apache/nuttx/pull/10478

   ## Summary
   
   Adds an optional driver which can be used to encapsulate an existing block 
device and provide caching using a runtime-registered configuration.
   
   A few things to note:
    - The driver includes no automatic flushing like that of the generic 
`rwbuffer`. Dirty blocks are only flushed on close, when the reference count 
reaches zero, or in response to a `BIOC_FLUSH` `IOCTL` command.
    - Given the above, it's probably not suitable for removable media.
    - I've outlined the caching strategy in `block_cache.h`. It's not the 
latest and greatest strategy, but it provides a significant performance 
improvement. We're going to be using this internally, so improvements and fixes 
will come over time.
    - I'm not entirely sure if I've placed the driver in the correct location, 
or if the driver registration method is the most appropriate, but it works.
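The flush behaviour described in the first bullet can be illustrated with a toy model. This is a sketch only, not the driver's code: every name here (`toy_cache`, `toy_open`, `toy_write`, `toy_flush`, `toy_close`) is hypothetical, the block count and size are arbitrary, and eviction is left out entirely. It only shows that writes stay in RAM until an explicit flush or the final close.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NBLOCKS 4   /* Arbitrary size for the toy model */
#define BLKSIZE 16  /* Arbitrary block size in bytes */

/* Toy model only: every name here is hypothetical. */

struct toy_cache
{
  uint8_t backing[NBLOCKS][BLKSIZE]; /* Stands in for the real device */
  uint8_t cached[NBLOCKS][BLKSIZE];  /* In-RAM copy of each block */
  bool    dirty[NBLOCKS];            /* Set on write, cleared on flush */
  int     refs;                      /* Open reference count */
};

void toy_open(struct toy_cache *c)
{
  c->refs++;
}

/* Writes only touch RAM; nothing reaches the backing store here. */

void toy_write(struct toy_cache *c, int blk, const uint8_t *data)
{
  memcpy(c->cached[blk], data, BLKSIZE);
  c->dirty[blk] = true;
}

/* Similar in spirit to a flush request: write back every dirty block
 * and return how many blocks were written.
 */

int toy_flush(struct toy_cache *c)
{
  int flushed = 0;

  for (int i = 0; i < NBLOCKS; i++)
    {
      if (c->dirty[i])
        {
          memcpy(c->backing[i], c->cached[i], BLKSIZE);
          c->dirty[i] = false;
          flushed++;
        }
    }

  return flushed;
}

/* Only the close that drops the reference count to zero flushes. */

int toy_close(struct toy_cache *c)
{
  if (--c->refs == 0)
    {
      return toy_flush(c);
    }

  return 0;
}
```

In this model, data written while two references are open is still missing from the backing store after the first close, which is the reason for the caution about removable media.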
   
   In general this directly addresses the issues in #9080, but given the 
enhancement, it would be nice to get this integrated so it's available to the 
wider community. 
   
   ## Impact
   
    Greatly improves the performance of block devices, assuming sufficient RAM 
is available.
   
   A summary, from #9080, of the speeds I was seeing without caching:
   
   ```bash
   # Write speeds
   nsh> dd if=/dev/zero of=/dev/sd0 bs=64 count=1000
   64000 bytes copied, 647623 usec, 96 KB/s
   nsh> dd if=/dev/zero of=/dev/sd0 bs=512 count=100
   51200 bytes copied, 295710 usec, 169 KB/s
   nsh> dd if=/dev/zero of=/dev/sd0 bs=4096 count=100
   409600 bytes copied, 935538 usec, 427 KB/s
   nsh> dd if=/dev/zero of=/dev/sd0 bs=16384 count=100
   1638400 bytes copied, 1085459 usec, 1474 KB/s
   nsh> dd if=/dev/zero of=/dev/sd0 bs=65536 count=100
   6553600 bytes copied, 1515564 usec, 4222 KB/s
   
   # Read speeds:
   nsh> dd if=/dev/sd0 of=/dev/null bs=64 count=100
   6400 bytes copied, 34671 usec, 180 KB/s
   nsh> dd if=/dev/sd0 of=/dev/null bs=512 count=100
   51200 bytes copied, 162573 usec, 307 KB/s
   nsh> dd if=/dev/sd0 of=/dev/null bs=4096 count=100
   409600 bytes copied, 290912 usec, 1374 KB/s
   nsh> dd if=/dev/sd0 of=/dev/null bs=16384 count=100
   1638400 bytes copied, 292429 usec, 5471 KB/s
   nsh> dd if=/dev/sd0 of=/dev/null bs=65536 count=100
   6553600 bytes copied, 672012 usec, 9523 KB/s
   ```
   
   With caching:
   
   ```bash
   # Write speeds
   nsh> dd if=/dev/zero of=/mnt/test bs=64 count=1000
   64000 bytes copied, 411442 usec, 151 KB/s
   nsh> dd if=/dev/zero of=/mnt/test bs=512 count=100
   51200 bytes copied, 45271 usec, 1104 KB/s
   nsh> dd if=/dev/zero of=/mnt/test bs=4096 count=100
   409600 bytes copied, 63529 usec, 6296 KB/s
   nsh> dd if=/dev/zero of=/mnt/test bs=16384 count=100
   1638400 bytes copied, 122956 usec, 13012 KB/s
   nsh> dd if=/dev/zero of=/mnt/test bs=65536 count=100
   6553600 bytes copied, 2976539 usec, 2150 KB/s
   
   # Read speeds
   nsh> dd if=/mnt/test of=/dev/null bs=64 count=100
   6400 bytes copied, 41172 usec, 151 KB/s
   nsh> dd if=/mnt/test of=/dev/null bs=512 count=100
   51200 bytes copied, 45931 usec, 1088 KB/s
   nsh> dd if=/mnt/test of=/dev/null bs=4096 count=100
   409600 bytes copied, 62407 usec, 6409 KB/s
   nsh> dd if=/mnt/test of=/dev/null bs=16384 count=100
   1638400 bytes copied, 104206 usec, 15354 KB/s
   nsh> dd if=/mnt/test of=/dev/null bs=65536 count=100
   6553600 bytes copied, 1331805 usec, 4805 KB/s
   
   # Copy speeds
   nsh> dd if=test of=test2 bs=512 count=100
   51200 bytes copied, 62416 usec, 801 KB/s
   nsh> dd if=test of=test2 bs=4096 count=100
   409600 bytes copied, 97481 usec, 4103 KB/s
   nsh> dd if=test of=test2 bs=16384 count=100
   1638400 bytes copied, 1174449 usec, 1362 KB/s
   nsh> dd if=test of=test2 bs=65536 count=100
   6553600 bytes copied, 4261865 usec, 1501 KB/s
   ```
   
   Notes:
    - The previous tests (from #9080) used a block-to-character 
device and no filesystem. 
    - The current tests use a FAT filesystem.
    - The tests with the largest block sizes, 65K (16K for copy), exhaust the 
2M cache I've allocated, so the driver evicts dirty cache windows to the 
underlying EMMC and performance is slower.
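The last note can be checked with some rough arithmetic. The sketch below is one plausible reading, not a statement of how the driver accounts for memory: it assumes a copy pulls both the source and the destination data through the cache, and it ignores FAT metadata. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_BYTES (2u * 1024u * 1024u) /* The 2M cache quoted above */

/* Hypothetical helper: bytes a dd run moves through the cache.
 * Assumption: a copy touches source and destination, so it counts twice.
 */

uint32_t dd_working_set(uint32_t bs, uint32_t count, bool is_copy)
{
  uint32_t bytes = bs * count;

  return is_copy ? 2u * bytes : bytes;
}

/* True when the whole run fits without evicting dirty windows. */

bool fits_in_cache(uint32_t bs, uint32_t count, bool is_copy)
{
  return dd_working_set(bs, count, is_copy) <= CACHE_BYTES;
}
```

Under these assumptions the 16384 x 100 write (1638400 bytes) fits, the 65536 x 100 write (6553600 bytes) does not, and the 16384 x 100 copy (3276800 bytes) does not, which lines up with where the measured speeds drop.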
   
   For my cached tests, I was using 32x64KB cache windows, and exposing the 
driver in 4KB blocks. That is, it was initialized as follows:
   
   ```c
     ret = block_cache_initialize(
         "/dev/mmcsd0",
         "/dev/bmmcsd0",
         128,                   // 64KB cache width
         32,                    // 32 cache sections
         8);                    // Expose the EMMC device in 4K blocks
   ```
   
   This may not be the 'golden' configuration, it's just a starting point.
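For anyone tuning these numbers, the sizes follow from the three integer arguments. The sketch below assumes the underlying device uses 512-byte sectors, which is what makes 128 equal the 64KB in the comment; the sector size and the helper names are assumptions and not part of the driver.

```c
#include <stdint.h>

/* Assumption: 512-byte sectors on the underlying device. */

#define SECTOR_SIZE 512u

/* Hypothetical helper: bytes held by one cache window. */

uint32_t window_bytes(uint32_t sectors_per_window)
{
  return sectors_per_window * SECTOR_SIZE;
}

/* Hypothetical helper: total RAM used for cached data. */

uint32_t cache_bytes(uint32_t sectors_per_window, uint32_t windows)
{
  return window_bytes(sectors_per_window) * windows;
}

/* Hypothetical helper: block size exposed by the cached device. */

uint32_t exposed_block_bytes(uint32_t sectors_per_block)
{
  return sectors_per_block * SECTOR_SIZE;
}
```

With the values above, `window_bytes(128)` gives 65536, `cache_bytes(128, 32)` gives 2097152 (the 2M cache), and `exposed_block_bytes(8)` gives 4096.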
   
   ## Testing
   
   I have a local version of `sdbench` and `sdstress` from 
[PX4-autopilot](https://github.com/PX4/PX4-Autopilot), along with our custom 
applications which I've been using for testing and verification. So far, 
everything seems to work fine, but there could be unforeseen issues here, thus 
the current 'draft' status.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
