westonpace commented on issue #10138:
URL: https://github.com/apache/arrow/issues/10138#issuecomment-828701272


   Actually, you do see some benefit in the cold-cache case if you are using 
memory-mapped I/O (the default).  Arrow still maps the entire file, but it 
only accesses the parts it needs:
   
   ```
   [ read_1.0 ] speed (GB/s): mean = 0.10454136550934692, std = 0.007475733527851558, n = 10; total time_elapse: 9.674335052999595
   [ read_0.8 ] speed (GB/s): mean = 0.08324025202688382, std = 0.007408401336626204, n = 10; total time_elapse: 9.73901134300013
   [ read_0.6 ] speed (GB/s): mean = 0.06967461380067123, std = 0.005036855302449362, n = 10; total time_elapse: 8.70547049200104
   [ read_0.4 ] speed (GB/s): mean = 0.06235823834380603, std = 0.006652843286404212, n = 10; total time_elapse: 6.526506393001
   [ read_0.2 ] speed (GB/s): mean = 0.05067126099456597, std = 0.004539982065432586, n = 10; total time_elapse: 4.0006215649991645
   ```
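   
   For reference, here is a minimal sketch of the mechanism (the file name and column names are made up for illustration): with `pa.memory_map` the whole file is mapped, but reading the IPC file back is zero-copy, so pages for untouched columns are never faulted in from disk.
   
   ```python
   import os
   import tempfile
   import pyarrow as pa
   import pyarrow.ipc as ipc
   
   # Write a small Arrow IPC (Feather v2) file with three columns.
   table = pa.table({
       "a": list(range(1000)),
       "b": [float(i) for i in range(1000)],
       "c": ["x"] * 1000,
   })
   path = os.path.join(tempfile.mkdtemp(), "data.arrow")
   with pa.OSFile(path, "wb") as sink:
       with ipc.new_file(sink, table.schema) as writer:
           writer.write_table(table)
   
   # Memory-map the file.  read_all() is zero-copy against the map,
   # so only the pages backing the columns you actually touch get
   # faulted in -- the cold-cache benefit measured above.
   with pa.memory_map(path, "r") as source:
       reader = ipc.open_file(source)
       subset = reader.read_all().select(["a"])
   
   print(subset.column_names)
   ```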
   
   My guess is that it is not linear because you pay an HDD penalty moving 
from a sequential scan to random access.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]