westonpace commented on issue #10138: URL: https://github.com/apache/arrow/issues/10138#issuecomment-828701272
Actually, you do see some benefit in the cold-cache case if you are using memory-mapped I/O (the default). Arrow still maps the entire file, but it only accesses the parts it needs:

```
[ read_1.0 ] speed (GB/s): mean = 0.10454136550934692, std = 0.007475733527851558, n = 10; total time_elapse: 9.674335052999595
[ read_0.8 ] speed (GB/s): mean = 0.08324025202688382, std = 0.007408401336626204, n = 10; total time_elapse: 9.73901134300013
[ read_0.6 ] speed (GB/s): mean = 0.06967461380067123, std = 0.005036855302449362, n = 10; total time_elapse: 8.70547049200104
[ read_0.4 ] speed (GB/s): mean = 0.06235823834380603, std = 0.006652843286404212, n = 10; total time_elapse: 6.526506393001
[ read_0.2 ] speed (GB/s): mean = 0.05067126099456597, std = 0.004539982065432586, n = 10; total time_elapse: 4.0006215649991645
```

My guess is that the scaling is not linear because you take an HDD penalty when moving from a sequential scan to random access.
