westonpace commented on issue #10138: URL: https://github.com/apache/arrow/issues/10138#issuecomment-828701272
Actually, you do see some benefit in the cold-cache case if you are using memory-mapped I/O (the default). Arrow still maps the entire file, but it only accesses the parts it needs:

```
[ read_1.0 ] speed (GB/s): mean = 0.10454136550934692, std = 0.007475733527851558, n = 10; total time_elapse: 9.674335052999595
[ read_0.8 ] speed (GB/s): mean = 0.08324025202688382, std = 0.007408401336626204, n = 10; total time_elapse: 9.73901134300013
[ read_0.6 ] speed (GB/s): mean = 0.06967461380067123, std = 0.005036855302449362, n = 10; total time_elapse: 8.70547049200104
[ read_0.4 ] speed (GB/s): mean = 0.06235823834380603, std = 0.006652843286404212, n = 10; total time_elapse: 6.526506393001
[ read_0.2 ] speed (GB/s): mean = 0.05067126099456597, std = 0.004539982065432586, n = 10; total time_elapse: 4.0006215649991645
```

My guess is that the scaling is not linear because you take an HDD penalty when moving from a sequential scan to random access.
