JiaRu2016 opened a new issue #10138: URL: https://github.com/apache/arrow/issues/10138
I've found that reading a subset of the columns of a feather DataFrame can be slower than reading the entire file. The pattern is:

- reading half of the columns takes roughly the same time as reading the entire file
- reading more than 50% of the columns takes more time than reading the entire file
- only reading less than 50% of the columns actually saves time

Is this a bug or expected behavior?

The timing code looks like this; I can write a reproducing script if needed.

```python
import time

# path: the feather file; cols: list of column names to read

# Read only some columns
t0 = time.perf_counter()
df = feather.read_dataframe(path, columns=cols)
t1 = time.perf_counter()
elapsed_subset = t1 - t0

# Read the entire file
t0 = time.perf_counter()
df = feather.read_dataframe(path)
t1 = time.perf_counter()
elapsed_full = t1 - t0
```
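For a fuller reproduction, a small harness along the following lines could repeat each read and compare several column fractions, since a single timed call can be skewed by warm-up and disk caching. This is only a sketch under stated assumptions: `best_of` and `compare_reads` are hypothetical helper names, not part of pyarrow, and the reader function is passed in as a parameter so the harness does not depend on a particular feather API.

```python
import time


def best_of(fn, repeats=5):
    """Return the fastest wall-clock time (seconds) over `repeats` calls to fn()."""
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - t0)
    return min(timings)


def compare_reads(read_fn, path, all_columns, fractions=(0.25, 0.5, 0.75), repeats=5):
    """Time reading the first k columns for several fractions, plus the whole file.

    read_fn is any callable accepting (path, columns=...), where columns=None
    is assumed to mean "read every column". Returns a dict {label: seconds}.
    """
    results = {}
    for frac in fractions:
        # Always read at least one column, even for very small fractions.
        k = max(1, int(len(all_columns) * frac))
        cols = list(all_columns[:k])
        label = f"{k}/{len(all_columns)} columns"
        results[label] = best_of(lambda: read_fn(path, columns=cols), repeats)
    results["entire file"] = best_of(lambda: read_fn(path, columns=None), repeats)
    return results
```

With pyarrow installed, this could be called as `compare_reads(feather.read_feather, path, all_columns)` after `from pyarrow import feather`, because `read_feather` accepts a `columns` argument. Taking the minimum over several runs is a design choice intended to reduce noise; it does not by itself explain the slowdown described above.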
