[GitHub] [arrow] JiaRu2016 opened a new issue #10138: feather read a part of columns slower than read the entire file

GitBox Thu, 22 Apr 2021 22:16:44 -0700


JiaRu2016 opened a new issue #10138:
URL: https://github.com/apache/arrow/issues/10138



   I've found that reading a subset of the columns of a Feather DataFrame can be slower than 
reading the entire file. The pattern is:
   - reading half of the columns takes roughly the same time as reading the entire file
   - reading more than 50% of the columns takes more time than reading the entire file
   - only reading fewer than 50% of the columns actually saves time
   
   Is this a bug or expected behavior?
   
   The timing code looks like this; I can write a reproducing script if needed.
   
   ```python
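   import time

   import feather  # assumption: the feather-format package, which provides read_dataframe

   # path (the Feather file) and cols (a list of column names) are defined elsewhere

   t0 = time.perf_counter()
   df = feather.read_dataframe(path, columns=cols)  # read some columns
   t1 = time.perf_counter()
   elapsed_subset = t1 - t0
   
   t0 = time.perf_counter()
   df = feather.read_dataframe(path)    # read entire file
   t1 = time.perf_counter()
   elapsed_full = t1 - t0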
   ```
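
A reproducing script for the comparison above might look roughly like the sketch below. It is only an illustration under stated assumptions: the file name, the DataFrame shape, the column names, and the helpers `best_of` and `leading_columns` are all hypothetical, and it calls `pyarrow.feather.read_feather` in place of `read_dataframe`. Each read is repeated and the fastest run is kept, so a single cold read does not dominate the result.

```python
import time


def best_of(fn, repeats=5):
    """Return the fastest wall-clock time (in seconds) of `repeats` calls to fn()."""
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - t0)
    return min(timings)


def leading_columns(columns, fraction):
    """Return the first `fraction` of `columns`, always at least one column."""
    n = max(1, round(len(columns) * fraction))
    return list(columns[:n])


def main():
    # Third-party imports live here so the helpers above stay stdlib-only.
    import numpy as np
    import pandas as pd
    from pyarrow import feather

    path = "repro.feather"  # hypothetical file name
    names = [f"c{i}" for i in range(100)]  # hypothetical column names
    df = pd.DataFrame(np.random.rand(100_000, len(names)), columns=names)
    feather.write_feather(df, path)

    # Baseline: read every column.
    full = best_of(lambda: feather.read_feather(path))
    print(f"all columns: {full:.4f}s")

    # Compare against reading 25%, 50% and 75% of the columns.
    for fraction in (0.25, 0.5, 0.75):
        cols = leading_columns(names, fraction)
        part = best_of(lambda: feather.read_feather(path, columns=cols))
        print(f"{len(cols):3d} columns: {part:.4f}s ({part / full:.2f}x of full read)")


if __name__ == "__main__":
    main()
```

The printed ratio makes the reported pattern easy to check: a value above 1.00 means the partial read was slower than reading the whole file.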
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
