whyzdev commented on issue #31174:
URL: https://github.com/apache/arrow/issues/31174#issuecomment-1457302230

   Looks like this is still an issue as of 11.0.0, but it may be closed, since 
   #16972 is still open, where a filtered FileSystemDataset and caching were 
suggested/mentioned in the comments.
   Caching may already be done in Python user code, for example by monkey 
patching pyarrow.dataset._filesystem_dataset. But this works at the full-dataset 
level, and it is difficult if not impossible to update incrementally in Python 
(to avoid a full eviction) when one or a few partitions change frequently. The 
FileSystemDataset and its underlying objects are in C++, not Python, so we may 
need some native caching support in the Arrow API.
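
   To illustrate the difference between full eviction and an incremental 
update, here is a minimal sketch of a cache keyed by partition path. 
PartitionCache and its loader argument are hypothetical names made up for this 
example, not part of pyarrow, and the sketch does not reach into the C++ 
FileSystemDataset objects. It only shows the granularity that a dataset-level 
cache lacks.

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class PartitionCache(Generic[T]):
    """Cache one loaded object per partition path.

    Hypothetical helper for illustration only; not a pyarrow API.
    """

    def __init__(self, loader: Callable[[str], T]) -> None:
        # loader is assumed to be the expensive step, e.g. dataset discovery.
        self._loader = loader
        self._entries: Dict[str, T] = {}

    def get(self, path: str) -> T:
        # Load a partition only on first access; reuse it afterwards.
        if path not in self._entries:
            self._entries[path] = self._loader(path)
        return self._entries[path]

    def invalidate(self, path: str) -> None:
        # Evict a single changed partition; all other entries stay cached.
        self._entries.pop(path, None)

    def clear(self) -> None:
        # Full eviction, which is all a dataset-level cache can offer.
        self._entries.clear()
```

   With pyarrow, the loader could be something like 
`lambda p: pyarrow.dataset.dataset(p, format="parquet")`, and the cached 
per-partition datasets could be passed as a list to `pyarrow.dataset.dataset`, 
which builds a union dataset. Whether that is actually faster than 
rediscovering the whole FileSystemDataset has not been measured here.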
   
   Btw, #9670 (since 4.0.0) seems to be a separate enhancement for reading a 
table, not for speeding up the loading of a FileSystemDataset.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
