westonpace commented on issue #37630:
URL: https://github.com/apache/arrow/issues/37630#issuecomment-1737875643

   > I believe [I might be experiencing this same problem through the Python 
API](https://github.com/apache/arrow/issues/37820). Having to maintain a local 
build of Arrow doesn't sound like the right solution, so I wonder if there are 
ideas for how to achieve the same result as @icexelloss's last comment, but 
through Arrow's normal APIs?
   
   We should definitely make metadata caching an optional feature of the 
scanner and/or dataset.  I think the API could be as simple as...
   
   ```
   my_dataset = pyarrow.dataset.dataset(..., cache_metadata=False)
   ```
   
   Any place that uses a dataset "temporarily" should also set this to false. 
For example, when we run pyarrow.parquet.read_table it creates a dataset and 
scans it; that temporary dataset should NOT cache metadata (see the sketch 
below).

