westonpace commented on issue #35894:
URL: https://github.com/apache/arrow/issues/35894#issuecomment-1599178615
> it seems that was an issue within the Dataset API
Yes. However, pretty much all reads now use the dataset API internally.
The paths have been merged for simplicity of maintenance. `read_table`
will create a dataset with one file and then read it. The file is first
opened when the dataset is created (to get the schema of the file).
I am pretty sure it isn't actually reading the schema twice, though. I think
the sequence is something like...
* Create dataset
  * open file
  * read metadata
  * close file
* Read dataset
  * open file
  * read data
  * close file
I agree that it is somewhat less than ideal that the file is opened twice.
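To make the sequence above concrete, here is a minimal toy model in plain Python. It is not pyarrow code: `OneFileDataset`, `read_rows`, and `counting_open` are hypothetical names, and a CSV header line stands in for the file metadata. It only sketches the open/read/close pattern described above, assuming the two phases each open the file once.

```python
import os
import tempfile

open_count = 0  # number of times the data file has been opened for reading


def counting_open(path):
    """Open a file for reading and record that an open happened."""
    global open_count
    open_count += 1
    return open(path, "r", encoding="utf-8")


class OneFileDataset:
    """Hypothetical stand-in for a single-file dataset (not pyarrow code)."""

    def __init__(self, path):
        self.path = path
        # Create dataset: open file, read metadata, close file.
        with counting_open(path) as f:
            self.schema = f.readline().rstrip("\n").split(",")

    def to_rows(self):
        # Read dataset: open file again, read data, close file.
        with counting_open(self.path) as f:
            f.readline()  # skip the header line; the schema is already known
            return [line.rstrip("\n").split(",") for line in f]


def read_rows(path):
    """Toy analogue of `read_table`: build a one-file dataset, then read it."""
    return OneFileDataset(path).to_rows()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write("id,name\n1,a\n2,b\n")
    rows = read_rows(path)

print(rows, open_count)
```

In this sketch a single `read_rows` call opens the file twice: once for the schema and once for the data. The schema itself is parsed only once, which matches the guess above that the metadata is not read twice.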