lidavidm commented on a change in pull request #10693:
URL: https://github.com/apache/arrow/pull/10693#discussion_r669830370
##########
File path: docs/source/python/dataset.rst
##########
@@ -456,20 +456,163 @@ is materialized as columns when reading the data and can
be used for filtering:
dataset.to_table().to_pandas()
dataset.to_table(filter=ds.field('year') == 2019).to_pandas()
+Another benefit of manually scheduling the files is that the order of the files
+controls the order of the data. When performing an ordered read (or a read to
+a table) then the rows returned will match the order of the files given. This
+only applies when the dataset is constructed with a list of files. There
+are no order guarantees given when the files are instead discovered by scanning
Review comment:
Ah, thanks. Sorry for the churn here, @westonpace. In that case we should keep
stating that there is no ordering guarantee.