lidavidm commented on a change in pull request #12560:
URL: https://github.com/apache/arrow/pull/12560#discussion_r836439499
##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -240,14 +240,22 @@ cdef class Dataset(_Weakrefable):
columns : list of str, default None
The columns to project. This can be a list of column names to
include (order and duplicates will be preserved), or a dictionary
- with {new_column_name: expression} values for more advanced
+ with {{new_column_name: expression}} values for more advanced
Review comment:
Did you mean to double up the brackets here?
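
For context on the question above: doubled braces are only needed when a docstring is passed through `str.format` (or is an f-string), where `{{` and `}}` render as literal `{` and `}`. In a plain docstring they are shown verbatim, doubled. The sketch below illustrates this with a hypothetical template; `TEMPLATE`, `PLAIN`, and the `kind` placeholder are assumptions made for illustration and are not taken from the pyarrow source.

```python
# Hypothetical docstring template (not the actual pyarrow source).
TEMPLATE = (
    "columns : list of str, default None\n"
    "    A dictionary with {{new_column_name: expression}} values.\n"
    "    Applies to {kind} objects.\n"
)

# When the text goes through str.format, doubled braces collapse to single ones.
rendered = TEMPLATE.format(kind="Dataset")

# A plain string that is never formatted keeps the doubled braces as written.
PLAIN = "A dictionary with {{new_column_name: expression}} values."

# Single braces in a formatted template are parsed as a replacement field,
# so str.format looks up a field called "new_column_name" and fails.
try:
    "A dictionary with {new_column_name: expression} values.".format(kind="Dataset")
    single_brace_error = None
except KeyError as exc:
    single_brace_error = exc

print(rendered)
print(PLAIN)
print(repr(single_brace_error))
```

If the docstring in question is not formatted anywhere, the single-brace form is the one that reads correctly in the rendered documentation.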
##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -240,14 +240,22 @@ cdef class Dataset(_Weakrefable):
columns : list of str, default None
The columns to project. This can be a list of column names to
include (order and duplicates will be preserved), or a dictionary
- with {new_column_name: expression} values for more advanced
+ with {{new_column_name: expression}} values for more advanced
projections.
+
+ The list of columns or expressions may use the special fields
+ like `__batch_index` (the index of the batch within the fragment),
+ `__fragment_index` (the index of the fragment within the dataset),
+ `__last_in_fragment` (whether the batch is last in fragment) and
Review comment:
```suggestion
`__last_in_fragment` (whether the batch is last in fragment), and
```
##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -240,14 +240,22 @@ cdef class Dataset(_Weakrefable):
columns : list of str, default None
The columns to project. This can be a list of column names to
include (order and duplicates will be preserved), or a dictionary
- with {new_column_name: expression} values for more advanced
+ with {{new_column_name: expression}} values for more advanced
projections.
+
+ The list of columns or expressions may use the special fields
+ like `__batch_index` (the index of the batch within the fragment),
+ `__fragment_index` (the index of the fragment within the dataset),
+ `__last_in_fragment` (whether the batch is last in fragment) and
+ `__filename` (the name of the source file or a description of the
+ source fragment).
+
The columns will be passed down to Datasets and corresponding data
fragments to avoid loading, copying, and deserializing columns
that will not be required further down the compute chain.
- By default all of the available columns are projected. Raises
- an exception if any of the referenced column names does not exist
- in the dataset's Schema.
+ By default all of the available columns are projected.
+ An error will be returned if any of the referenced column name
+ does not exist in the dataset's schema.
Review comment:
C++ may return an error, but Python will raise an exception, right?
##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -240,14 +240,22 @@ cdef class Dataset(_Weakrefable):
columns : list of str, default None
The columns to project. This can be a list of column names to
include (order and duplicates will be preserved), or a dictionary
- with {new_column_name: expression} values for more advanced
+ with {{new_column_name: expression}} values for more advanced
projections.
+
+ The list of columns or expressions may use the special fields
+ like `__batch_index` (the index of the batch within the fragment),
Review comment:
```suggestion
`__batch_index` (the index of the batch within the fragment),
```
##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -2019,9 +2023,12 @@ cdef class Scanner(_Weakrefable):
dataset : Dataset
Dataset to scan.
columns : list of str or dict, default None
- The columns to project. This can be a list of column names to include
- (order and duplicates will be preserved), or a dictionary with
- {new_column_name: expression} values for more advanced projections.
+ The columns to project. This can be a list of column names to
+ include (order and duplicates will be preserved) which may contain the
+ augmented fields such as `batch_index`, `fragment_index`,
+ `last_in_fragment` and `filename`, or a dictionary
+ with {new_column_name: expression} values for more advanced
+ projections.
Review comment:
Interesting, it appears this only works for Python classes and not
Cython classes. Thanks for checking.