westonpace commented on a change in pull request #12560:
URL: https://github.com/apache/arrow/pull/12560#discussion_r831713878



##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -239,15 +239,17 @@ cdef class Dataset(_Weakrefable):
         ----------
         columns : list of str, default None
             The columns to project. This can be a list of column names to
-            include (order and duplicates will be preserved), or a dictionary
+            include (order and duplicates will be preserved) which may contain the
+            augmented fields such as `batch_index`, `fragment_index`,
+            `last_in_fragment` and `filename`, or a dictionary
             with {new_column_name: expression} values for more advanced
             projections.
             The columns will be passed down to Datasets and corresponding data
             fragments to avoid loading, copying, and deserializing columns
             that will not be required further down the compute chain.
-            By default all of the available columns are projected. Raises
-            an exception if any of the referenced column names does not exist
-            in the dataset's Schema.
+            By default all of the available columns are projected.
+            Raises an exception if any of the referenced column names 
+            does not exist in the dataset's Schema.

Review comment:
       ```suggestion
               By default all of the available columns are projected.
               An error will be returned if any referenced column name
               does not exist in the dataset's schema.
   ```
   We don't actually raise an exception.
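   The docstring describes three behaviours of `columns`: project everything by default, keep a list's order and duplicates, and accept a `{new_column_name: expression}` mapping, with a missing column reported as an error. Below is a minimal pure-Python model of those semantics, not pyarrow's implementation. `resolve_projection`, `ProjectionResult`, and the simplification of an expression to a plain source-column name are hypothetical. The error is returned rather than raised to match the suggested wording.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional, Sequence, Tuple, Union

# Hypothetical simplification: an "expression" is just the source column name it reads.
Columns = Union[Sequence[str], Mapping[str, str], None]


@dataclass
class ProjectionResult:
    """Mimics an Arrow-style Result: exactly one of `pairs` or `error` is set."""
    pairs: Optional[List[Tuple[str, str]]] = None
    error: Optional[str] = None

    def ok(self) -> bool:
        return self.error is None


def resolve_projection(schema: Sequence[str], columns: Columns = None) -> ProjectionResult:
    """Return (output_name, source_column) pairs for a projection request."""
    if columns is None:
        # Default: every available column, in schema order.
        return ProjectionResult(pairs=[(name, name) for name in schema])
    if isinstance(columns, dict):
        # Mapping form: {new_column_name: source}, renaming as requested.
        pairs = list(columns.items())
    else:
        # List form: order and duplicates are preserved exactly as given.
        pairs = [(name, name) for name in columns]
    available = set(schema)
    for _, source in pairs:
        if source not in available:
            # Report the problem as a returned error instead of raising.
            return ProjectionResult(error=f"No match for column {source!r} in schema")
    return ProjectionResult(pairs=pairs)


if __name__ == "__main__":
    schema = ["a", "b", "c"]
    print(resolve_projection(schema, ["c", "a", "c"]).pairs)
    print(resolve_projection(schema, {"renamed": "b"}).pairs)
    print(resolve_projection(schema, ["missing"]).error)
```

   Modelling the failure as a value makes the distinction in the review concrete: the caller has to inspect the result, and nothing is thrown at the point of projection.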

##########
File path: cpp/src/arrow/dataset/scanner_test.cc
##########
@@ -128,6 +128,15 @@ class TestScanner : public DatasetFixtureMixinWithParam<TestScannerParams> {
     AssertScanBatchesEquals(expected.get(), scanner.get());
   }
 
+  void AssertScanForAugmentedFields(std::shared_ptr<Scanner> scanner) {

Review comment:
       ```suggestion
     void AssertNoAugmentedFields(std::shared_ptr<Scanner> scanner) {
   ```
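   The suggested name implies the helper checks that a plain scan does not leak augmented fields into its output. Below is a hypothetical Python rendering of that check. It works on lists of field names rather than real Arrow schemas. The field names come from the docstring in this PR. The C++ scanner may spell them differently (for example with a prefix), so they are a parameter.

```python
from typing import Iterable, Sequence

# Assumption: names taken from the PR docstring; the real scanner may prefix them.
AUGMENTED_FIELDS = ("batch_index", "fragment_index", "last_in_fragment", "filename")


def assert_no_augmented_fields(batch_schemas: Iterable[Sequence[str]],
                               augmented: Sequence[str] = AUGMENTED_FIELDS) -> None:
    """Fail if any scanned batch's field names include an augmented field."""
    for i, names in enumerate(batch_schemas):
        leaked = [name for name in names if name in augmented]
        if leaked:
            # Raise explicitly so the check still runs under `python -O`.
            raise AssertionError(f"batch {i} exposes augmented fields: {leaked}")
```

   Raising `AssertionError` directly, rather than using a bare `assert`, keeps the check active even when Python optimisations strip assertions.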
   

##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -239,15 +239,17 @@ cdef class Dataset(_Weakrefable):
         ----------
         columns : list of str, default None
             The columns to project. This can be a list of column names to
-            include (order and duplicates will be preserved), or a dictionary
+            include (order and duplicates will be preserved) which may contain the
+            augmented fields such as `batch_index`, `fragment_index`,
+            `last_in_fragment` and `filename`, or a dictionary

Review comment:
       Agreed.  It would be good to change the wording here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.