davlee1972 commented on issue #38518:
URL: https://github.com/apache/arrow/issues/38518#issuecomment-3452824541

   **Bumping this issue. The documentation says expressions should work now, 
but they don't.**
   
   
https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Dataset.html#pyarrow.dataset.Dataset.head
   
   columns : [list](https://docs.python.org/3/library/stdtypes.html#list) of 
[str](https://docs.python.org/3/library/stdtypes.html#str), default 
[None](https://docs.python.org/3/library/constants.html#None)
   
   The columns to project. This can be **a list of column names** to include 
(order and duplicates will be preserved), or **a dictionary with 
{new_column_name: expression} values** for more advanced projections.
   
   The list of columns or **expressions** may use the **special fields** 
`__batch_index` (the index of the batch within the fragment), `__fragment_index` 
(the index of the fragment within the dataset), `__last_in_fragment` (whether the 
batch is last in fragment), and **`__filename`** (the name of the source file or 
a description of the source fragment).
   
   The columns will be passed down to Datasets and corresponding data fragments 
to avoid loading, copying, and deserializing columns that will not be required 
further down the compute chain. By default all of the available columns are 
projected. Raises an exception if any of the referenced column names does not 
exist in the dataset’s Schema.
   
   **Sample code pulling data using a list of columns vs a dictionary of 
expressions.**
   
   <img width="1352" height="816" alt="Image" 
src="https://github.com/user-attachments/assets/f709c65f-121e-4d06-a665-7bd7a7573096"
 />


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.