HyukjinKwon commented on code in PR #48619:
URL: https://github.com/apache/arrow/pull/48619#discussion_r2685467868
##########
docs/source/python/getstarted.rst:
##########
@@ -118,33 +131,41 @@ Arrow also provides the :class:`pyarrow.dataset` API to work with
 large data, which will handle for you partitioning of your data in smaller chunks
-.. ipython:: python
-
-    import pyarrow.dataset as ds
+.. code-block:: python
-    ds.write_dataset(birthdays_table, "savedir", format="parquet",
-        partitioning=ds.partitioning(
-            pa.schema([birthdays_table.schema.field("years")])
-        ))
+    >>> import pyarrow.dataset as ds
+    >>> ds.write_dataset(birthdays_table, "savedir", format="parquet",
+    ...     partitioning=ds.partitioning(
+    ...         pa.schema([birthdays_table.schema.field("years")])
+    ...     ))
Loading back the partitioned dataset will detect the chunks
-.. ipython:: python
-
-    birthdays_dataset = ds.dataset("savedir", format="parquet", partitioning=["years"])
+.. code-block:: python
-    birthdays_dataset.files
+    >>> birthdays_dataset = ds.dataset("savedir", format="parquet", partitioning=["years"])
+    >>> birthdays_dataset.files
+    ['savedir/1990/part-0.parquet', 'savedir/1995/part-0.parquet', 'savedir/2000/part-0.parquet']
and will lazily load chunks of data only when iterating over them
-.. ipython:: python
-    :okexcept:
-
-    import datetime
-
-    current_year = datetime.datetime.now(datetime.UTC).year
-    for table_chunk in birthdays_dataset.to_batches():
-        print("AGES", pc.subtract(current_year, table_chunk["years"]))
+.. code-block:: python
+
+    >>> import datetime
Review Comment:
No biggie but seems like this is not used.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]