[GitHub] [arrow] AlenkaF commented on a diff in pull request #12704: ARROW-15428: [Python] Address docstrings in Parquet classes and functions

GitBox Mon, 04 Apr 2022 02:30:33 -0700


AlenkaF commented on code in PR #12704:
URL: https://github.com/apache/arrow/pull/12704#discussion_r841541096



##########
python/pyarrow/parquet.py:
##########
@@ -2201,6 +3055,47 @@ def write_to_dataset(table, root_path, partition_cols=None,
         Using `metadata_collector` in kwargs allows one to collect the
         file metadata instances of dataset pieces. The file paths in the
         ColumnChunkMetaData will be set relative to `root_path`.
+
+    Examples
+    --------
+    Generate an example PyArrow Table:
+
+    >>> import pyarrow as pa
+    >>> import pandas as pd
+    >>> df = pd.DataFrame({'year': [2020, 2022, 2021, 2022, 2019, 2021],
+    ...                    'month': [3, 5, 7, 9, 11, 12],
+    ...                    'day': [1, 5, 9, 13, 17, 23],
+    ...                    'n_legs': [2, 2, 4, 4, 5, 100],
+    ...                    'animals': ["Flamingo", "Parrot", "Dog", "Horse",
+    ...                                &quot;Brittle stars&quot;, &quot;Centipede&quot;]})
+    >>> table = pa.Table.from_pandas(df)
+
+    and write it to a partitioned dataset:
+
+    >>> import pyarrow.parquet as pq
+    >>> pq.write_to_dataset(table, root_path='dataset_name_3',
+    ...                     partition_cols=['year', 'month', 'day'],
+    ...                     use_legacy_dataset=False
+    ...                    )
+    >>> pq.ParquetDataset('dataset_name_3', use_legacy_dataset=False).files
+    ['dataset_name_3/year=2019/month=11/day=17/part-0.parquet', ...
+
+    Use the legacy Arrow Dataset API and override the partition filename:
+
+    >>> pq.write_to_dataset(table, root_path='dataset_name_5',
+    ...                     partition_cols=['year', 'month', 'day'],
+    ...                     partition_filename_cb=lambda x:
+    ...                     str(x[0]) + str(x[1]) + str(x[2]) + '.parquet'
+    ...                    )
+    >>> pq.ParquetDataset('dataset_name_5/', use_legacy_dataset=False).files
+    ['dataset_name_5/year=2019/month=11/day=17/20191117.parquet', ...
+
+    Write to a single Parquet file:

Review Comment:
   Yes, I should make this clearer. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
