[GitHub] [arrow] martgra opened a new issue #11826: Partition in dataset

GitBox Wed, 01 Dec 2021 04:38:16 -0800


martgra opened a new issue #11826:
URL: https://github.com/apache/arrow/issues/11826



   Hi, 
   
   I'm wondering how partitions work in the new Datasets API.
   
   This is part of my code where data is written:
   ```python
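   import pyarrow as pa
   import pyarrow.dataset  # makes the pa.dataset submodule available

   # table, output_path and y are defined earlier in my code
   pa.dataset.write_dataset(
       table,
       output_path,
       basename_template=f"chunk_{y}_{{i}}",
       format="parquet",
       partitioning=["code"],
       existing_data_behavior="overwrite_or_ignore",
   )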
   ```
   However, reading the data back results in:
   ```python
   >>> pd.read_parquet(path, filters=[('code', '=', "1234")])
   
   Trace:
   ArrowInvalid: No match for FieldRef.Name(code)
   ```
   Is this expected? Do partition columns disappear from the table? I have also
tried reading directly with pyarrow, and the "code" column is missing from the
table's columns there as well.
   
   FYI: 
   
   Thank you!
   


