[ 
https://issues.apache.org/jira/browse/ARROW-14938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17452138#comment-17452138
 ] 

Weston Pace commented on ARROW-14938:
-------------------------------------

I added some info on the GitHub issue too.  My guess is that "hive" didn't 
work because you were specifying it only on the read and not on the write.

> Partition column disappears when reading dataset
> ------------------------------------------------
>
>                 Key: ARROW-14938
>                 URL: https://issues.apache.org/jira/browse/ARROW-14938
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 6.0.1
>         Environment: Debian bullseye, python 3.9
>            Reporter: Martin Gran
>            Priority: Major
>
> Appending CSV to parquet dataset with partitioning on "code".
> {code:python}
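> import pyarrow as pa
> import pyarrow.dataset  # makes pa.dataset available
>
> # chunk, output_path and y come from the surrounding CSV-reading loop
> table = pa.Table.from_pandas(chunk)
> pa.dataset.write_dataset(
>     table,
>     output_path,
>     basename_template=f"chunk_{y}_{{i}}",
>     format="parquet",
>     partitioning=["code"],
>     existing_data_behavior="overwrite_or_ignore",
> )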
> {code}
> Loading the dataset again and expecting code to be in the dataframe.
> {code:python}
> import pyarrow.dataset as ds
> dataset = ds.dataset("../data/interim/2020_elements_parquet/", format="parquet")
> df = dataset.to_table().to_pandas()
> df["code"]
> {code}
> Trace
> {code:python}
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> ~/.local/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
>    3360             try:
> -> 3361                 return self._engine.get_loc(casted_key)
>    3362             except KeyError as err:
>
> ~/.local/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
>
> ~/.local/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
>
> pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
>
> pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
>
> KeyError: 'code'
>
> The above exception was the direct cause of the following exception:
>
> KeyError                                  Traceback (most recent call last)
> /tmp/ipykernel_24875/4149106129.py in <module>
> ----> 1 df["code"]
>
> ~/.local/lib/python3.9/site-packages/pandas/core/frame.py in __getitem__(self, key)
>    3456             if self.columns.nlevels > 1:
>    3457                 return self._getitem_multilevel(key)
> -> 3458             indexer = self.columns.get_loc(key)
>    3459             if is_integer(indexer):
>    3460                 indexer = [indexer]
>
> ~/.local/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
>    3361                 return self._engine.get_loc(casted_key)
>    3362             except KeyError as err:
> -> 3363                 raise KeyError(key) from err
>    3364
>    3365         if is_scalar(key) and isna(key) and not self.hasnans:
>
> KeyError: 'code'
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
