[
https://issues.apache.org/jira/browse/ARROW-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated ARROW-18269:
-----------------------------------
Labels: good-first-issue pull-request-available (was: good-first-issue)
> [C++] Slash character in partition value handling
> -------------------------------------------------
>
> Key: ARROW-18269
> URL: https://issues.apache.org/jira/browse/ARROW-18269
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 10.0.0
> Reporter: Vadym Dytyniak
> Assignee: Vibhatha Lakmal Abeykoon
> Priority: Major
> Labels: good-first-issue, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
>
> The example below shows that pyarrow does not correctly handle a partition
> value that contains '/':
> {code:java}
> import pandas as pd
> import pyarrow as pa
> from pyarrow import dataset as ds
> df = pd.DataFrame({
>     'value': [1, 2],
>     'instrument_id': ['A/Z', 'B'],
> })
> ds.write_dataset(
>     data=pa.Table.from_pandas(df),
>     base_dir='data',
>     format='parquet',
>     partitioning=['instrument_id'],
>     partitioning_flavor='hive',
> )
> table = ds.dataset(
>     source='data',
>     format='parquet',
>     partitioning='hive',
> ).to_table()
> tables = [table]
> df = pa.concat_tables(tables).to_pandas()
> print(df.head()){code}
> Result:
> {code:java}
>    value instrument_id
> 0      1             A
> 1      2             B {code}
> Expected behaviour:
> Option 1: Result should be:
> {code:java}
>    value instrument_id
> 0      1           A/Z
> 1      2             B {code}
> Option 2: An error should be raised when a partition value contains '/'.
>
>
>
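The two options above can be sketched in plain Python, without pyarrow. This is only an illustration and not Arrow's actual fix. It assumes the truncated 'A' arises because the '/' in 'A/Z' ends up as a directory separator in the hive-style path, so that 'instrument_id=A' is read as the partition segment. The helper names `encode_hive_segment`, `decode_hive_segment`, and `check_partition_value` are hypothetical and do not exist in pyarrow.

```python
from __future__ import annotations

from urllib.parse import quote, unquote


def encode_hive_segment(key: str, value: str) -> str:
    """Option 1 (write side): build 'key=value' with the value percent-encoded."""
    # safe="" makes quote() encode '/' as well; by default '/' is left as-is.
    return f"{key}={quote(value, safe='')}"


def decode_hive_segment(segment: str) -> tuple[str, str]:
    """Option 1 (read side): split 'key=value' and undo the percent-encoding."""
    key, _, raw_value = segment.partition("=")
    return key, unquote(raw_value)


def check_partition_value(value: str) -> None:
    """Option 2: reject values that would be split into extra directories."""
    if "/" in value:
        raise ValueError(f"partition value {value!r} contains '/'")


segment = encode_hive_segment("instrument_id", "A/Z")
print(segment)                       # instrument_id=A%2FZ
print(decode_hive_segment(segment))  # ('instrument_id', 'A/Z')
```

With Option 1 the encoded segment contains no '/', so it stays a single directory name and the original value can be recovered on read. With Option 2, calling `check_partition_value('A/Z')` raises `ValueError` before anything is written.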