[ 
https://issues.apache.org/jira/browse/ARROW-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francisco Sanchez updated ARROW-4311:
-------------------------------------
    Description: 
In the latest changes to filesystem.py, some new functions were added to 
check the source string passed to pq.ParquetWriter. The current 
implementation makes assumptions about the format of that string, which 
means that a string matching certain patterns is automatically 
split/reformatted and changed to something else.

As a specific example, if I provide a string like 
{{directory/level1#level2.parquet}}, the file is written to disk as 
{{directory/level1}}. The behaviour changed between 0.11.1 and 0.12.0, and 
nothing about the change is stated in the documentation.
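
One way a string could end up truncated like this is if it is run through a 
URL parser, which treats everything after {{#}} as a fragment rather than as 
part of the path. The sketch below only illustrates that effect with the 
standard library; it assumes, without confirmation from the issue, that the 
new checks behave like {{urllib.parse.urlparse}}, and the helper name 
{{split_like_a_url}} is hypothetical.

```python
from urllib.parse import urlparse


def split_like_a_url(source):
    """Hypothetical helper: return (path, fragment) as a URL parser sees them.

    Assumption: this mimics URL-style parsing in general; it is not taken
    from filesystem.py.
    """
    parsed = urlparse(source)
    return parsed.path, parsed.fragment


source = "directory/level1#level2.parquet"
path, fragment = split_like_a_url(source)

# The part after '#' is no longer part of the path.
print(path)      # directory/level1
print(fragment)  # level2.parquet
```

If that is what happens, any caller that keeps only the parsed path would 
write to {{directory/level1}}, which matches the behaviour reported above.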

> [Python] Regression on pq.ParquetWriter incorrectly handling source string
> --------------------------------------------------------------------------
>
>                 Key: ARROW-4311
>                 URL: https://issues.apache.org/jira/browse/ARROW-4311
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.12.0
>            Reporter: Francisco Sanchez
>            Priority: Major
>
> In the latest changes to filesystem.py, some new functions were added to 
> check the source string passed to pq.ParquetWriter. The current 
> implementation makes assumptions about the format of that string, which 
> means that a string matching certain patterns is automatically 
> split/reformatted and changed to something else.
> As a specific example, if I provide a string like 
> {{directory/level1#level2.parquet}}, the file is written to disk as 
> {{directory/level1}}. The behaviour changed between 0.11.1 and 0.12.0, and 
> nothing about the change is stated in the documentation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
