[ https://issues.apache.org/jira/browse/ARROW-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-3246.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 0.15.0 (was: 1.0.0)

Issue resolved by pull request 5077
[https://github.com/apache/arrow/pull/5077]

> [Python][Parquet] direct reading/writing of pandas categoricals in parquet
> --------------------------------------------------------------------------
>
>                 Key: ARROW-3246
>                 URL: https://issues.apache.org/jira/browse/ARROW-3246
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Martin Durant
>            Assignee: Wes McKinney
>            Priority: Minor
>              Labels: parquet, pull-request-available
>             Fix For: 0.15.0
>
>          Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> Parquet supports "dictionary encoding" of column data, which is very
> similar to the concept of Categoricals in pandas. It is natural to use this
> encoding for a column that originated as a categorical. Conversely, when
> loading, if the file metadata says that a given column came from a pandas (or
> arrow) categorical, then we can trust that the whole column is
> dictionary-encoded and load the data directly into a categorical column,
> rather than expanding the labels on load and recategorising later.
>
> If the data does not have the pandas metadata, then this guarantee does not
> hold: we cannot assume either that the whole column is dictionary-encoded
> or that the labels are the same throughout. In this case, the current
> behaviour is fine.
>
> (Please forgive that some of this has already been mentioned elsewhere; this
> is one of the entries in the list at
> [https://github.com/dask/fastparquet/issues/374] as a feature that is useful
> in fastparquet.)
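
As a conceptual illustration of the description above, here is a minimal
pure-Python sketch of dictionary encoding and of the "labels are the same
throughout" check. It is not the pyarrow or fastparquet implementation; the
helper names (dictionary_encode, expand, same_labels_throughout) and the
chunk layout are hypothetical and chosen only for this example.

```python
from typing import Hashable, List, Sequence, Tuple

# A chunk is modelled as (dictionary of labels, integer indices into it).
Chunk = Tuple[List[Hashable], List[int]]


def dictionary_encode(values: Sequence[Hashable]) -> Chunk:
    """Split a column into unique labels (first-seen order) and indices."""
    dictionary: List[Hashable] = []
    positions = {}  # label -> position in `dictionary`
    indices: List[int] = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def expand(dictionary: Sequence[Hashable], indices: Sequence[int]) -> list:
    """Materialise the full column; this is the step a direct load avoids."""
    return [dictionary[i] for i in indices]


def same_labels_throughout(chunks: Sequence[Chunk]) -> bool:
    """True only if every chunk carries an identical dictionary."""
    first = chunks[0][0]
    return all(dictionary == first for dictionary, _ in chunks)


column = ["a", "b", "a", "c", "b"]
dictionary, indices = dictionary_encode(column)
# `dictionary` plays the role of the categories and `indices` the role of
# the codes, so the pair can be handed over without calling expand().
```

Two chunks encoded independently, for example from ["a", "b", "a"] and
["b", "a"], end up with different dictionaries, so same_labels_throughout
returns False for them. That is the case in which the indices cannot simply
be concatenated into one categorical column.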


