[
https://issues.apache.org/jira/browse/ARROW-6277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16923401#comment-16923401
]
Uwe L. Korn commented on ARROW-6277:
------------------------------------
This could be interesting for date columns when working with pandas.
To correctly round-trip date columns through the cycle Parquet -> Arrow ->
pandas -> Arrow -> Parquet, you need to use object columns in pandas that hold
datetime.date objects. These values can be quite repetitive, so dictionary
encoding helps a lot here. Otherwise, I would see the same use case for float
columns, but that is something I haven't used yet, mostly because pandas does
not really work well with float categories.
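
To illustrate why repetitive date values benefit from dictionary encoding,
here is a minimal pure-Python sketch. It is not the Arrow or Parquet
implementation; the helper names dictionary_encode and dictionary_decode and
the sample column are hypothetical and exist only for this example.

```python
import datetime


def dictionary_encode(values):
    """Split a sequence of hashable values into (dictionary, indices)."""
    dictionary = []   # unique values, in first-seen order
    positions = {}    # value -> its index in `dictionary`
    indices = []      # one small integer per input value
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original sequence from the dictionary and the indices."""
    return [dictionary[i] for i in indices]


# Hypothetical object column: 1000 rows that only ever hold two dates.
days = [datetime.date(2019, 9, 1), datetime.date(2019, 9, 2)]
column = [days[i % 2] for i in range(1000)]

dictionary, indices = dictionary_encode(column)
restored = dictionary_decode(dictionary, indices)
```

Only the two distinct datetime.date objects are stored once in the dictionary;
each row then carries just an integer index, and decoding returns the original
date objects unchanged.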
> [C++][Parquet] Support reading/writing other Parquet primitive types to
> DictionaryArray
> ---------------------------------------------------------------------------------------
>
> Key: ARROW-6277
> URL: https://issues.apache.org/jira/browse/ARROW-6277
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 0.15.0
>
>
> As follow up to ARROW-3246, we should support direct read/write of the other
> Parquet primitive types. Currently only BYTE_ARRAY is implemented as it
> provides the most performance benefit.
--
This message was sent by Atlassian Jira
(v8.3.2#803003)