[
https://issues.apache.org/jira/browse/ARROW-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444510#comment-17444510
]
Joris Van den Bossche commented on ARROW-13781:
-----------------------------------------------
It's generally not good that C++ exceptions can reach Python code, but the reason it raises an exception here is that the {{col_encoding}} configuration for Parquet specifies which encoding to use _when not using dictionary encoding_. That is why passing a dictionary encoding option to {{col_encoding}} raises an exception.
Short term, I think the easiest solution for your PR is to also check for this explicitly in the Python bindings, so that "RLE_DICTIONARY" never gets passed to the C++ code that sets this option. That way the C++ exception is avoided, and a Python exception is raised instead.
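A minimal sketch of what such a Python-side check could look like. The helper name, the set of rejected encodings, and the error message are illustrative assumptions, not actual pyarrow internals:

```python
# Hypothetical validation helper for the Python bindings; the names
# check_column_encoding and DICTIONARY_ENCODINGS are illustrative only.

# Dictionary encodings are controlled via use_dictionary, so they are
# not valid values for the per-column encoding option.
DICTIONARY_ENCODINGS = {"PLAIN_DICTIONARY", "RLE_DICTIONARY"}


def check_column_encoding(column_encoding):
    """Reject dictionary encodings before they reach the C++ layer.

    ``column_encoding`` may be a single encoding name (applied to all
    columns) or a dict mapping column names to encoding names.
    """
    if isinstance(column_encoding, str):
        column_encoding = {None: column_encoding}
    for column, encoding in column_encoding.items():
        if encoding in DICTIONARY_ENCODINGS:
            # Raise a Python exception here instead of letting the
            # C++ layer throw one for the invalid option.
            raise ValueError(
                "'{}' is not a valid column encoding; dictionary "
                "encoding is controlled via the 'use_dictionary' "
                "option".format(encoding)
            )
```

With a check like this, `check_column_encoding({"col1": "RLE_DICTIONARY"})` raises a `ValueError` in Python, while non-dictionary encodings such as `"BYTE_STREAM_SPLIT"` pass through to the C++ writer options unchanged.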
> [Python] Allow per column encoding in parquet writer
> -----------------------------------------------------
>
> Key: ARROW-13781
> URL: https://issues.apache.org/jira/browse/ARROW-13781
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Parquet, Python
> Reporter: Brian Kiefer
> Assignee: Alenka Frim
> Priority: Minor
>
> Add a new parameter to `write_table` to allow parquet encodings to be defined
> on a per column basis. This should supersede use_dictionary and
> use_byte_stream_split.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)