[
https://issues.apache.org/jira/browse/ARROW-13781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444510#comment-17444510
]
Joris Van den Bossche commented on ARROW-13781:
-----------------------------------------------
It's generally not good that C++ exceptions can reach Python code, but the reason it raises an exception here is that the {{col_encoding}} configuration for Parquet specifies which encoding to use _when not using dictionary encoding_. That is why passing a dictionary encoding option to {{col_encoding}} raises an exception.
Short term, I think the easiest solution for your PR is to also check for this explicitly in the Python bindings, so that "RLE_DICTIONARY" never gets passed to the C++ code that sets this option. That way the C++ exception is avoided, and a Python exception is raised instead.
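A minimal sketch of what such a Python-side check could look like. The helper name, the set of rejected encodings, and the error message are illustrative assumptions, not actual pyarrow internals:

```python
# Hypothetical validation helper for the Python bindings; the names
# check_column_encoding and DICTIONARY_ENCODINGS are illustrative only.

# Dictionary encodings are controlled via use_dictionary, so they are
# not valid values for the per-column encoding option.
DICTIONARY_ENCODINGS = {"PLAIN_DICTIONARY", "RLE_DICTIONARY"}


def check_column_encoding(column_encoding):
    """Reject dictionary encodings before they reach the C++ layer.

    ``column_encoding`` may be a single encoding name (applied to all
    columns) or a dict mapping column names to encoding names.
    """
    if isinstance(column_encoding, str):
        column_encoding = {None: column_encoding}
    for column, encoding in column_encoding.items():
        if encoding in DICTIONARY_ENCODINGS:
            # Raise a Python exception here instead of letting the
            # C++ layer throw one for the invalid option.
            raise ValueError(
                "'{}' is not a valid column encoding; dictionary "
                "encoding is controlled via the 'use_dictionary' "
                "option".format(encoding)
            )
```

With a check like this, `check_column_encoding({"col1": "RLE_DICTIONARY"})` raises a `ValueError` in Python, while non-dictionary encodings such as `"BYTE_STREAM_SPLIT"` pass through to the C++ writer options unchanged.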
> [Python] Allow per column encoding in parquet writer
> -----------------------------------------------------
>
> Key: ARROW-13781
> URL: https://issues.apache.org/jira/browse/ARROW-13781
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Parquet, Python
> Reporter: Brian Kiefer
> Assignee: Alenka Frim
> Priority: Minor
>
> Add a new parameter to `write_table` to allow parquet encodings to be defined
> on a per column basis. This should supersede use_dictionary and
> use_byte_stream_split.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)