Hi,
Currently I’m working on ARROW-11297
(https://github.com/mathyingzhou/arrow/tree/ARROW-11297), which will be filed
as a PR as soon as the current PR is merged.
I managed to reimplement orc::WriterOptions in Arrow (with Arrow-ized naming
conventions) as arrow::adapters::orc::WriterOptions, which is necessary since we
do not allow third-party headers to be included in our public headers, and I
have finished the C++ part of the work. Now I’m trying to expose WriterOptions
in Python, and I wonder how this is supposed to be done in general. After
reading the code in array.pxi, I think this may be the way to do it:
1. The end user will see the individual ORC writer options (e.g. CompressionKind,
that is, whether we use ZLIB, LZO, some other form of compression, or none at
all) as keyword arguments.
2. These keyword arguments will first be processed in _orc.pyx as a dictionary
and then converted by an adapter into an
arrow::adapters::orc::WriterOptions.
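A minimal pure-Python sketch of steps 1 and 2, assuming a hypothetical `WriterOptions` dataclass that stands in for arrow::adapters::orc::WriterOptions (its field names are illustrative assumptions, not the actual C++ members) and a hypothetical `write_table_sketch` entry point. In the real _orc.pyx the validated values would be copied into the C++ object instead of being returned.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class WriterOptions:
    """Hypothetical stand-in for arrow::adapters::orc::WriterOptions.

    Field names are illustrative; None means "keep the C++ default".
    """
    compression: Optional[str] = None
    stripe_size: Optional[int] = None
    row_index_stride: Optional[int] = None


# Names of all keyword arguments the writer accepts.
_VALID_OPTIONS = frozenset(f.name for f in fields(WriterOptions))


def _writer_options_from_kwargs(**kwargs: Any) -> WriterOptions:
    """Step 2: treat the keyword arguments as a dict, validate, convert."""
    unknown = set(kwargs) - _VALID_OPTIONS
    if unknown:
        raise TypeError(f"Unexpected ORC writer option(s): {sorted(unknown)}")
    return WriterOptions(**kwargs)


def write_table_sketch(table: Any, where: str, **kwargs: Any) -> WriterOptions:
    """Step 1: the user-facing function exposes each option as a keyword.

    Hypothetical: a real writer would pass `options` on to C++ and write
    `table` to `where`; this sketch only builds and returns the options.
    """
    options = _writer_options_from_kwargs(**kwargs)
    return options
```

Using `None` as the default lets the binding layer set only the options the user actually passed and leave the C++ defaults alone.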
Is this the right way?
Moreover, how should we convert the enums? Should I use a series of if/elif
statements, or a mapping dict that forces people to use one of the valid strings
or get a ValueError?
For example:

    compression_kind_mapping = {
        'snappy': CompressionKind._CompressionKind_SNAPPY,
        'lzo': CompressionKind._CompressionKind_LZO,
        # There are other options; this is just an example
    }
    if compression_kind not in compression_kind_mapping:
        raise ValueError("Unknown compression_kind")
    c_compression_kind = compression_kind_mapping[compression_kind]
Ying