[GitHub] [arrow] iajoiner commented on a change in pull request #9702: ARROW-11297: [C++][Python] Add ORC writer options

GitBox Mon, 29 Nov 2021 20:04:47 -0800


iajoiner commented on a change in pull request #9702:
URL: https://github.com/apache/arrow/pull/9702#discussion_r758914859




##########
File path: python/pyarrow/orc.py
##########
@@ -117,8 +117,38 @@ def read(self, columns=None):
         return self.reader.read(columns=columns)
 
 
+_orc_writer_arg_docs = """file_version : {"0.11", "0.12"}, default "0.12"
+    Determine which ORC file version to use. Hive 0.11 / ORC v0 is the older
+    version as defined `here <https://orc.apache.org/specification/ORCv0/>`
+    while Hive 0.12 / ORC v1 is the newer one as defined
+    `here <https://orc.apache.org/specification/ORCv1/>`.
+stripe_size : int, default 64 * 1024 * 1024
+    Size of each ORC stripe.
+compression : string, default 'snappy'
+    Specify the compression codec.
+    Valid values: {'NONE', 'SNAPPY', 'ZLIB', 'LZO', 'LZ4', 'ZSTD'}
+compression_block_size : int, default 64 * 1024
+    Specify the size of each compression block.
+compression_strategy : string, default 'speed'
+    Specify the compression strategy i.e. speed vs size reduction.
+    Valid values: {'SPEED', 'COMPRESSION'}
+row_index_stride : int, default 10000
+    Specify the row index stride i.e. the number of rows per
+    an entry in the row index.
+padding_tolerance : double, default 0.0
+    Set the padding tolerance.
+dictionary_key_size_threshold : double, default 0.0
+    Specify if we should write statistics in general (default is True) or only
+    for some columns.

Review comment:
       Oops, there is something weird about this. Sorry. Fixed.
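
For readers following the hunk above, the documented options can be sanity-checked with a small standalone sketch. This is not the code in the PR and not a pyarrow API: the helper name `check_orc_writer_options` is hypothetical, case-insensitive matching of the codec and strategy names is an assumption (the docstring gives lowercase defaults but uppercase valid values), and the codec set uses the spelling `LZO`, as in the ORC specification. The two threshold options are left out because the hunk does not state their valid ranges.

```python
# Hypothetical helper, not part of pyarrow: it only mirrors the values
# listed in the docstring under review.
VALID_FILE_VERSIONS = {"0.11", "0.12"}
VALID_COMPRESSION = {"NONE", "SNAPPY", "ZLIB", "LZO", "LZ4", "ZSTD"}
VALID_STRATEGIES = {"SPEED", "COMPRESSION"}


def check_orc_writer_options(file_version="0.12",
                             stripe_size=64 * 1024 * 1024,
                             compression="snappy",
                             compression_block_size=64 * 1024,
                             compression_strategy="speed",
                             row_index_stride=10000):
    """Return a normalized dict of options, or raise ValueError."""
    if file_version not in VALID_FILE_VERSIONS:
        raise ValueError(f"unknown file_version: {file_version!r}")

    # Assumption: names are matched case-insensitively.
    codec = str(compression).upper()
    if codec not in VALID_COMPRESSION:
        raise ValueError(f"unknown compression: {compression!r}")

    strategy = str(compression_strategy).upper()
    if strategy not in VALID_STRATEGIES:
        raise ValueError(
            f"unknown compression_strategy: {compression_strategy!r}")

    # The size-like options are documented as ints; require positive values.
    sizes = {
        "stripe_size": stripe_size,
        "compression_block_size": compression_block_size,
        "row_index_stride": row_index_stride,
    }
    for name, value in sizes.items():
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer")

    return {
        "file_version": file_version,
        "compression": codec,
        "compression_strategy": strategy,
        **sizes,
    }
```

Calling `check_orc_writer_options(compression="zstd", compression_strategy="compression")` returns the normalized names `"ZSTD"` and `"COMPRESSION"`, while an unlisted codec name raises `ValueError`.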




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

