Repository: arrow
Updated Branches:
  refs/heads/master 2a059bd27 -> 17c9ae7c4

ARROW-357: Use a single RowGroup for Parquet files as default.

This is not the optimal choice, we should rather have an option to optimise for the underlying block size of the filesystem but without the infrastructure for that in ``parquet-cpp``, writing a single RowGroup is the much better choice.

Author: Uwe L. Korn <[email protected]>

Closes #192 from xhochy/ARROW-357 and squashes the following commits:

9eccefd [Uwe L. Korn] ARROW-357: Use a single RowGroup for Parquet files as default.


Project: http://git-wip-us.apache.org/repos/asf/arrow/repo
Commit: http://git-wip-us.apache.org/repos/asf/arrow/commit/17c9ae7c
Tree: http://git-wip-us.apache.org/repos/asf/arrow/tree/17c9ae7c
Diff: http://git-wip-us.apache.org/repos/asf/arrow/diff/17c9ae7c

Branch: refs/heads/master
Commit: 17c9ae7c4ceb328c897fb6c9025c763a879ebefa
Parents: 2a059bd
Author: Uwe L. Korn <[email protected]>
Authored: Wed Nov 2 12:20:15 2016 -0400
Committer: Wes McKinney <[email protected]>
Committed: Wed Nov 2 12:20:15 2016 -0400

----------------------------------------------------------------------
 python/pyarrow/parquet.pyx | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/arrow/blob/17c9ae7c/python/pyarrow/parquet.pyx
----------------------------------------------------------------------
diff --git a/python/pyarrow/parquet.pyx b/python/pyarrow/parquet.pyx
index 019dd2c..a56c1e1 100644
--- a/python/pyarrow/parquet.pyx
+++ b/python/pyarrow/parquet.pyx
@@ -106,7 +106,8 @@ def write_table(table, filename, chunk_size=None, version=None,
     table : pyarrow.Table
     filename : string
     chunk_size : int
-        The maximum number of rows in each Parquet RowGroup
+        The maximum number of rows in each Parquet RowGroup. As a default,
+        we will write a single RowGroup per file.
     version : {"1.0", "2.0"}, default "1.0"
         The Parquet format version, defaults to 1.0
     use_dictionary : bool or list
@@ -121,7 +122,7 @@ def write_table(table, filename, chunk_size=None, version=None,
     cdef WriterProperties.Builder properties_builder
     cdef int64_t chunk_size_ = 0
     if chunk_size is None:
-        chunk_size_ = min(ctable_.num_rows(), int(2**16))
+        chunk_size_ = ctable_.num_rows()
     else:
         chunk_size_ = chunk_size
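
To illustrate the effect of the changed default, here is a minimal pure-Python sketch of the chunk size selection before and after this commit. The function names (`old_default_chunk_size`, `new_default_chunk_size`, `row_group_count`) are hypothetical and not part of pyarrow. The row group count assumes the writer simply cuts the table into consecutive chunks of at most `chunk_size` rows, which is an assumption about the writer's behaviour rather than something the diff itself shows.

```python
def old_default_chunk_size(num_rows, chunk_size=None):
    """Chunk size selection before the commit: cap the default at 2**16 rows."""
    if chunk_size is None:
        return min(num_rows, 2**16)
    return chunk_size


def new_default_chunk_size(num_rows, chunk_size=None):
    """Chunk size selection after the commit: default to the whole table."""
    if chunk_size is None:
        return num_rows
    return chunk_size


def row_group_count(num_rows, chunk_size):
    """Number of row groups, ASSUMING the table is split into consecutive
    chunks of at most chunk_size rows (ceiling division)."""
    if num_rows == 0:
        return 0
    return -(-num_rows // chunk_size)


if __name__ == "__main__":
    rows = 200_000
    for label, pick in (("old", old_default_chunk_size),
                        ("new", new_default_chunk_size)):
        size = pick(rows)
        print(label, "chunk size:", size,
              "row groups:", row_group_count(rows, size))
```

Under that assumption, a table larger than 2**16 rows was previously split into several row groups by default, while the new default yields one. An explicitly passed `chunk_size` is used unchanged in both versions.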
