Repository: arrow
Updated Branches:
  refs/heads/master 2a059bd27 -> 17c9ae7c4


ARROW-357: Use a single RowGroup for Parquet files as default.

This is not the optimal choice: ideally we would have an option to optimise for
the underlying block size of the filesystem. Without the infrastructure for
that in ``parquet-cpp``, however, writing a single RowGroup is the much better
choice.
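
To make the effect of the new default concrete, here is a minimal pure-Python
sketch of the chunk-size selection before and after this commit. The helper
names (``default_chunk_size_old``, ``default_chunk_size_new``,
``row_group_count``) are hypothetical and are not part of ``pyarrow``. The
RowGroup count assumes the writer fills each RowGroup up to ``chunk_size`` rows,
which is an assumption about the writer rather than something this commit states.

```python
import math


def default_chunk_size_old(num_rows):
    # Previous default: cap each RowGroup at 2**16 rows.
    return min(num_rows, 2 ** 16)


def default_chunk_size_new(num_rows):
    # New default: one RowGroup that holds the whole table.
    return num_rows


def row_group_count(num_rows, chunk_size):
    # Assumption: every RowGroup is filled up to chunk_size rows,
    # so the count is the ceiling of num_rows / chunk_size.
    if num_rows == 0:
        return 0
    return math.ceil(num_rows / chunk_size)


if __name__ == "__main__":
    rows = 1_000_000
    old = default_chunk_size_old(rows)
    new = default_chunk_size_new(rows)
    print("old default:", old, "rows per RowGroup,", row_group_count(rows, old), "RowGroups")
    print("new default:", new, "rows per RowGroup,", row_group_count(rows, new), "RowGroups")
```

Under these assumptions, a table of one million rows goes from 16 RowGroups to
a single RowGroup. Passing an explicit ``chunk_size`` still overrides the
default in both versions.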

Author: Uwe L. Korn <[email protected]>

Closes #192 from xhochy/ARROW-357 and squashes the following commits:

9eccefd [Uwe L. Korn] ARROW-357: Use a single RowGroup for Parquet files as default.


Project: http://git-wip-us.apache.org/repos/asf/arrow/repo
Commit: http://git-wip-us.apache.org/repos/asf/arrow/commit/17c9ae7c
Tree: http://git-wip-us.apache.org/repos/asf/arrow/tree/17c9ae7c
Diff: http://git-wip-us.apache.org/repos/asf/arrow/diff/17c9ae7c

Branch: refs/heads/master
Commit: 17c9ae7c4ceb328c897fb6c9025c763a879ebefa
Parents: 2a059bd
Author: Uwe L. Korn <[email protected]>
Authored: Wed Nov 2 12:20:15 2016 -0400
Committer: Wes McKinney <[email protected]>
Committed: Wed Nov 2 12:20:15 2016 -0400

----------------------------------------------------------------------
 python/pyarrow/parquet.pyx | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/arrow/blob/17c9ae7c/python/pyarrow/parquet.pyx
----------------------------------------------------------------------
diff --git a/python/pyarrow/parquet.pyx b/python/pyarrow/parquet.pyx
index 019dd2c..a56c1e1 100644
--- a/python/pyarrow/parquet.pyx
+++ b/python/pyarrow/parquet.pyx
@@ -106,7 +106,8 @@ def write_table(table, filename, chunk_size=None, version=None,
     table : pyarrow.Table
     filename : string
     chunk_size : int
-        The maximum number of rows in each Parquet RowGroup
+        The maximum number of rows in each Parquet RowGroup. As a default,
+        we will write a single RowGroup per file.
     version : {"1.0", "2.0"}, default "1.0"
         The Parquet format version, defaults to 1.0
     use_dictionary : bool or list
@@ -121,7 +122,7 @@ def write_table(table, filename, chunk_size=None, version=None,
     cdef WriterProperties.Builder properties_builder
     cdef int64_t chunk_size_ = 0
     if chunk_size is None:
-        chunk_size_ = min(ctable_.num_rows(), int(2**16))
+        chunk_size_ = ctable_.num_rows()
     else:
         chunk_size_ = chunk_size
 
