[jira] [Commented] (ARROW-2090) [Python] Add context manager methods to ParquetWriter

ASF GitHub Bot (JIRA) Mon, 05 Feb 2018 13:57:43 -0800

    [ https://issues.apache.org/jira/browse/ARROW-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352984#comment-16352984 ]


ASF GitHub Bot commented on ARROW-2090:
---------------------------------------

wesm closed pull request #1559: ARROW-2090: [Python] Add context methods to ParquetWriter
URL: https://github.com/apache/arrow/pull/1559
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/python/doc/source/parquet.rst b/python/doc/source/parquet.rst
index d466ba128..ac56520ff 100644
--- a/python/doc/source/parquet.rst
+++ b/python/doc/source/parquet.rst
@@ -139,11 +139,20 @@ We can similarly write a Parquet file with multiple row groups by using
    pf2 = pq.ParquetFile('example2.parquet')
    pf2.num_row_groups
 
+Alternatively, Python's ``with`` syntax can also be used:
+
+.. ipython:: python
+
+   with pq.ParquetWriter('example3.parquet', table.schema) as writer:
+       for i in range(3):
+           writer.write_table(table)
+
 .. ipython:: python
    :suppress:
 
    !rm example.parquet
    !rm example2.parquet
+   !rm example3.parquet
 
 Compression, Encoding, and File Compatibility
 ---------------------------------------------
diff --git a/python/pyarrow/parquet.py b/python/pyarrow/parquet.py
index 3a0924a27..8820b6b4a 100644
--- a/python/pyarrow/parquet.py
+++ b/python/pyarrow/parquet.py
@@ -292,6 +292,14 @@ def __del__(self):
         if getattr(self, 'is_open', False):
             self.close()
 
+    def __enter__(self):
+        return self
+
+    def __exit__(self, *args, **kwargs):
+        self.close()
+        # return false since we want to propagate exceptions
+        return False
+
     def write_table(self, table, row_group_size=None):
         if self.schema_changed:
             table = _sanitize_table(table, self.schema, self.flavor)
@@ -932,29 +940,24 @@ def write_table(table, where, row_group_size=None, version='1.0',
                 flavor=None, **kwargs):
     row_group_size = kwargs.pop('chunk_size', row_group_size)
 
-    writer = None
     try:
-        writer = ParquetWriter(
-            where, table.schema,
-            version=version,
-            flavor=flavor,
-            use_dictionary=use_dictionary,
-            coerce_timestamps=coerce_timestamps,
-            compression=compression,
-            use_deprecated_int96_timestamps=use_deprecated_int96_timestamps,
-            **kwargs)
-        writer.write_table(table, row_group_size=row_group_size)
+        with ParquetWriter(
+                where, table.schema,
+                version=version,
+                flavor=flavor,
+                use_dictionary=use_dictionary,
+                coerce_timestamps=coerce_timestamps,
+                compression=compression,
+                use_deprecated_int96_timestamps=use_deprecated_int96_timestamps,  # noqa
+                **kwargs) as writer:
+            writer.write_table(table, row_group_size=row_group_size)
     except Exception:
-        if writer is not None:
-            writer.close()
         if isinstance(where, six.string_types):
             try:
                 os.remove(where)
             except os.error:
                 pass
         raise
-    else:
-        writer.close()
 
 
 write_table.__doc__ = """
diff --git a/python/pyarrow/tests/test_parquet.py b/python/pyarrow/tests/test_parquet.py
index 7c2edb378..c49f3d396 100644
--- a/python/pyarrow/tests/test_parquet.py
+++ b/python/pyarrow/tests/test_parquet.py
@@ -1673,3 +1673,66 @@ def test_decimal_roundtrip_negative_scale(tmpdir):
     result_table = _read_table(string_filename)
     result = result_table.to_pandas()
     tm.assert_frame_equal(result, expected)
+
+
+@parquet
+def test_parquet_writer_context_obj(tmpdir):
+
+    import pyarrow.parquet as pq
+
+    df = _test_dataframe(100)
+    df['unique_id'] = 0
+
+    arrow_table = pa.Table.from_pandas(df, preserve_index=False)
+    out = pa.BufferOutputStream()
+
+    with pq.ParquetWriter(out, arrow_table.schema, version='2.0') as writer:
+
+        frames = []
+        for i in range(10):
+            df['unique_id'] = i
+            arrow_table = pa.Table.from_pandas(df, preserve_index=False)
+            writer.write_table(arrow_table)
+
+            frames.append(df.copy())
+
+    buf = out.get_result()
+    result = _read_table(pa.BufferReader(buf))
+
+    expected = pd.concat(frames, ignore_index=True)
+    tm.assert_frame_equal(result.to_pandas(), expected)
+
+
+@parquet
+def test_parquet_writer_context_obj_with_exception(tmpdir):
+
+    import pyarrow.parquet as pq
+
+    df = _test_dataframe(100)
+    df['unique_id'] = 0
+
+    arrow_table = pa.Table.from_pandas(df, preserve_index=False)
+    out = pa.BufferOutputStream()
+    error_text = 'Artificial Error'
+
+    try:
+        with pq.ParquetWriter(out,
+                              arrow_table.schema,
+                              version='2.0') as writer:
+
+            frames = []
+            for i in range(10):
+                df['unique_id'] = i
+                arrow_table = pa.Table.from_pandas(df, preserve_index=False)
+                writer.write_table(arrow_table)
+                frames.append(df.copy())
+                if i == 5:
+                    raise ValueError(error_text)
+    except Exception as e:
+        assert str(e) == error_text
+
+    buf = out.get_result()
+    result = _read_table(pa.BufferReader(buf))
+
+    expected = pd.concat(frames, ignore_index=True)
+    tm.assert_frame_equal(result.to_pandas(), expected)
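
A note on the second test in the diff: because its assertion lives only in the
`except` branch, it would also pass if no exception were raised inside the
`with` block. The following is a minimal sketch of a stricter pattern, using
only the standard library. `RecordingWriter` and `run_failing_write` are
hypothetical stand-ins invented for illustration, not pyarrow APIs; in a
pytest-based suite, `pytest.raises` would be the more usual tool.

```python
class RecordingWriter:
    """Hypothetical stand-in for ParquetWriter: records rows and open state."""

    def __init__(self):
        self.rows = []
        self.is_open = True

    def __enter__(self):
        return self

    def __exit__(self, *args, **kwargs):
        self.close()
        # Return False so exceptions propagate, mirroring the patch.
        return False

    def write(self, row):
        self.rows.append(row)

    def close(self):
        self.is_open = False


def run_failing_write(error_text="Artificial Error"):
    """Write until an artificial error occurs and verify it really was raised."""
    writer = RecordingWriter()
    raised = False
    try:
        with writer:
            for i in range(10):
                writer.write(i)
                if i == 5:
                    raise ValueError(error_text)
    except ValueError as e:
        raised = True
        assert str(e) == error_text
    # Fail loudly if the exception was swallowed or never raised.
    assert raised, "expected ValueError was not raised"
    return writer


writer = run_failing_write()
```

After the call, the writer has been closed by `__exit__` and holds only the
rows written before the error.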


 


> [Python] Add context manager methods to ParquetWriter
> -----------------------------------------------------
>
>                 Key: ARROW-2090
>                 URL: https://issues.apache.org/jira/browse/ARROW-2090
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Alec Posney
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> Add the ability to use Python's {{with}} syntax on the {{ParquetWriter}} object.
> For example:
> {code:python}
> with pq.ParquetWriter(foo, schema) as writer:
>     writer.write_table(table)
> {code}
> The benefit of this syntax is that it removes the risk of writing out a
> partial (invalid) Parquet file, which is currently possible if you forget to
> call the close method or, more likely, if the close method is never reached
> because an exception is thrown.
> It should still be possible to use the previous syntax, for backwards
> compatibility and fine-grained control.
> Similarly, the module-level {{write_table}} function in {{parquet}} should be
> able to use the new syntax without changing previous behaviour.
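
The failure mode described in the issue can be sketched with a toy writer in
plain Python. `FooterWriter`, `failing_step` and `is_complete` are hypothetical
names invented for this illustration; the class only imitates the property that
the output is incomplete until `close()` runs, and is not how pyarrow is
implemented.

```python
import io


class FooterWriter:
    """Hypothetical writer whose output is complete only after close()."""

    def __init__(self, sink):
        self.sink = sink
        self.is_open = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        # Returning False lets any exception propagate to the caller.
        return False

    def write(self, text):
        self.sink.write(text)

    def close(self):
        if self.is_open:
            self.sink.write("END")  # stands in for a file footer
            self.is_open = False


def failing_step():
    raise RuntimeError("boom")


def is_complete(sink):
    return sink.getvalue().endswith("END")


# Manual style: the exception skips close(), leaving partial output.
manual_sink = io.StringIO()
try:
    w = FooterWriter(manual_sink)
    w.write("a")
    failing_step()
    w.close()
except RuntimeError:
    pass

# `with` style: __exit__ closes the writer, then the exception propagates.
managed_sink = io.StringIO()
try:
    with FooterWriter(managed_sink) as w:
        w.write("a")
        failing_step()
except RuntimeError:
    pass
```

In this sketch only the `with` variant ends up with the footer written, even
though both raise the same error.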



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
