[ https://issues.apache.org/jira/browse/ARROW-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321193#comment-16321193 ]

ASF GitHub Bot commented on ARROW-1980:
---------------------------------------

wesm closed pull request #1468: ARROW-1980: [Python] Fix race condition in write_to_dataset
URL: https://github.com/apache/arrow/pull/1468

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/python/pyarrow/parquet.py b/python/pyarrow/parquet.py
index d9f1bd2c3..151e0df8a 100644
--- a/python/pyarrow/parquet.py
+++ b/python/pyarrow/parquet.py
@@ -966,6 +966,14 @@ def write_table(table, where, row_group_size=None, version='1.0',
 """.format(_parquet_writer_arg_docs)
 
 
+def _mkdir_if_not_exists(fs, path):
+    if fs._isfilestore() and not fs.exists(path):
+        try:
+            fs.mkdir(path)
+        except OSError:
+            assert fs.exists(path)
+
+
 def write_to_dataset(table, root_path, partition_cols=None,
                      filesystem=None, preserve_index=True, **kwargs):
     """
@@ -1012,11 +1020,7 @@ def write_to_dataset(table, root_path, partition_cols=None,
     else:
         fs = _ensure_filesystem(filesystem)
 
-    if fs._isfilestore() and not fs.exists(root_path):
-        try:
-            fs.mkdir(root_path)
-        except OSError:
-            assert fs.exists(root_path)
+    _mkdir_if_not_exists(fs, root_path)
 
     if partition_cols is not None and len(partition_cols) > 0:
         df = table.to_pandas()
@@ -1034,8 +1038,7 @@ def write_to_dataset(table, root_path, partition_cols=None,
             subtable = Table.from_pandas(subgroup,
                                          preserve_index=preserve_index)
             prefix = "/".join([root_path, subdir])
-            if fs._isfilestore() and not fs.exists(prefix):
-                fs.mkdir(prefix)
+            _mkdir_if_not_exists(fs, prefix)
             outfile = compat.guid() + ".parquet"
             full_path = "/".join([prefix, outfile])
             with fs.open(full_path, 'wb') as f:


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> [Python] Race condition in `write_to_dataset`
> ---------------------------------------------
>
>                 Key: ARROW-1980
>                 URL: https://issues.apache.org/jira/browse/ARROW-1980
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Jim Crist
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> One race condition when creating directories was fixed in #1902, but a race 
> condition still exists when using `partition_cols`.
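
For readers following the thread, here is a minimal sketch of the check-then-create race described above and of the tolerant-mkdir pattern the patch applies to each partition directory. It is not the pyarrow code: it uses the local filesystem through `os` instead of the `fs` object in the diff, the names `mkdir_if_not_exists` and `run_concurrent_writers` are hypothetical, and it re-raises the error where the patch uses `assert`.

```python
import os
import tempfile
import threading


def mkdir_if_not_exists(path):
    """Create `path`, tolerating a concurrent creator (local-filesystem sketch)."""
    if not os.path.exists(path):
        try:
            os.mkdir(path)
        except OSError:
            # Another writer may have created the directory between the
            # exists() check and mkdir(); re-raise only if it is still missing.
            if not os.path.isdir(path):
                raise


def run_concurrent_writers(n_writers=8):
    """Have several threads create the same partition directory at once."""
    root = tempfile.mkdtemp()
    prefix = os.path.join(root, "part=0")  # hypothetical partition subdirectory
    errors = []
    barrier = threading.Barrier(n_writers)

    def worker():
        barrier.wait()  # line the threads up to make the race more likely
        try:
            mkdir_if_not_exists(prefix)
        except Exception as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return prefix, errors
```

Without the `try`/`except`, any thread that loses the race between `exists()` and `mkdir()` would fail with an `OSError`, which is the failure mode the issue reports for the per-partition `fs.mkdir(prefix)` call.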



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
