stefan-lange-dataeng opened a new issue #8607:
URL: https://github.com/apache/arrow/issues/8607


   
https://github.com/apache/arrow/blob/47f2e0cb03ed8ad265e0688ada8162bf46066483/python/pyarrow/parquet.py#L1737
   
   When write_table encounters a problem, the exception handler removes the 
attempted output parquet file (see snippet below).
   This logic makes sense in order to make sure no file with inconsistent 
content/state remains.
   However, if a file with the same name already exists, it is deleted as well.
   
   Would it make sense to add an option that lets the user choose the behaviour 
in such a case, e.g. to keep an existing file and only overwrite it if the 
write succeeds?
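
   To illustrate the first idea, here is a minimal sketch of "only overwrite on 
success". It is not how pyarrow works today; `write_then_replace` and 
`write_func` are hypothetical names used for illustration only. The data is 
written to a temporary file in the same directory, and the target is replaced 
only after the write has finished without an exception.

```python
import os
import tempfile


def write_then_replace(where, write_func):
    """Call write_func(tmp_path); replace `where` only if it succeeds.

    Hypothetical helper, not part of pyarrow.
    """
    directory = os.path.dirname(os.path.abspath(where))
    # Create the temporary file next to the target so that os.replace
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        write_func(tmp_path)
        os.replace(tmp_path, where)
    except Exception:
        # Clean up only the temporary file; a preexisting `where`
        # is left untouched.
        try:
            os.remove(tmp_path)
        except OSError:
            pass
        raise
```

   With pyarrow this could be used as 
`write_then_replace("data.parquet", lambda p: pq.write_table(table, p))`. 
One caveat: on POSIX systems `os.replace` only needs write permission on the 
directory, so this sketch would still replace a read-only file.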
   And/or: Would it make sense to check early whether the intended file can be 
written, and fail early if it cannot (without deleting a preexisting file)?
   E.g. if the directory has permission 755 and the already existing file has 
permission 444, the write attempt fails with a PermissionError, but the 
exception handler still deletes the preexisting file, because removing a file 
only requires write permission on the directory. This behaviour seems a bit 
counterintuitive.
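
   A rough sketch of such an early check, assuming a local filesystem path; 
`check_writable` is a hypothetical helper, not an existing pyarrow function. 
It relies on `os.access`, which is only advisory: permissions can change 
between the check and the actual write, and a privileged user may pass the 
check regardless of the mode bits.

```python
import os


def check_writable(where):
    """Raise PermissionError early if `where` does not look writable.

    Hypothetical helper, not part of pyarrow.
    """
    path = os.path.abspath(where)
    if os.path.exists(path):
        # An existing file must itself be writable to be overwritten in place.
        if not os.access(path, os.W_OK):
            raise PermissionError(f"existing file is not writable: {path}")
    else:
        # A new file needs write and search permission on its directory.
        directory = os.path.dirname(path)
        if not os.access(directory, os.W_OK | os.X_OK):
            raise PermissionError(f"cannot create files in: {directory}")
```

   Calling this before the write would let the 755/444 case above fail before 
the exception handler ever gets a chance to remove the existing file.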
   Or would you say it is the responsibility of whoever sets the 
file/directory permissions to get them right?
   
   except Exception:
       if _is_path_like(where):
           try:
               os.remove(_stringify_path(where))
           except os.error:
               pass



