stefan-lange-dataeng opened a new issue #8607: URL: https://github.com/apache/arrow/issues/8607
https://github.com/apache/arrow/blob/47f2e0cb03ed8ad265e0688ada8162bf46066483/python/pyarrow/parquet.py#L1737

When write_table encounters a problem, the exception handler removes the attempted output parquet file (see snippet below). This logic makes sense, since it ensures that no file with inconsistent content/state remains. However, if a file with the same name already exists, it is deleted as well.

Would it make sense to add an option that lets the user choose the behaviour in such a case, e.g. to keep an existing file and only overwrite it if the write succeeds?

And/or: would it make sense to check early whether the intended file can be written, and to fail early if it cannot (without deleting a preexisting file)? E.g. if the directory has permission 755 and the already existing file has permission 444, the write attempt fails with a PermissionError, but the exception handler then deletes the preexisting file.

This behaviour is a bit counterintuitive. Or would you say the responsibility lies with the people setting the file/directory permissions?

    except Exception:
        if _is_path_like(where):
            try:
                os.remove(_stringify_path(where))
            except os.error:
                pass

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
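
For illustration, the "keep an existing file and only overwrite it on success" behaviour asked about in the issue could be approximated on the caller's side roughly as follows. This is a minimal sketch, not pyarrow's implementation: write_atomically and its write_fn parameter are hypothetical names introduced here, and the approach (write to a temporary file in the same directory, then swap it in with os.replace) is one common technique among several.

```python
import os
import tempfile


def write_atomically(where, write_fn):
    """Hypothetical helper: run write_fn(tmp_path), then swap tmp_path into place.

    A pre-existing file at `where` is only replaced if write_fn succeeds.
    """
    target = os.fspath(where)
    directory = os.path.dirname(os.path.abspath(target))
    # Create the temporary file next to the target so that os.replace
    # never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        write_fn(tmp_path)
        os.replace(tmp_path, target)
    except BaseException:
        # Clean up only the temporary file; the original stays untouched.
        try:
            os.remove(tmp_path)
        except OSError:
            pass
        raise
```

Under these assumptions, a call such as write_atomically("data.parquet", lambda tmp: pq.write_table(table, tmp)) would leave an existing data.parquet in place if the write raises. Two caveats: on POSIX systems the final rename depends on the directory's permissions rather than the target file's, so a read-only (444) file in a writable directory would still be replaced unless the caller checks os.access(target, os.W_OK) first; and the new file gets the temporary file's permissions, not those of the file it replaces.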
