ap-- commented on PR #41921:
URL: https://github.com/apache/airflow/pull/41921#issuecomment-2323438156

   Regarding the `overwrite` argument in `rename`: The implementation of 
`UPath.rename` relies on `AbstractFileSystem.mv`. Below is a report for a 
selection of available fsspec filesystems, showing whether each one relies on 
the default implementation or overrides the method.
   
   **tl;dr:** The default implementation and most custom implementations 
discard extra keyword arguments. The only one in this list that forwards them 
is `SMBFileSystem`, but `smbclient.rename` does not appear to have an 
`overwrite` keyword argument either. So unless you use an fsspec filesystem 
that is not listed here and requires the `overwrite` kwarg, it should be safe 
to remove.
   
   See below for details on the different rename implementations:
   
   
   # fsspec subclasses: customized method report for 'mv'
   <details>
   
   <summary>Click to see the environment used to generate the report</summary>
   
   ```
   adlfs==2024.7.0
   aenum==3.1.15
   aiobotocore==2.14.0
   aiofile==3.8.8
   aiohappyeyeballs==2.4.0
   aiohttp==3.10.5
   aiohttp-retry==2.8.3
   aioitertools==0.11.0
   aiooss2==0.2.10
   aiosignal==1.3.1
   aliyun-python-sdk-core==2.15.2
   aliyun-python-sdk-kms==2.16.5
   amqp==5.2.0
   annotated-types==0.7.0
   antlr4-python3-runtime==4.9.3
   anyio==4.4.0
   appdirs==1.4.4
   asyncssh==2.16.0
   atpublic==5.0
   attrs==24.2.0
   azure-core==1.30.2
   azure-datalake-store==0.0.53
   azure-identity==1.17.1
   azure-storage-blob==12.22.0
   bcrypt==4.2.0
   billiard==4.2.0
   botocore==1.35.7
   boxfs==0.3.0
   boxsdk==3.13.0
   cachetools==5.5.0
   caio==0.9.17
   celery==5.4.0
   certifi==2024.8.30
   cffi==1.17.0
   charset-normalizer==3.3.2
   circuitbreaker==2.0.0
   click==8.1.7
   click-didyoumean==0.3.1
   click-plugins==1.1.1
   click-repl==0.3.0
   cloudpickle==3.0.0
   colorama==0.4.6
   configobj==5.0.8
   crcmod==1.7
   cryptography==42.0.8
   dask==2024.8.2
   decorator==5.1.1
   dictdiffer==0.9.0
   diskcache==5.6.3
   distributed==2024.8.2
   distro==1.9.0
   docker-pycreds==0.4.0
   dpath==2.2.0
   dropbox==12.0.2
   dropboxdrivefs==1.4.1
   dulwich==0.22.1
   dvc==3.53.1
   dvc-data==3.15.2
   dvc-http==2.32.0
   dvc-objects==5.1.0
   dvc-render==1.0.2
   dvc-studio-client==0.21.0
   dvc-task==0.4.0
   entrypoints==0.4
   filelock==3.15.4
   flatten-dict==0.4.2
   flufl.lock==8.1.0
   frozenlist==1.4.1
   fsspec==2024.6.1
   fsspec_xrootd==0.3.0
   funcy==2.0
   gcsfs==2024.6.1
   gitdb==4.0.11
   GitPython==3.1.43
   google-api-core==2.19.2
   google-auth==2.34.0
   google-auth-oauthlib==1.2.1
   google-cloud-core==2.4.1
   google-cloud-storage==2.18.2
   google-crc32c==1.5.0
   google-resumable-media==2.7.2
   googleapis-common-protos==1.65.0
   grandalf==0.8
   gto==1.7.1
   h11==0.14.0
   httpcore==1.0.5
   httpx==0.27.2
   huggingface-hub==0.23.5
   hydra-core==1.3.2
   idna==3.8
   isodate==0.6.1
   iterative-telemetry==0.0.8
   Jinja2==3.1.4
   jmespath==0.10.0
   kombu==5.4.0
   lakefs==0.7.1
   lakefs-sdk==1.32.1
   lakefs-spec==0.10.0
   locket==1.0.0
   markdown-it-py==3.0.0
   MarkupSafe==2.1.5
   mdurl==0.1.2
   morefs==0.2.2
   msal==1.30.0
   msal-extensions==1.2.0
   msgpack==1.0.8
   multidict==6.0.5
   networkx==3.3
   numpy==2.1.0
   oauthlib==3.2.2
   oci==2.133.0
   ocifs==1.3.1
   omegaconf==2.3.0
   orjson==3.10.7
   oss2==2.18.1
   ossfs==2023.12.0
   packaging==24.1
   paramiko==3.4.1
   partd==1.4.2
   pathspec==0.12.1
   platformdirs==3.11.0
   ply==3.11
   portalocker==2.10.1
   prompt_toolkit==3.0.47
   proto-plus==1.24.0
   protobuf==5.28.0
   psutil==6.0.0
   pyarrow==17.0.0
   pyasn1==0.6.0
   pyasn1_modules==0.4.0
   pycparser==2.22
   pycryptodome==3.20.0
   pydantic==2.8.2
   pydantic_core==2.20.1
   pydot==3.0.1
   pygit2==1.15.1
   Pygments==2.18.0
   pygtrie==2.5.0
   PyJWT==2.9.0
   PyNaCl==1.5.0
   pyOpenSSL==24.2.1
   pyparsing==3.1.4
   pyspnego==0.11.1
   python-dateutil==2.9.0.post0
   pytz==2024.1
   PyYAML==6.0.2
   requests==2.32.3
   requests-oauthlib==2.0.0
   requests-toolbelt==1.0.0
   rich==13.8.0
   rsa==4.9
   ruamel.yaml==0.18.6
   ruamel.yaml.clib==0.2.8
   s3fs==2024.6.1
   scmrepo==3.3.7
   semver==3.0.2
   sentry-sdk==2.13.0
   setproctitle==1.3.3
   setuptools==74.0.0
   shellingham==1.5.4
   shortuuid==1.0.13
   shtab==1.7.1
   six==1.16.0
   smbprotocol==1.14.0
   smmap==5.0.1
   sniffio==1.3.1
   sortedcontainers==2.4.0
   sqltrie==0.11.1
   stone==3.3.1
   tabulate==0.9.0
   tblib==3.0.0
   tomlkit==0.13.2
   toolz==0.12.1
   tornado==6.4.1
   tqdm==4.66.5
   typer==0.12.5
   typing_extensions==4.12.2
   tzdata==2024.1
   urllib3==2.0.7
   vine==5.1.0
   voluptuous==0.15.2
   wandb==0.17.8
   wandbfs==0.0.2
   wcwidth==0.2.13
   webdav4==0.10.0
   wrapt==1.16.0
   yarl==1.9.6
   zc.lockfile==3.0.post1
   zict==3.0.0
   
   ```
   
   </details>
   
   ## Default implementation
   ```python
       def mv(self, path1, path2, recursive=False, maxdepth=None, **kwargs):
           """Move file(s) from one location to another"""
           if path1 == path2:
               logger.debug("%s mv: The paths are the same, so no files were moved.", self)
           else:
               # explicitly raise exception to prevent data corruption
               self.copy(
                   path1, path2, recursive=recursive, maxdepth=maxdepth, onerror="raise"
               )
               self.rm(path1, recursive=recursive)
   
   ```
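To make the "kwargs are discarded" point concrete, here is a minimal sketch using a hypothetical in-memory class (`ToyFileSystem` is not an fsspec class, and its `copy`/`rm` are simplified stand-ins). It only mirrors the shape of the default `mv` above: `**kwargs` is accepted but never passed on, so an `overwrite` flag cannot change the outcome.

```python
import logging

logger = logging.getLogger("toy_fs")


class ToyFileSystem:
    """Hypothetical in-memory stand-in; NOT an fsspec class."""

    def __init__(self):
        self.files = {}

    def copy(self, path1, path2):
        # Toy behaviour: always replaces the destination.
        self.files[path2] = self.files[path1]

    def rm(self, path):
        del self.files[path]

    def mv(self, path1, path2, **kwargs):
        # Same shape as the default above: kwargs are accepted, never used.
        if path1 == path2:
            logger.debug("%s mv: The paths are the same, so no files were moved.", self)
        else:
            self.copy(path1, path2)
            self.rm(path1)


def move_with_flag(overwrite):
    """Move 'a' onto an existing 'b' and return the resulting file table."""
    fs = ToyFileSystem()
    fs.files = {"a": b"new", "b": b"old"}
    fs.mv("a", "b", overwrite=overwrite)
    return fs.files


result_true = move_with_flag(True)
result_false = move_with_flag(False)
```

In this toy, `result_true` and `result_false` are identical, because the flag never reaches `copy` or `rm`. Whether a real backend's `copy` replaces an existing destination is a separate question that this sketch does not answer.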
   
   These filesystem classes do not customize the method 'mv':
   -  `AzureBlobFileSystem`
   -  `AzureDatalakeFileSystem`
   -  `CachingFileSystem`
   -  `BoxFileSystem`
   -  `DataFileSystem`
   -  `DropboxDriveFileSystem`
   -  `_DVCFileSystem`
   -  `WholeFileCacheFileSystem`
   -  `GCSFileSystem`
   -  `GenericFileSystem`
   -  `GitFileSystem`
   -  `GithubFileSystem`
   -  `HfFileSystem`
   -  `HTTPFileSystem`
   -  `JupyterFileSystem`
   -  `LakeFSFileSystem`
   -  `LibArchiveFileSystem`
   -  `MemoryFileSystem`
   -  `OCIFileSystem`
   -  `OSSFileSystem`
   -  `ReferenceFileSystem`
   -  `XRootDFileSystem`
   -  `S3FileSystem`
   -  `SimpleCacheFileSystem`
   -  `TarFileSystem`
   -  `WandbFS`
   -  `ZipFileSystem`
   -  `DictFS`
   -  `MemFS`
   -  `OverlayFileSystem`
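For readers who want to repeat this kind of check on a filesystem that is not listed, the following is a rough sketch of how a "customized or not" test could be written with the standard `inspect` module. It is not the script used to generate this report. The classes `Base`, `Inherits`, `KwargsOnly` and `NoKwargs` are hypothetical stand-ins whose signatures are copied from the listings in this comment, and `describe_method` is an illustrative helper name.

```python
import inspect


class Base:
    """Stand-in for the base class, with the default signature shown above."""

    def mv(self, path1, path2, recursive=False, maxdepth=None, **kwargs):
        pass


class Inherits(Base):
    """Does not customize mv."""


class KwargsOnly(Base):
    def mv(self, path1, path2, **kwargs):
        pass


class NoKwargs(Base):
    def mv(self, old, new):
        pass


def describe_method(cls, base, name="mv"):
    """Report whether cls overrides base.<name> and what its signature accepts."""
    method = getattr(cls, name)
    sig = inspect.signature(method)
    accepts_extra = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values()
    )
    return {
        # An inherited method is the very same function object as the base's.
        "customized": method is not getattr(base, name),
        "signature": str(sig),
        "accepts_extra_kwargs": accepts_extra,
    }


report = {cls.__name__: describe_method(cls, Base) for cls in (Inherits, KwargsOnly, NoKwargs)}
```

The `accepts_extra_kwargs` field is worth a look on its own: a signature without `**kwargs`, such as the `(self, old, new)` shape, would reject an extra `overwrite=` keyword with a `TypeError` rather than silently discard it.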
   
   ## Subclasses customizing 'mv'
   ### HadoopFileSystem
   `HadoopFileSystem.mv` is customized and signature is different
   - base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
   - subclass: `(self, path1, path2, **kwargs)`
   ```python
       @wrap_exceptions
       def mv(self, path1, path2, **kwargs):
           path1 = self._strip_protocol(path1).rstrip("/")
           path2 = self._strip_protocol(path2).rstrip("/")
           self.fs.move(path1, path2)
   
   ```
   
   ### AsyncLocalFileSystem
   `AsyncLocalFileSystem.mv` is customized and signature is different
   - base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
   - subclass: `(self, path1, path2, **kwargs)`
   ```python
       def mv(self, path1, path2, **kwargs):
           path1 = self._strip_protocol(path1)
           path2 = self._strip_protocol(path2)
           shutil.move(path1, path2)
   
   ```
   
   ### DaskWorkerFileSystem
   `DaskWorkerFileSystem.mv` is customized and signature is different
   - base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
   - subclass: `(self, *args, **kwargs)`
   ```python
       def mv(self, *args, **kwargs):
           if self.worker:
               self.fs.mv(*args, **kwargs)
           else:
               self.rfs.mv(*args, **kwargs).compute()
   
   ```
   
   ### DatabricksFileSystem
   `DatabricksFileSystem.mv` is customized and signature is different
   - base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
   - subclass: `(self, source_path, destination_path, recursive=False, maxdepth=None, **kwargs)`
   ```python
       def mv(
           self, source_path, destination_path, recursive=False, maxdepth=None, **kwargs
       ):
           """
           Move a source to a destination path.
   
           A note from the original [databricks API manual]
           (https://docs.databricks.com/dev-tools/api/latest/dbfs.html#move).
   
           When moving a large number of files the API call will time out after
           approximately 60s, potentially resulting in partially moved data.
           Therefore, for operations that move more than 10k files, we strongly
           discourage using the DBFS REST API.
   
           Parameters
           ----------
           source_path: str
               From where to move (absolute path)
           destination_path: str
               To where to move (absolute path)
           recursive: bool
               Not implemented to far.
           maxdepth:
               Not implemented to far.
           """
           if recursive:
               raise NotImplementedError
           if maxdepth:
               raise NotImplementedError
   
           try:
               self._send_to_api(
                   method="post",
                   endpoint="move",
                   json={"source_path": source_path, "destination_path": destination_path},
               )
           except DatabricksException as e:
               if e.error_code == "RESOURCE_DOES_NOT_EXIST":
                   raise FileNotFoundError(e.message)
               elif e.error_code == "RESOURCE_ALREADY_EXISTS":
                   raise FileExistsError(e.message)
   
               raise e
           self.invalidate_cache(self._parent(source_path))
           self.invalidate_cache(self._parent(destination_path))
   
   ```
   
   ### DirFileSystem
   `DirFileSystem.mv` is customized and signature is different
   - base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
   - subclass: `(self, path1, path2, **kwargs)`
   ```python
       def mv(self, path1, path2, **kwargs):
           return self.fs.mv(
               self._join(path1),
               self._join(path2),
               **kwargs,
           )
   
   ```
   
   ### LocalFileSystem
   `LocalFileSystem.mv` is customized and signature is different
   - base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
   - subclass: `(self, path1, path2, **kwargs)`
   ```python
       def mv(self, path1, path2, **kwargs):
           path1 = self._strip_protocol(path1)
           path2 = self._strip_protocol(path2)
           shutil.move(path1, path2)
   
   ```
   
   ### FTPFileSystem
   `FTPFileSystem.mv` is customized and signature is different
   - base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
   - subclass: `(self, path1, path2, **kwargs)`
   ```python
       def mv(self, path1, path2, **kwargs):
           path1 = self._strip_protocol(path1)
           path2 = self._strip_protocol(path2)
           self.ftp.rename(path1, path2)
           self.invalidate_cache(self._parent(path1))
           self.invalidate_cache(self._parent(path2))
   
   ```
   
   ### SFTPFileSystem
   `SFTPFileSystem.mv` is customized and signature is different
   - base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
   - subclass: `(self, old, new)`
   ```python
       def mv(self, old, new):
           logger.debug("Renaming %s into %s", old, new)
           self.ftp.posix_rename(old, new)
   
   ```
   
   ### SMBFileSystem
   `SMBFileSystem.mv` is customized and signature is different
   - base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
   - subclass: `(self, path1, path2, recursive=None, maxdepth=None, **kwargs)`
   ```python
       def mv(self, path1, path2, recursive=None, maxdepth=None, **kwargs):
           wpath1 = _as_unc_path(self.host, path1)
           wpath2 = _as_unc_path(self.host, path2)
           smbclient.rename(wpath1, wpath2, port=self._port, **kwargs)
   
   ```
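Forwarding `**kwargs` is the one case where an extra keyword can actually reach the backend. The sketch below shows the general Python behaviour with hypothetical names (`backend_rename` and `ForwardingFS` are made up for illustration and do not reproduce `smbclient.rename`): if the callee has a fixed signature without `overwrite`, the forwarded keyword raises a `TypeError`. How the real `smbclient.rename` treats an unknown keyword is not verified here.

```python
def backend_rename(src, dst, port=445):
    """Hypothetical backend call with a fixed signature (no `overwrite`)."""
    return (src, dst, port)


class ForwardingFS:
    """Hypothetical class that forwards **kwargs, like the listing above."""

    def mv(self, path1, path2, recursive=None, maxdepth=None, **kwargs):
        # Extra keywords are passed straight through to the backend.
        return backend_rename(path1, path2, port=445, **kwargs)


def try_move(**kwargs):
    """Call mv and turn a TypeError into a readable string."""
    try:
        return ForwardingFS().mv("a", "b", **kwargs)
    except TypeError as exc:
        return f"TypeError: {exc}"


plain = try_move()
with_flag = try_move(overwrite=True)
```

Here `plain` is the normal result, while `with_flag` carries the `TypeError` message, which is the failure mode to watch for when a caller keeps passing `overwrite` to a forwarding implementation.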
   
   ### WebdavFileSystem
   `WebdavFileSystem.mv` is customized and signature is different
   - base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
   - subclass: `(self, path1: str, path2: str, recursive: bool = False, maxdepth: Optional[bool] = None, **kwargs: Any) -> None`
   ```python
       def mv(
           self,
           path1: str,
           path2: str,
           recursive: bool = False,
           maxdepth: Optional[bool] = None,
           **kwargs: Any,
       ) -> None:
           """Move a file/directory from one path to the other."""
           path1 = self._strip_protocol(path1)
           path2 = self._strip_protocol(path2)
   
           if recursive and not maxdepth and self.isdir(path1):
               return self.client.move(path1, path2)
   
           if not recursive and self.isdir(path1):
               return self.makedirs(path2)
   
           super().mv(path1, path2, recursive=recursive, maxdepth=maxdepth, **kwargs)
           return None
   
   ```
   
   ### WebHDFS
   `WebHDFS.mv` is customized and signature is different
   - base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
   - subclass: `(self, path1, path2, **kwargs)`
   ```python
       def mv(self, path1, path2, **kwargs):
           self._call("RENAME", method="put", path=path1, destination=path2)
   
   ```
   

