ap-- commented on PR #41921:
URL: https://github.com/apache/airflow/pull/41921#issuecomment-2323438156
Regarding the `overwrite` argument in `rename`: The implementation of
`UPath.rename` relies on `AbstractFileSystem.mv`. Below is a report for a
selection of available fsspec filesystems, showing if they rely on the default
implementation or if they override the method.
**tldr:** the default and most custom implementations throw away extra
keyword arguments. The only one in this list that doesn't is `SMBFileSystem`,
but it seems there's no overwrite keyword argument further down in
`smbclient.rename` either. So unless you use a fsspec filesystem that is not
listed here and requires the `overwrite` kwarg, it should be safe to remove.
See below for details on the different rename implementations:
# fsspec subclasses: customized method report for 'mv'
<details>
<summary>Click to see the environment used to generate the report</summary>
```
adlfs==2024.7.0
aenum==3.1.15
aiobotocore==2.14.0
aiofile==3.8.8
aiohappyeyeballs==2.4.0
aiohttp==3.10.5
aiohttp-retry==2.8.3
aioitertools==0.11.0
aiooss2==0.2.10
aiosignal==1.3.1
aliyun-python-sdk-core==2.15.2
aliyun-python-sdk-kms==2.16.5
amqp==5.2.0
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
anyio==4.4.0
appdirs==1.4.4
asyncssh==2.16.0
atpublic==5.0
attrs==24.2.0
azure-core==1.30.2
azure-datalake-store==0.0.53
azure-identity==1.17.1
azure-storage-blob==12.22.0
bcrypt==4.2.0
billiard==4.2.0
botocore==1.35.7
boxfs==0.3.0
boxsdk==3.13.0
cachetools==5.5.0
caio==0.9.17
celery==5.4.0
certifi==2024.8.30
cffi==1.17.0
charset-normalizer==3.3.2
circuitbreaker==2.0.0
click==8.1.7
click-didyoumean==0.3.1
click-plugins==1.1.1
click-repl==0.3.0
cloudpickle==3.0.0
colorama==0.4.6
configobj==5.0.8
crcmod==1.7
cryptography==42.0.8
dask==2024.8.2
decorator==5.1.1
dictdiffer==0.9.0
diskcache==5.6.3
distributed==2024.8.2
distro==1.9.0
docker-pycreds==0.4.0
dpath==2.2.0
dropbox==12.0.2
dropboxdrivefs==1.4.1
dulwich==0.22.1
dvc==3.53.1
dvc-data==3.15.2
dvc-http==2.32.0
dvc-objects==5.1.0
dvc-render==1.0.2
dvc-studio-client==0.21.0
dvc-task==0.4.0
entrypoints==0.4
filelock==3.15.4
flatten-dict==0.4.2
flufl.lock==8.1.0
frozenlist==1.4.1
fsspec==2024.6.1
fsspec_xrootd==0.3.0
funcy==2.0
gcsfs==2024.6.1
gitdb==4.0.11
GitPython==3.1.43
google-api-core==2.19.2
google-auth==2.34.0
google-auth-oauthlib==1.2.1
google-cloud-core==2.4.1
google-cloud-storage==2.18.2
google-crc32c==1.5.0
google-resumable-media==2.7.2
googleapis-common-protos==1.65.0
grandalf==0.8
gto==1.7.1
h11==0.14.0
httpcore==1.0.5
httpx==0.27.2
huggingface-hub==0.23.5
hydra-core==1.3.2
idna==3.8
isodate==0.6.1
iterative-telemetry==0.0.8
Jinja2==3.1.4
jmespath==0.10.0
kombu==5.4.0
lakefs==0.7.1
lakefs-sdk==1.32.1
lakefs-spec==0.10.0
locket==1.0.0
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
morefs==0.2.2
msal==1.30.0
msal-extensions==1.2.0
msgpack==1.0.8
multidict==6.0.5
networkx==3.3
numpy==2.1.0
oauthlib==3.2.2
oci==2.133.0
ocifs==1.3.1
omegaconf==2.3.0
orjson==3.10.7
oss2==2.18.1
ossfs==2023.12.0
packaging==24.1
paramiko==3.4.1
partd==1.4.2
pathspec==0.12.1
platformdirs==3.11.0
ply==3.11
portalocker==2.10.1
prompt_toolkit==3.0.47
proto-plus==1.24.0
protobuf==5.28.0
psutil==6.0.0
pyarrow==17.0.0
pyasn1==0.6.0
pyasn1_modules==0.4.0
pycparser==2.22
pycryptodome==3.20.0
pydantic==2.8.2
pydantic_core==2.20.1
pydot==3.0.1
pygit2==1.15.1
Pygments==2.18.0
pygtrie==2.5.0
PyJWT==2.9.0
PyNaCl==1.5.0
pyOpenSSL==24.2.1
pyparsing==3.1.4
pyspnego==0.11.1
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.2
requests==2.32.3
requests-oauthlib==2.0.0
requests-toolbelt==1.0.0
rich==13.8.0
rsa==4.9
ruamel.yaml==0.18.6
ruamel.yaml.clib==0.2.8
s3fs==2024.6.1
scmrepo==3.3.7
semver==3.0.2
sentry-sdk==2.13.0
setproctitle==1.3.3
setuptools==74.0.0
shellingham==1.5.4
shortuuid==1.0.13
shtab==1.7.1
six==1.16.0
smbprotocol==1.14.0
smmap==5.0.1
sniffio==1.3.1
sortedcontainers==2.4.0
sqltrie==0.11.1
stone==3.3.1
tabulate==0.9.0
tblib==3.0.0
tomlkit==0.13.2
toolz==0.12.1
tornado==6.4.1
tqdm==4.66.5
typer==0.12.5
typing_extensions==4.12.2
tzdata==2024.1
urllib3==2.0.7
vine==5.1.0
voluptuous==0.15.2
wandb==0.17.8
wandbfs==0.0.2
wcwidth==0.2.13
webdav4==0.10.0
wrapt==1.16.0
yarl==1.9.6
zc.lockfile==3.0.post1
zict==3.0.0
```
</details>
## Default implementation
```python
def mv(self, path1, path2, recursive=False, maxdepth=None, **kwargs):
"""Move file(s) from one location to another"""
if path1 == path2:
logger.debug("%s mv: The paths are the same, so no files were
moved.", self)
else:
# explicitly raise exception to prevent data corruption
self.copy(
path1, path2, recursive=recursive, maxdepth=maxdepth,
onerror="raise"
)
self.rm(path1, recursive=recursive)
```
These filesystem classes do not customize the method 'mv':
- `AzureBlobFileSystem`
- `AzureDatalakeFileSystem`
- `CachingFileSystem`
- `BoxFileSystem`
- `DataFileSystem`
- `DropboxDriveFileSystem`
- `_DVCFileSystem`
- `WholeFileCacheFileSystem`
- `GCSFileSystem`
- `GenericFileSystem`
- `GitFileSystem`
- `GithubFileSystem`
- `HfFileSystem`
- `HTTPFileSystem`
- `JupyterFileSystem`
- `LakeFSFileSystem`
- `LibArchiveFileSystem`
- `MemoryFileSystem`
- `OCIFileSystem`
- `OSSFileSystem`
- `ReferenceFileSystem`
- `XRootDFileSystem`
- `S3FileSystem`
- `SimpleCacheFileSystem`
- `TarFileSystem`
- `WandbFS`
- `ZipFileSystem`
- `DictFS`
- `MemFS`
- `OverlayFileSystem`
## Subclasses customizing 'mv'
### HadoopFileSystem
`HadoopFileSystem.mv` is customized and signature is different
- base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
- subclass: `(self, path1, path2, **kwargs)`
```python
@wrap_exceptions
def mv(self, path1, path2, **kwargs):
path1 = self._strip_protocol(path1).rstrip("/")
path2 = self._strip_protocol(path2).rstrip("/")
self.fs.move(path1, path2)
```
### AsyncLocalFileSystem
`AsyncLocalFileSystem.mv` is customized and signature is different
- base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
- subclass: `(self, path1, path2, **kwargs)`
```python
def mv(self, path1, path2, **kwargs):
path1 = self._strip_protocol(path1)
path2 = self._strip_protocol(path2)
shutil.move(path1, path2)
```
### DaskWorkerFileSystem
`DaskWorkerFileSystem.mv` is customized and signature is different
- base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
- subclass: `(self, *args, **kwargs)`
```python
def mv(self, *args, **kwargs):
if self.worker:
self.fs.mv(*args, **kwargs)
else:
self.rfs.mv(*args, **kwargs).compute()
```
### DatabricksFileSystem
`DatabricksFileSystem.mv` is customized and signature is different
- base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
- subclass: `(self, source_path, destination_path, recursive=False,
maxdepth=None, **kwargs)`
```python
def mv(
self, source_path, destination_path, recursive=False, maxdepth=None,
**kwargs
):
"""
Move a source to a destination path.
A note from the original [databricks API manual]
(https://docs.databricks.com/dev-tools/api/latest/dbfs.html#move).
When moving a large number of files the API call will time out after
approximately 60s, potentially resulting in partially moved data.
Therefore, for operations that move more than 10k files, we strongly
discourage using the DBFS REST API.
Parameters
----------
source_path: str
From where to move (absolute path)
destination_path: str
To where to move (absolute path)
recursive: bool
Not implemented to far.
maxdepth:
Not implemented to far.
"""
if recursive:
raise NotImplementedError
if maxdepth:
raise NotImplementedError
try:
self._send_to_api(
method="post",
endpoint="move",
json={"source_path": source_path, "destination_path":
destination_path},
)
except DatabricksException as e:
if e.error_code == "RESOURCE_DOES_NOT_EXIST":
raise FileNotFoundError(e.message)
elif e.error_code == "RESOURCE_ALREADY_EXISTS":
raise FileExistsError(e.message)
raise e
self.invalidate_cache(self._parent(source_path))
self.invalidate_cache(self._parent(destination_path))
```
### DirFileSystem
`DirFileSystem.mv` is customized and signature is different
- base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
- subclass: `(self, path1, path2, **kwargs)`
```python
def mv(self, path1, path2, **kwargs):
return self.fs.mv(
self._join(path1),
self._join(path2),
**kwargs,
)
```
### LocalFileSystem
`LocalFileSystem.mv` is customized and signature is different
- base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
- subclass: `(self, path1, path2, **kwargs)`
```python
def mv(self, path1, path2, **kwargs):
path1 = self._strip_protocol(path1)
path2 = self._strip_protocol(path2)
shutil.move(path1, path2)
```
### FTPFileSystem
`FTPFileSystem.mv` is customized and signature is different
- base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
- subclass: `(self, path1, path2, **kwargs)`
```python
def mv(self, path1, path2, **kwargs):
path1 = self._strip_protocol(path1)
path2 = self._strip_protocol(path2)
self.ftp.rename(path1, path2)
self.invalidate_cache(self._parent(path1))
self.invalidate_cache(self._parent(path2))
```
### SFTPFileSystem
`SFTPFileSystem.mv` is customized and signature is different
- base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
- subclass: `(self, old, new)`
```python
def mv(self, old, new):
logger.debug("Renaming %s into %s", old, new)
self.ftp.posix_rename(old, new)
```
### SMBFileSystem
`SMBFileSystem.mv` is customized and signature is different
- base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
- subclass: `(self, path1, path2, recursive=None, maxdepth=None, **kwargs)`
```python
def mv(self, path1, path2, recursive=None, maxdepth=None, **kwargs):
wpath1 = _as_unc_path(self.host, path1)
wpath2 = _as_unc_path(self.host, path2)
smbclient.rename(wpath1, wpath2, port=self._port, **kwargs)
```
### WebdavFileSystem
`WebdavFileSystem.mv` is customized and signature is different
- base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
- subclass: `(self, path1: str, path2: str, recursive: bool = False,
maxdepth: Optional[bool] = None, **kwargs: Any) -> None`
```python
def mv(
self,
path1: str,
path2: str,
recursive: bool = False,
maxdepth: Optional[bool] = None,
**kwargs: Any,
) -> None:
"""Move a file/directory from one path to the other."""
path1 = self._strip_protocol(path1)
path2 = self._strip_protocol(path2)
if recursive and not maxdepth and self.isdir(path1):
return self.client.move(path1, path2)
if not recursive and self.isdir(path1):
return self.makedirs(path2)
super().mv(path1, path2, recursive=recursive, maxdepth=maxdepth,
**kwargs)
return None
```
### WebHDFS
`WebHDFS.mv` is customized and signature is different
- base_cls: `(self, path1, path2, recursive=False, maxdepth=None, **kwargs)`
- subclass: `(self, path1, path2, **kwargs)`
```python
def mv(self, path1, path2, **kwargs):
self._call("RENAME", method="put", path=path1, destination=path2)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]