Hello community, here is the log from the commit of package python-fsspec for openSUSE:Factory checked in at 2022-04-28 23:07:47 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Comparing /work/SRC/openSUSE:Factory/python-fsspec (Old) and /work/SRC/openSUSE:Factory/.python-fsspec.new.1538 (New) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-fsspec" Thu Apr 28 23:07:47 2022 rev:19 rq:973298 version:2022.3.0 Changes: -------- --- /work/SRC/openSUSE:Factory/python-fsspec/python-fsspec.changes 2022-02-24 18:24:04.398648791 +0100 +++ /work/SRC/openSUSE:Factory/.python-fsspec.new.1538/python-fsspec.changes 2022-04-28 23:07:50.324679041 +0200 @@ -1,0 +2,18 @@ +Mon Apr 4 09:08:29 UTC 2022 - John Paul Adrian Glaubitz <[email protected]> + +- Update to 2022.3.0 + Enhancements + * tqdm example callback with simple methods (#931, 902) + * Allow empty root in get_mapper (#930) + * implement real info for reference FS (#919) + * list known implementations and compressions (#913) + Fixes + * git branch for testing git backend (#929) + * maintaine mem FS's root (#926) + * kargs to FS in parquet module (#921) + * fix on_error in references (#917) + * tar ls consistency (#9114) + * pyarrow: don't decompress twice (#911) + * fix FUSE tests (#905) + +------------------------------------------------------------------- Old: ---- fsspec-2022.02.0.tar.gz New: ---- fsspec-2022.3.0.tar.gz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ python-fsspec.spec ++++++ --- /var/tmp/diff_new_pack.Ft2AVc/_old 2022-04-28 23:07:50.848679613 +0200 +++ /var/tmp/diff_new_pack.Ft2AVc/_new 2022-04-28 23:07:50.852679617 +0200 @@ -26,9 +26,9 @@ %bcond_with test %endif %define skip_python2 1 -%define ghversion 2022.02.0 +%define ghversion 2022.3.0 Name: python-fsspec%{psuffix} -Version: 2022.2.0 +Version: 2022.3.0 Release: 0 Summary: Filesystem specification package License: BSD-3-Clause ++++++ fsspec-2022.02.0.tar.gz -> fsspec-2022.3.0.tar.gz ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/README.md new/filesystem_spec-2022.3.0/README.md --- old/filesystem_spec-2022.02.0/README.md 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/README.md 2022-03-31 
19:47:04.000000000 +0200 @@ -40,7 +40,7 @@ used to configure a development environment and run tests. First, setup a development conda environment via ``tox -e {env}`` where ``env`` is one of ``{py36,py37,py38,py39}``. -This will install fspec dependencies, test & dev tools, and install fsspec in develop +This will install fsspec dependencies, test & dev tools, and install fsspec in develop mode. You may activate the dev environment under ``.tox/{env}`` via ``conda activate .tox/{env}``. ### Testing diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/docs/source/api.rst new/filesystem_spec-2022.3.0/docs/source/api.rst --- old/filesystem_spec-2022.02.0/docs/source/api.rst 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/docs/source/api.rst 2022-03-31 19:47:04.000000000 +0200 @@ -10,6 +10,8 @@ fsspec.open_files fsspec.open fsspec.open_local + fsspec.available_compressions + fsspec.available_protocols fsspec.filesystem fsspec.get_filesystem_class fsspec.get_mapper @@ -19,6 +21,8 @@ .. autofunction:: fsspec.open_files .. autofunction:: fsspec.open .. autofunction:: fsspec.open_local +.. autofunction:: fsspec.available_compressions +.. autofunction:: fsspec.available_protocols .. autofunction:: fsspec.filesystem .. autofunction:: fsspec.get_filesystem_class .. autofunction:: fsspec.get_mapper @@ -47,6 +51,7 @@ fsspec.callbacks.Callback fsspec.callbacks.NoOpCallback fsspec.callbacks.DotPrinterCallback + fsspec.callbacks.TqdmCallback .. autoclass:: fsspec.spec.AbstractFileSystem :members: @@ -92,6 +97,9 @@ .. autoclass:: fsspec.callbacks.DotPrinterCallback :members: +.. autoclass:: fsspec.callbacks.TqdmCallback + :members: + .. 
_implementations: Built-in Implementations diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/docs/source/changelog.rst new/filesystem_spec-2022.3.0/docs/source/changelog.rst --- old/filesystem_spec-2022.02.0/docs/source/changelog.rst 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/docs/source/changelog.rst 2022-03-31 19:47:04.000000000 +0200 @@ -1,6 +1,27 @@ Changelog ========= +2022.03.0 +--------- + +Enhancements + +- tqdm example callback with simple methods (#931, 902) +- Allow empty root in get_mapper (#930) +- implement real info for reference FS (#919) +- list known implementations and compressions (#913) + +Fixes + +- git branch for testing git backend (#929) +- maintaine mem FS's root (#926) +- kargs to FS in parquet module (#921) +- fix on_error in references (#917) +- tar ls consistency (#9114) +- pyarrow: don't decompress twice (#911) +- fix FUSE tests (#905) + + 2022.02.0 --------- diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/docs/source/features.rst new/filesystem_spec-2022.3.0/docs/source/features.rst --- old/filesystem_spec-2022.02.0/docs/source/features.rst 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/docs/source/features.rst 2022-03-31 19:47:04.000000000 +0200 @@ -68,8 +68,10 @@ As mentioned above, the ``OpenFile`` class allows for the opening of files on a binary store, which appear to be in text mode and/or allow for a compression/decompression layer between the -caller and the back-end storage system. From the user's point of view, this is achieved simply -by passing arguments to the :func:`fsspec.open_files` or :func:`fsspec.open` functions, and +caller and the back-end storage system. The list of ``fsspec`` supported codec +can be retrieved using :func:`fsspec.available_compressions`. 
+From the user's point of view, this is achieved simply by passing arguments to +the :func:`fsspec.open_files` or :func:`fsspec.open` functions, and thereafter happens transparently. Key-value stores @@ -397,3 +399,5 @@ backends. See the docstrings in the callbacks module for further details. +``fsspec.callbacks.TqdmCallback`` can be used to display a progress bar using +tqdm. diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/docs/source/usage.rst new/filesystem_spec-2022.3.0/docs/source/usage.rst --- old/filesystem_spec-2022.02.0/docs/source/usage.rst 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/docs/source/usage.rst 2022-03-31 19:47:04.000000000 +0200 @@ -39,6 +39,8 @@ fs = fsspec.filesystem('ftp', host=host, port=port, username=user, password=pw) +The list of implemented ``fsspec`` protocols can be retrieved using :func:`fsspec.available_protocols`. + Use a file-system ----------------- diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/__init__.py new/filesystem_spec-2022.3.0/fsspec/__init__.py --- old/filesystem_spec-2022.02.0/fsspec/__init__.py 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/fsspec/__init__.py 2022-03-31 19:47:04.000000000 +0200 @@ -9,10 +9,12 @@ from . 
import _version, caching from .callbacks import Callback +from .compression import available_compressions from .core import get_fs_token_paths, open, open_files, open_local from .exceptions import FSTimeoutError from .mapping import FSMap, get_mapper from .registry import ( + available_protocols, filesystem, get_filesystem_class, register_implementation, @@ -37,6 +39,8 @@ "registry", "caching", "Callback", + "available_protocols", + "available_compressions", ] diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/_version.py new/filesystem_spec-2022.3.0/fsspec/_version.py --- old/filesystem_spec-2022.02.0/fsspec/_version.py 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/fsspec/_version.py 2022-03-31 19:47:04.000000000 +0200 @@ -22,9 +22,9 @@ # setup.py/versioneer.py will grep for the variable names, so they must # each be defined on a line of their own. _version.py will just call # get_keywords(). 
- git_refnames = " (HEAD -> master, tag: 2022.02.0)" - git_full = "f9089f5ce97e1e52ab70ce1f372fc4c0feed5132" - git_date = "2022-02-22 12:44:54 -0500" + git_refnames = " (tag: 2022.3.0)" + git_full = "a8829696d341e62ca420fcde166434bf10dc68d4" + git_date = "2022-03-31 13:47:04 -0400" keywords = {"refnames": git_refnames, "full": git_full, "date": git_date} return keywords diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/asyn.py new/filesystem_spec-2022.3.0/fsspec/asyn.py --- old/filesystem_spec-2022.02.0/fsspec/asyn.py 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/fsspec/asyn.py 2022-03-31 19:47:04.000000000 +0200 @@ -413,7 +413,7 @@ ): # TODO: on_error if max_gap is not None: - # to be implemented in utils + # use utils.merge_offset_ranges raise NotImplementedError if not isinstance(paths, list): raise TypeError diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/callbacks.py new/filesystem_spec-2022.3.0/fsspec/callbacks.py --- old/filesystem_spec-2022.02.0/fsspec/callbacks.py 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/fsspec/callbacks.py 2022-03-31 19:47:04.000000000 +0200 @@ -177,4 +177,44 @@ print(self.chr, end="") +class TqdmCallback(Callback): + """ + A callback to display a progress bar using tqdm + + Examples + -------- + >>> import fsspec + >>> from fsspec.callbacks import TqdmCallback + >>> fs = fsspec.filesystem("memory") + >>> path2distant_data = "/your-path" + >>> fs.upload( + ".", + path2distant_data, + recursive=True, + callback=TqdmCallback(), + ) + """ + + def __init__(self, *args, **kwargs): + try: + import tqdm + + self._tqdm = tqdm + except ImportError as exce: + raise ImportError( + "Using TqdmCallback requires tqdm to be installed" + ) from exce + super().__init__(*args, **kwargs) + + def set_size(self, size): + self.tqdm = 
self._tqdm.tqdm(desc="test", total=size) + + def relative_update(self, inc=1): + self.tqdm.update(inc) + + def __del__(self): + self.tqdm.close() + self.tqdm = None + + _DEFAULT_CALLBACK = NoOpCallback() diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/compression.py new/filesystem_spec-2022.3.0/fsspec/compression.py --- old/filesystem_spec-2022.02.0/fsspec/compression.py 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/fsspec/compression.py 2022-03-31 19:47:04.000000000 +0200 @@ -166,3 +166,8 @@ register_compression("zstd", zstandard_file, "zst") except ImportError: pass + + +def available_compressions(): + """Return a list of the implemented compressions.""" + return list(compr) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/conftest.py new/filesystem_spec-2022.3.0/fsspec/conftest.py --- old/filesystem_spec-2022.02.0/fsspec/conftest.py 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/fsspec/conftest.py 2022-03-31 19:47:04.000000000 +0200 @@ -17,7 +17,7 @@ """ m = fsspec.filesystem("memory") m.store.clear() - m.pseudo_dirs.clear() + m.pseudo_dirs = [""] try: yield m finally: diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/implementations/arrow.py new/filesystem_spec-2022.3.0/fsspec/implementations/arrow.py --- old/filesystem_spec-2022.02.0/fsspec/implementations/arrow.py 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/fsspec/implementations/arrow.py 2022-03-31 19:47:04.000000000 +0200 @@ -28,6 +28,9 @@ return wrapper +PYARROW_VERSION = None + + class ArrowFSWrapper(AbstractFileSystem): """FSSpec-compatible wrapper of pyarrow.fs.FileSystem. 
@@ -40,6 +43,10 @@ root_marker = "/" def __init__(self, fs, **kwargs): + from pyarrow import __version__ + + global PYARROW_VERSION + PYARROW_VERSION = tuple(map(int, __version__.split("."))) self.fs = fs super().__init__(**kwargs) @@ -139,12 +146,18 @@ @wrap_exceptions def _open(self, path, mode="rb", block_size=None, **kwargs): if mode == "rb": - stream = self.fs.open_input_stream(path) + method = self.fs.open_input_stream elif mode == "wb": - stream = self.fs.open_output_stream(path) + method = self.fs.open_output_stream else: raise ValueError(f"unsupported mode for Arrow filesystem: {mode!r}") + _kwargs = {} + if PYARROW_VERSION[0] >= 4: + # disable compression auto-detection + _kwargs["compression"] = None + stream = method(path, **_kwargs) + return ArrowFile(self, stream, path, mode, block_size, **kwargs) @wrap_exceptions diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/implementations/memory.py new/filesystem_spec-2022.3.0/fsspec/implementations/memory.py --- old/filesystem_spec-2022.02.0/fsspec/implementations/memory.py 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/fsspec/implementations/memory.py 2022-03-31 19:47:04.000000000 +0200 @@ -116,6 +116,9 @@ def rmdir(self, path): path = self._strip_protocol(path) + if path == "": + # silently avoid deleting FS root + return if path in self.pseudo_dirs: if not self.ls(path): self.pseudo_dirs.remove(path) @@ -124,7 +127,7 @@ else: raise FileNotFoundError(path) - def exists(self, path): + def exists(self, path, **kwargs): path = self._strip_protocol(path) return path in self.store or path in self.pseudo_dirs diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/implementations/reference.py new/filesystem_spec-2022.3.0/fsspec/implementations/reference.py --- old/filesystem_spec-2022.02.0/fsspec/implementations/reference.py 2022-02-22 
18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/fsspec/implementations/reference.py 2022-03-31 19:47:04.000000000 +0200 @@ -14,13 +14,32 @@ from ..asyn import AsyncFileSystem, sync from ..callbacks import _DEFAULT_CALLBACK -from ..core import filesystem, open -from ..mapping import get_mapper +from ..core import filesystem, open, split_protocol from ..spec import AbstractFileSystem logger = logging.getLogger("fsspec.reference") +def _first(d): + return list(d.values())[0] + + +def _prot_in_references(path, references): + ref = references.get(path) + if isinstance(ref, (list, tuple)): + return split_protocol(ref[0])[0] if ref[0] else ref[0] + + +def _protocol_groups(paths, references): + if isinstance(paths, str): + return {_prot_in_references(paths, references): [paths]} + out = {} + for path in paths: + protocol = _prot_in_references(path, references) + out.setdefault(protocol, []).append(path) + return out + + class ReferenceFileSystem(AsyncFileSystem): """View byte ranges of some other file as a file system @@ -57,7 +76,6 @@ template_overrides=None, simple_templates=True, loop=None, - ref_type=None, **kwargs, ): """ @@ -85,15 +103,16 @@ order. remote_options : dict kwargs to go with remote_protocol - fs : file system instance - Directly provide a file system, if you want to configure it beforehand. This - takes precedence over target_protocol/target_options + fs : AbstractFileSystem | dict(str, (AbstractFileSystem | dict)) + Directly provide a file system(s): + - a single filesystem instance + - a dict of protocol:filesystem, where each value is either a filesystem + instance, or a dict of kwargs that can be used to create in + instance for the given protocol + If this is given, remote_options and remote_protocol are ignored. template_overrides : dict Swap out any templates in the references file with these - useful for testing. - ref_type : "json" | "parquet" | "zarr" - If None, guessed from URL suffix, defaulting to JSON. 
Ignored if fo - is not a string. simple_templates: bool Whether templates can be processed with simple replace (True) or if jinja is needed (False, much slower). All reference sets produced by @@ -106,6 +125,7 @@ self.template_overrides = template_overrides self.simple_templates = simple_templates self.templates = {} + self.fss = {} if hasattr(fo, "read"): text = fo.read() elif isinstance(fo, str): @@ -114,45 +134,35 @@ else: extra = {} dic = dict(**(ref_storage_args or target_options or {}), **extra) - if ref_type == "zarr" or fo.endswith("zarr"): - import pandas as pd - import zarr - - self.dataframe = True - m = get_mapper(fo, **dic) - z = zarr.open_group(m) - assert z.attrs["version"] == 1 - self.templates = z.attrs["templates"] - self.gen = z.attrs.get("gen", None) - self.df = pd.DataFrame( - {k: z[k][:] for k in ["key", "data", "url", "offset", "size"]} - ).set_index("key") - elif ref_type == "parquet" or fo.endswith("parquet"): - import fastparquet as fp - - self.dataframe = True - with open(fo, "rb", **dic) as f: - pf = fp.ParquetFile(f) - assert pf.key_value_metadata["version"] == 1 - self.templates = json.loads(pf.key_value_metadata["templates"]) - self.gen = json.loads(pf.key_value_metadata.get("gen", "[]")) - self.df = pf.to_pandas(index="key") - else: - # text JSON - with open(fo, "rb", **dic) as f: - logger.info("Read reference from URL %s", fo) - text = f.read() + # text JSON + with open(fo, "rb", **dic) as f: + logger.info("Read reference from URL %s", fo) + text = f.read() else: - # dictionaries; TODO: allow dataframe here? 
+ # dictionaries text = fo if self.dataframe: self._process_dataframe() else: self._process_references(text, template_overrides) - if fs is not None: - self.fs = fs + if isinstance(fs, dict): + self.fss = { + k: ( + fsspec.filesystem(k.split(":", 1)[0], **opts) + if isinstance(opts, dict) + else opts + ) + for k, opts in fs.items() + } return + if fs is not None: + # single remote FS + remote_protocol = ( + fs.protocol[0] if isinstance(fs.protocol, tuple) else fs.protocol + ) + if remote_protocol is None: + # get single protocol from any templates for ref in self.templates.values(): if callable(ref): ref = ref() @@ -161,6 +171,7 @@ remote_protocol = protocol break if remote_protocol is None: + # get single protocol from references for ref in self.references.values(): if callable(ref): ref = ref() @@ -172,11 +183,14 @@ if remote_protocol is None: remote_protocol = target_protocol - self.fs = filesystem(remote_protocol, loop=loop, **(remote_options or {})) + fs = fs or filesystem(remote_protocol, loop=loop, **(remote_options or {})) + self.fss[remote_protocol] = fs + self.fss[None] = fs # default one @property def loop(self): - return self.fs.loop if self.fs.async_impl else self._loop + inloop = [fs.loop for fs in self.fss.values() if fs.async_impl] + return inloop[0] if inloop else self._loop def _cat_common(self, path): path = self._strip_protocol(path) @@ -215,14 +229,21 @@ part_or_url, start0, end0 = self._cat_common(path) if isinstance(part_or_url, bytes): return part_or_url[start:end] - return (await self.fs._cat_file(part_or_url, start=start0, end=end0))[start:end] + protocol, _ = split_protocol(part_or_url) + # TODO: start and end should be passed to cat_file, not sliced + return ( + await self.fss[protocol]._cat_file(part_or_url, start=start0, end=end0) + )[start:end] def cat_file(self, path, start=None, end=None, **kwargs): part_or_url, start0, end0 = self._cat_common(path) if isinstance(part_or_url, bytes): return part_or_url[start:end] - # TODO: update 
start0, end0 if start/end given, instead of slicing - return self.fs.cat_file(part_or_url, start=start0, end=end0)[start:end] + protocol, _ = split_protocol(part_or_url) + # TODO: start and end should be passed to cat_file, not sliced + return self.fss[protocol].cat_file(part_or_url, start=start0, end=end0)[ + start:end + ] def pipe_file(self, path, value, **_): """Temporarily add binary data or reference as a file""" @@ -245,19 +266,62 @@ callback.absolute_update(len(data)) def get(self, rpath, lpath, recursive=False, **kwargs): - if self.fs.async_impl: - return sync(self.loop, self._get, rpath, lpath, recursive, **kwargs) - return AbstractFileSystem.get(self, rpath, lpath, recursive=recursive, **kwargs) - - def cat(self, path, recursive=False, **kwargs): - if self.fs.async_impl: - return sync(self.loop, self._cat, path, recursive, **kwargs) - elif isinstance(path, list): - if recursive or any("*" in p for p in path): - raise NotImplementedError - return {p: AbstractFileSystem.cat_file(self, p, **kwargs) for p in path} - else: - return AbstractFileSystem.cat_file(self, path) + if isinstance(lpath, list): + # because we have to figure out here which lpath goes with which path + # after grouping + raise NotImplementedError + proto_dict = _protocol_groups(rpath, self.references) + for proto, paths in proto_dict.items(): + if self.fss[proto].async_impl: + sync(self.loop, self._get, paths, lpath, recursive, **kwargs) + else: + AbstractFileSystem.get( + self, paths, lpath, recursive=recursive, **kwargs + ) + + def cat(self, path, recursive=False, on_error="raise", **kwargs): + proto_dict = _protocol_groups(path, self.references) + out = {} + for proto, paths in proto_dict.items(): + if proto is None: + # binary/string + for p in paths: + try: + out[p] = AbstractFileSystem.cat_file(self, p, **kwargs) + except Exception as e: + if on_error == "raise": + raise + if on_error == "return": + out[p] = e + + elif self.fss[proto].async_impl: + # TODO: asyncio.gather on multiple 
async FSs + out.update( + sync( + self.loop, + self._cat, + paths, + recursive, + on_error=on_error, + **kwargs, + ) + ) + elif isinstance(paths, list): + if recursive or any("*" in p for p in paths): + raise NotImplementedError + for p in paths: + try: + out[p] = AbstractFileSystem.cat_file(self, p, **kwargs) + except Exception as e: + if on_error == "raise": + raise + if on_error == "return": + out[p] = e + else: + out.update(AbstractFileSystem.cat_file(self, paths)) + if len(out) == 1 and isinstance(path, str) and "*" not in path: + return _first(out) + return out def _process_dataframe(self): self._process_templates(self.templates) @@ -462,6 +526,10 @@ out0 = [o for o in out if o["name"] == path] if not out0: return {"name": path, "type": "directory", "size": 0} + if out0[0]["size"] is None: + # if this is a whole remote file, update size using remote FS + prot, _ = split_protocol(self.references[path][0]) + out0[0]["size"] = self.fss[prot].size(self.references[path][0]) return out0[0] async def _info(self, path, **kwargs): # calls fast sync code diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/implementations/tar.py new/filesystem_spec-2022.3.0/fsspec/implementations/tar.py --- old/filesystem_spec-2022.02.0/fsspec/implementations/tar.py 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/fsspec/implementations/tar.py 2022-03-31 19:47:04.000000000 +0200 @@ -88,7 +88,7 @@ out = {} for ti in self.tar: info = ti.get_info() - info["type"] = typemap[info["type"]] + info["type"] = typemap.get(info["type"], "file") name = ti.get_info()["name"].rstrip("/") out[name] = (info, ti.offset_data) @@ -96,14 +96,17 @@ # TODO: save index to self.index_store here, if set def _get_dirs(self): - if self.dir_cache is not None: return - self.dir_cache = {} + # This enables ls to get directories as children as well as files + self.dir_cache = { + dirname + "/": {"name": dirname + "/", 
"size": 0, "type": "directory"} + for dirname in self._all_dirnames(self.tar.getnames()) + } for member in self.tar.getmembers(): info = member.get_info() - info["type"] = typemap[info["type"]] + info["type"] = typemap.get(info["type"], "file") self.dir_cache[info["name"]] = info def _open(self, path, mode="rb", **kwargs): diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/implementations/tests/test_archive.py new/filesystem_spec-2022.3.0/fsspec/implementations/tests/test_archive.py --- old/filesystem_spec-2022.02.0/fsspec/implementations/tests/test_archive.py 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/fsspec/implementations/tests/test_archive.py 2022-03-31 19:47:04.000000000 +0200 @@ -249,7 +249,7 @@ def test_mapping(self, scenario: ArchiveTestScenario): with scenario.provider(archive_data) as archive: fs = fsspec.filesystem(scenario.protocol, fo=archive) - m = fs.get_mapper("") + m = fs.get_mapper() assert list(m) == ["a", "b", "deeply/nested/path"] assert m["b"] == archive_data["b"] diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/implementations/tests/test_git.py new/filesystem_spec-2022.3.0/fsspec/implementations/tests/test_git.py --- old/filesystem_spec-2022.02.0/fsspec/implementations/tests/test_git.py 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/fsspec/implementations/tests/test_git.py 2022-03-31 19:47:04.000000000 +0200 @@ -17,8 +17,8 @@ d = tempfile.mkdtemp() try: os.chdir(d) - subprocess.call("git init", shell=True, cwd=d) - subprocess.call("git init", shell=True, cwd=d) + subprocess.call("git init -b master", shell=True, cwd=d) + subprocess.call("git init -b master", shell=True, cwd=d) subprocess.call('git config user.email "[email protected]"', shell=True, cwd=d) subprocess.call('git config user.name "Your Name"', shell=True, cwd=d) open(os.path.join(d, 
"file1"), "wb").write(b"data0") diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/implementations/tests/test_memory.py new/filesystem_spec-2022.3.0/fsspec/implementations/tests/test_memory.py --- old/filesystem_spec-2022.02.0/fsspec/implementations/tests/test_memory.py 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/fsspec/implementations/tests/test_memory.py 2022-03-31 19:47:04.000000000 +0200 @@ -9,7 +9,7 @@ files = m.find("") assert files == ["/afiles/and/another", "/somefile"] - files = sorted(m.get_mapper("/")) + files = sorted(m.get_mapper()) assert files == ["afiles/and/another", "somefile"] @@ -150,3 +150,9 @@ f.seek(1) assert f.read(1) == "a" assert f.tell() == 2 + + +def test_remove_all(m): + m.touch("afile") + m.rm("/", recursive=True) + assert not m.ls("/") diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/implementations/tests/test_reference.py new/filesystem_spec-2022.3.0/fsspec/implementations/tests/test_reference.py --- old/filesystem_spec-2022.02.0/fsspec/implementations/tests/test_reference.py 2022-02-22 18:44:54.000000000 +0100 +++ new/filesystem_spec-2022.3.0/fsspec/implementations/tests/test_reference.py 2022-03-31 19:47:04.000000000 +0200 @@ -37,6 +37,21 @@ assert fs.find("", withdirs=True) == ["a", "b", "c", "c/d"] +def test_info(server): # noqa: F811 + refs = { + "a": b"data", + "b": (realfile, 0, 5), + "c/d": (realfile, 1, 6), + "e": (realfile,), + } + h = fsspec.filesystem("http", headers={"give_length": "true", "head_ok": "true"}) + fs = fsspec.filesystem("reference", fo=refs, fs=h) + assert fs.size("a") == 4 + assert fs.size("b") == 5 + assert fs.size("c/d") == 6 + assert fs.info("e")["size"] == len(data) + + def test_defaults(server): # noqa: F811 refs = {"a": b"data", "b": (None, 0, 5)} fs = fsspec.filesystem( @@ -231,3 +246,71 @@ fs.get("c", str(tmpdir / "c"), 
recursive=True) assert (tmpdir / "c").isdir() assert (tmpdir / "c" / "d").read_binary() == b"123456" + + +def test_multi_fs_provided(m, tmpdir): + localfs = LocalFileSystem() + + real = tmpdir / "file" + real.write_binary(b"0123456789") + + m.pipe("afile", b"hello") + + # local URLs are file:// by default + refs = { + "a": b"data", + "b": ("file://" + str(real), 0, 5), + "c/d": ("file://" + str(real), 1, 6), + "c/e": ["memory://afile"], + } + + fs = fsspec.filesystem("reference", fo=refs, fs={"file": localfs, "memory": m}) + assert fs.cat("c/e") == b"hello" + assert fs.cat(["c/e", "a", "b"]) == { + "a": b"data", + "b": b"01234", + "c/e": b"hello", + } + + +def test_multi_fs_created(m, tmpdir): + real = tmpdir / "file" + real.write_binary(b"0123456789") + + m.pipe("afile", b"hello") + + # local URLs are file:// by default + refs = { + "a": b"data", + "b": ("file://" + str(real), 0, 5), + "c/d": ("file://" + str(real), 1, 6), + "c/e": ["memory://afile"], + } + + fs = fsspec.filesystem("reference", fo=refs, fs={"file": {}, "memory": {}}) + assert fs.cat("c/e") == b"hello" + assert fs.cat(["c/e", "a", "b"]) == { + "a": b"data", + "b": b"01234", + "c/e": b"hello", + } + + +def test_missing_nonasync(m): + zarr = pytest.importorskip("zarr") + zarray = { + "chunks": [1], + "compressor": None, + "dtype": "<f8", + "fill_value": "NaN", + "filters": [], + "order": "C", + "shape": [10], + "zarr_format": 2, + } + refs = {".zarray": json.dumps(zarray)} + + m = fsspec.get_mapper("reference://", fo=refs, remote_protocol="memory") + + a = zarr.open_array(m) + assert str(a[0]) == "nan" diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/implementations/tests/test_tar.py new/filesystem_spec-2022.3.0/fsspec/implementations/tests/test_tar.py --- old/filesystem_spec-2022.02.0/fsspec/implementations/tests/test_tar.py 2022-02-22 18:44:54.000000000 +0100 +++ 
new/filesystem_spec-2022.3.0/fsspec/implementations/tests/test_tar.py 2022-03-31 19:47:04.000000000 +0200 @@ -1,12 +1,17 @@ import os import shutil +import tarfile import tempfile +from io import BytesIO +from pathlib import Path +from typing import Dict import pytest import fsspec from fsspec.core import OpenFile from fsspec.implementations.cached import WholeFileCacheFileSystem +from fsspec.implementations.tar import TarFileSystem from fsspec.implementations.tests.test_archive import archive_data, temptar @@ -171,7 +176,6 @@ ids=["tar", "tar-gz", "tar-bz2", "tar-xz"], ) def test_url_to_fs_direct(recipe, tmpdir): - with temptar(archive_data, mode=recipe["mode"], suffix=recipe["suffix"]) as tf: url = f"tar://inner::file://{tf}" fs, url = fsspec.core.url_to_fs(url=url) @@ -189,8 +193,48 @@ ids=["tar", "tar-gz", "tar-bz2", "tar-xz"], ) def test_url_to_fs_cached(recipe, tmpdir): - with temptar(archive_data, mode=recipe["mode"], suffix=recipe["suffix"]) as tf: url = f"tar://inner::simplecache::file://{tf}" fs, url = fsspec.core.url_to_fs(url=url) assert fs.cat("b") == b"hello" + + +@pytest.mark.parametrize( + "compression", ["", "gz", "bz2", "xz"], ids=["tar", "tar-gz", "tar-bz2", "tar-xz"] +) +def test_ls_with_folders(compression: str, tmp_path: Path): + """ + Create a tar file that doesn't include the intermediate folder structure, + but make sure that the reading filesystem is still able to resolve the + intermediate folders, like the ZipFileSystem. 
+    """
+    tar_data: Dict[str, bytes] = {
+        "a.pdf": b"Hello A!",
+        "b/c.pdf": b"Hello C!",
+        "d/e/f.pdf": b"Hello F!",
+        "d/g.pdf": b"Hello G!",
+    }
+    if compression:
+        temp_archive_file = tmp_path / f"test_tar_file.tar.{compression}"
+    else:
+        temp_archive_file = tmp_path / "test_tar_file.tar"
+    with open(temp_archive_file, "wb") as fd:
+        # We need to manually write the tarfile here, because temptar
+        # creates intermediate directories which is not how tars are always created
+        with tarfile.open(fileobj=fd, mode=f"w:{compression}") as tf:
+            for tar_file_path, data in tar_data.items():
+                content = data
+                info = tarfile.TarInfo(name=tar_file_path)
+                info.size = len(content)
+                tf.addfile(info, BytesIO(content))
+    with open(temp_archive_file, "rb") as fd:
+        fs = TarFileSystem(fd)
+        assert fs.find("/", withdirs=True) == [
+            "a.pdf",
+            "b/",
+            "b/c.pdf",
+            "d/",
+            "d/e/",
+            "d/e/f.pdf",
+            "d/g.pdf",
+        ]
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/mapping.py new/filesystem_spec-2022.3.0/fsspec/mapping.py
--- old/filesystem_spec-2022.02.0/fsspec/mapping.py 2022-02-22 18:44:54.000000000 +0100
+++ new/filesystem_spec-2022.3.0/fsspec/mapping.py 2022-03-31 19:47:04.000000000 +0200
@@ -187,7 +187,7 @@
 
 
 def get_mapper(
-    url,
+    url="",
     check=False,
     create=False,
     missing_exceptions=None,
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/parquet.py new/filesystem_spec-2022.3.0/fsspec/parquet.py
--- old/filesystem_spec-2022.02.0/fsspec/parquet.py 2022-02-22 18:44:54.000000000 +0100
+++ new/filesystem_spec-2022.3.0/fsspec/parquet.py 2022-03-31 19:47:04.000000000 +0200
@@ -96,7 +96,7 @@
     # Make sure we have an `AbstractFileSystem` object
     # to work with
     if fs is None:
-        fs = url_to_fs(path, storage_options=(storage_options or {}))[0]
+        fs = url_to_fs(path, **(storage_options or {}))[0]
 
     # For now, `columns == []` not supported. Just use
     # default `open` command with `path` input
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/registry.py new/filesystem_spec-2022.3.0/fsspec/registry.py
--- old/filesystem_spec-2022.02.0/fsspec/registry.py 2022-02-22 18:44:54.000000000 +0100
+++ new/filesystem_spec-2022.3.0/fsspec/registry.py 2022-03-31 19:47:04.000000000 +0200
@@ -251,3 +251,11 @@
     """
     cls = get_filesystem_class(protocol)
     return cls(**storage_options)
+
+
+def available_protocols():
+    """Return a list of the implemented protocols.
+
+    Note that any given protocol may require extra packages to be importable.
+    """
+    return list(known_implementations)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/spec.py new/filesystem_spec-2022.3.0/fsspec/spec.py
--- old/filesystem_spec-2022.02.0/fsspec/spec.py 2022-02-22 18:44:54.000000000 +0100
+++ new/filesystem_spec-2022.3.0/fsspec/spec.py 2022-03-31 19:47:04.000000000 +0200
@@ -1153,7 +1153,7 @@
         # all instances already also derive from pyarrow
         return self
 
-    def get_mapper(self, root, check=False, create=False):
+    def get_mapper(self, root="", check=False, create=False):
         """Create key/value store based on this file-system
 
         Makes a MutableMapping interface to the FS at the given root path.
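Note on the parquet.py hunk earlier in this diff ("kargs to FS in parquet module", #921): the change swaps a single keyword named storage_options for dict unpacking. A minimal sketch of the difference follows, assuming the callee simply forwards its keyword arguments to a filesystem constructor. fake_url_to_fs and the "anon" option are hypothetical stand-ins for illustration and are not taken from fsspec.

```python
# Hypothetical stand-in for a function that forwards **kwargs to a
# filesystem constructor; this is NOT fsspec.core.url_to_fs.
def fake_url_to_fs(url, **kwargs):
    # Return the options a filesystem class would receive.
    return kwargs


# Illustrative options only; "anon" is an assumed example key.
storage_options = {"anon": True}

# Old call shape: the whole dict arrives as one option called "storage_options".
old_style = fake_url_to_fs("s3://bucket/key", storage_options=(storage_options or {}))

# New call shape: each entry of the dict becomes its own keyword argument.
new_style = fake_url_to_fs("s3://bucket/key", **(storage_options or {}))

# With no options given, the new shape forwards nothing at all.
empty = fake_url_to_fs("s3://bucket/key", **(None or {}))
```

Under that assumption, the old shape would hand the constructor a nested dict under one key, while the new shape hands it the individual options.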
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/fsspec/tests/test_mapping.py new/filesystem_spec-2022.3.0/fsspec/tests/test_mapping.py
--- old/filesystem_spec-2022.02.0/fsspec/tests/test_mapping.py 2022-02-22 18:44:54.000000000 +0100
+++ new/filesystem_spec-2022.3.0/fsspec/tests/test_mapping.py 2022-03-31 19:47:04.000000000 +0200
@@ -5,6 +5,7 @@
 import pytest
 
 import fsspec
+from fsspec.implementations.local import LocalFileSystem
 from fsspec.implementations.memory import MemoryFileSystem
 
 
@@ -143,3 +144,8 @@
         dtype="<m8[ns]",
     )  # timedelta64 scalar
     assert m["c"] == b',M"\x9e\xc6\x99A\x065\x1c\xf0Rn4\xcb+'
+
+
+def test_empty_url():
+    m = fsspec.get_mapper()
+    assert isinstance(m.fs, LocalFileSystem)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/filesystem_spec-2022.02.0/setup.py new/filesystem_spec-2022.3.0/setup.py
--- old/filesystem_spec-2022.02.0/setup.py 2022-02-22 18:44:54.000000000 +0100
+++ new/filesystem_spec-2022.3.0/setup.py 2022-03-31 19:47:04.000000000 +0200
@@ -21,6 +21,7 @@
         "Programming Language :: Python :: 3.7",
         "Programming Language :: Python :: 3.8",
         "Programming Language :: Python :: 3.9",
+        "Programming Language :: Python :: 3.10",
     ],
     description="File-system specification",
     long_description=long_description,
@@ -54,6 +55,7 @@
         "fuse": ["fusepy"],
         "libarchive": ["libarchive-c"],
         "gui": ["panel"],
+        "tqdm": ["tqdm"],
     },
     zip_safe=False,
 )
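Note on test_ls_with_folders in the test_tar.py hunk of this diff: the test writes a tar archive that has file members only, then expects the listing to include the parent folders anyway. The sketch below shows one way such folders could be inferred, using only the standard library. It is an illustration of the idea and not the TarFileSystem implementation; build_flat_tar and list_with_implied_dirs are hypothetical helper names, and the sketch assumes the archive holds no explicit directory members.

```python
import tarfile
from io import BytesIO
from pathlib import PurePosixPath


def build_flat_tar(files):
    """Write an in-memory tar archive that contains file members only."""
    buffer = BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tf.addfile(info, BytesIO(data))
    buffer.seek(0)
    return buffer


def list_with_implied_dirs(fileobj):
    """Return member names plus the parent folders the archive never stored."""
    with tarfile.open(fileobj=fileobj, mode="r") as tf:
        names = set(tf.getnames())
    entries = set(names)
    for name in names:
        # Every ancestor of a member is treated as a folder, marked with "/".
        for parent in PurePosixPath(name).parents:
            if str(parent) != ".":
                entries.add(f"{parent}/")
    return sorted(entries)


# Same sample data as the test in the diff.
archive = build_flat_tar(
    {
        "a.pdf": b"Hello A!",
        "b/c.pdf": b"Hello C!",
        "d/e/f.pdf": b"Hello F!",
        "d/g.pdf": b"Hello G!",
    }
)
listing = list_with_implied_dirs(archive)
```

For this sample data, listing holds the four files plus "b/", "d/" and "d/e/", which is the same sequence the test asserts for fs.find("/", withdirs=True).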
