Hello,

This goes out to the good fellas who care about scikit-learn. There is a tutorial for the qiime package that ships a prepared classifier, and that classifier only works with the latest stable version of scikit-learn (0.24.2). We are at 0.23.2 in Debian.

I gave an update of scikit-learn a shot. The main build was fine, but I was eventually greeted with many download errors for the datasets that the examples use, as in

    /home/steffen/Science/scikit-learn/examples/inspection/plot_partial_dependence.py failed leaving traceback:
    Traceback (most recent call last):
      File "/home/steffen/Science/scikit-learn/examples/inspection/plot_partial_dependence.py", line 50, in <module>
        cal_housing = fetch_california_housing()
      File "/home/steffen/Science/scikit-learn/.pybuild/cpython3_3.9/build/sklearn/utils/validation.py", line 63, in inner_f
        return f(*args, **kwargs)
      File "/home/steffen/Science/scikit-learn/.pybuild/cpython3_3.9/build/sklearn/datasets/_california_housing.py", line 134, in fetch_california_housing
        archive_path = _fetch_remote(ARCHIVE, dirname=data_home)
      File "/home/steffen/Science/scikit-learn/.pybuild/cpython3_3.9/build/sklearn/datasets/_base.py", line 1194, in _fetch_remote
        urlretrieve(remote.url, file_path)
      File "/usr/lib/python3.9/urllib/request.py", line 239, in urlretrieve
        with contextlib.closing(urlopen(url, data)) as fp:
      File "/usr/lib/python3.9/urllib/request.py", line 214, in urlopen
        return opener.open(url, data, timeout)
      File "/usr/lib/python3.9/urllib/request.py", line 517, in open
        response = self._open(req, data)
      File "/usr/lib/python3.9/urllib/request.py", line 534, in _open
        result = self._call_chain(self.handle_open, protocol, protocol +
      File "/usr/lib/python3.9/urllib/request.py", line 494, in _call_chain
        result = func(*args)
      File "/usr/lib/python3.9/urllib/request.py", line 1389, in https_open
        return self.do_open(http.client.HTTPSConnection, req,
      File "/usr/lib/python3.9/urllib/request.py", line 1349, in do_open
        raise URLError(err)
    urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>

The original data is typically available.
But these are classics with often unclear licenses, here as in

    # The original data can be found at:
    # https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.tgz
    ARCHIVE = RemoteFileMetadata(
        filename='cal_housing.tgz',
        url='https://ndownloader.figshare.com/files/5976036',
        checksum=('aaa5c9a6afe2225cc2aed2723682ae40'
                  '3280c4a3695a2ddda4ffb5d8215ea681'))

The build currently ends with

    I: pybuild pybuild:284: (mv /home/steffen/Science/scikit-learn/sklearn/conftest.py /home/steffen/Science/scikit-learn/sklearn/conftest.py.test; mv /home/steffen/Science/scikit-learn/sklearn/datasets/tests/conftest.py /home/steffen/Science/scikit-learn/sklearn/datasets/tests/conftest.py.test; cd /home/steffen/Science/scikit-learn/.pybuild/cpython3_3.9/build && python3.9 -c 'import sklearn; sklearn.show_versions()')
    mv: cannot stat '/home/steffen/Science/scikit-learn/sklearn/conftest.py': No such file or directory
    mv: cannot stat '/home/steffen/Science/scikit-learn/sklearn/datasets/tests/conftest.py': No such file or directory

    System:
        python: 3.9.7 (default, Sep 3 2021, 06:18:44) [GCC 10.3.0]
        executable: /usr/bin/python3.9
        machine: Linux-5.10.0-8-amd64-x86_64-with-glibc2.32

    Python dependencies:
        pip: 20.3.4
        setuptools: 52.0.0
        sklearn: 0.24.2
        numpy: 1.19.5
        scipy: 1.7.1
        Cython: 0.29.21
        pandas: 1.1.5
        matplotlib: 3.3.4
        joblib: 0.17.0
        threadpoolctl: 2.1.0

    Built with OpenMP: True
    I: pybuild base:232: cd /home/steffen/Science/scikit-learn/.pybuild/cpython3_3.9/build; python3.9 -m pytest -m "not network" -v -k "not test_old_pickle and not test_ard_accuracy_on_easy_problem"
    ImportError while loading conftest '/home/steffen/Science/scikit-learn/conftest.py'.
    ../../../conftest.py:14: in <module>
        from sklearn.utils import _IS_32BIT
    ../../../sklearn/__init__.py:81: in <module>
        from . import __check_build  # noqa: F401
    ../../../sklearn/__check_build/__init__.py:46: in <module>
        raise_build_error(e)
    ../../../sklearn/__check_build/__init__.py:31: in raise_build_error
        raise ImportError("""%s
    E   ImportError: No module named 'sklearn.__check_build._check_build'
    E   ___________________________________________________________________________
    E   Contents of /home/steffen/Science/scikit-learn/sklearn/__check_build:
    E   setup.py                  _check_build.c            __pycache__
    E   _check_build.pyx          __init__.py
    E   ___________________________________________________________________________
    E   It seems that scikit-learn has not been built correctly.
    E
    E   If you have installed scikit-learn from source, please do not forget
    E   to build the package before using it: run `python setup.py install` or
    E   `make` in the source directory.
    E
    E   If you have used an installer, please check that it is suited for your
    E   Python version, your operating system and your platform.
    E: pybuild pybuild:353: test: plugin distutils failed with: exit code=4: cd /home/steffen/Science/scikit-learn/.pybuild/cpython3_3.9/build; python3.9 -m pytest -m "not network" -v -k "not test_old_pickle and not test_ard_accuracy_on_easy_problem"

What are you all thinking? Should we prepare separate data packages for Debian for those datasets that allow unrestricted distribution? I did not have a closer look, but I expect these datasets to total about 100MB, with few updates expected, if any.

Or should we not build these Jupyter notebooks for the -doc package?

Many thanks
Steffen
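
P.S. To make the data package idea a bit more concrete, here is a minimal sketch, not tested against the real packaging. As far as I can tell, the 64-digit checksum in the ARCHIVE definition above is a SHA256 digest, so a file shipped in a Debian data package could be checked against it before it is used in place of the download. The function names and the directory argument are made up for illustration and are not part of scikit-learn.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional

# Checksum copied from the ARCHIVE definition quoted in this mail.
CAL_HOUSING_SHA256 = (
    "aaa5c9a6afe2225cc2aed2723682ae40"
    "3280c4a3695a2ddda4ffb5d8215ea681"
)


def sha256_of(path: Path, chunk_size: int = 8192) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_packaged_archive(filename: str, data_dir: str,
                          expected_sha256: str) -> Optional[Path]:
    """Hypothetical helper: return the archive from a locally installed
    data directory if it exists and matches the expected checksum,
    otherwise None (the caller would then skip the example or fail)."""
    candidate = Path(data_dir) / filename
    if candidate.is_file() and sha256_of(candidate) == expected_sha256:
        return candidate
    return None


if __name__ == "__main__":
    # Demo with a throwaway file; the real cal_housing.tgz is not needed.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo.tgz"
        demo.write_bytes(b"not the real data")
        good = hashlib.sha256(b"not the real data").hexdigest()
        # Matching checksum: the path is returned.
        print(find_packaged_archive("demo.tgz", tmp, good))
        # Mismatching checksum: None is returned.
        print(find_packaged_archive("demo.tgz", tmp, CAL_HOUSING_SHA256))
```

How scikit-learn would be told to look in such a directory (a Debian patch, or pre-populating its data_home cache) is a separate question that this sketch leaves open.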