Package: release.debian.org
Severity: normal
Tags: buster
User: [email protected]
Usertags: pu

The version of "internetarchive" shipped in Debian has serious scalability and reliability issues. In particular, it fails when asked to handle more than 1024 files (#950289), because it runs into the limit on open file descriptors. Upstream has also made a few other changes which we might want to merge in.

We're shipping a modified version of 1.8.1 in buster. Upstream has since published several releases, up to 1.8.5 and 1.9.0, the latter of which is in Debian. I just uploaded a patch for #950289 to unstable.

I'm hoping this patch can also be shipped to stable, but I wonder whether we would be better off syncing the entire package with upstream, either to 1.8.5 or even to what will become 1.9.1 once upstream ships the fix for #950289.

In the meantime, I'm including the debdiff for a hotfix on 1.8.1. I'm also including a debdiff that upgrades to 1.8.5 *and* applies the hotfix, in case that would also be acceptable.

Finally, I'd be happy to wait a little longer and coordinate with upstream to get 1.9.1 synchronized in all suites.

Thanks for your work!

a.

-- System Information:
Debian Release: 10.2
  APT prefers stable-debug
  APT policy: (500, 'stable-debug'), (500, 'stable'), (1, 'experimental'), (1, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-6-amd64 (SMP w/4 CPU cores)
Locale: LANG=fr_CA.UTF-8, LC_CTYPE=fr_CA.UTF-8 (charmap=UTF-8), LANGUAGE=fr_CA.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

diff -Nru python-internetarchive-1.8.1/debian/changelog python-internetarchive-1.8.1/debian/changelog
--- python-internetarchive-1.8.1/debian/changelog 2018-09-24 23:08:05.000000000 -0400
+++ python-internetarchive-1.8.1/debian/changelog 2020-01-31 15:00:57.000000000 -0500
@@ -1,3 +1,9 @@
+python-internetarchive (1.8.1-1+deb10u1) buster; urgency=medium
+
+  * hotfix: close file after getting md5 (Closes: #950289)
+
+ -- Antoine Beaupré <[email protected]>  Fri, 31 Jan 2020 15:00:57 -0500
+
 python-internetarchive (1.8.1-1) unstable; urgency=low
 
   * Package internetarchive library for Debian (Closes: #909550)
diff -Nru python-internetarchive-1.8.1/debian/patches/0001-close-file-after-getting-md5.patch python-internetarchive-1.8.1/debian/patches/0001-close-file-after-getting-md5.patch
--- python-internetarchive-1.8.1/debian/patches/0001-close-file-after-getting-md5.patch 1969-12-31 19:00:00.000000000 -0500
+++ python-internetarchive-1.8.1/debian/patches/0001-close-file-after-getting-md5.patch 2020-01-31 15:00:57.000000000 -0500
@@ -0,0 +1,55 @@
+From 086e2e65fc840fd827b02e1022fad084ee700d7c Mon Sep 17 00:00:00 2001
+From: kpcyrd <[email protected]>
+Date: Fri, 31 Jan 2020 14:53:05 -0500
+Subject: [PATCH] close file after getting md5
+
+I've tried to upload to archive.org and noticed ia crashes on
+large folders.
+
+    $ ulimit -n
+    1024
+    $ ia upload asdf ./folder-with-more-than-1024-files/
+    [...]
+    OSError: [Errno 24] Too many open files
+    [...]
+    $
+
+The bug is present in src:python-internetarchive, I found a patch that
+resolves the issue from 2018 that was never applied. You can find a
+patch that cleanly applies to the current debian/sid below. The original
+author is github.com/Arkiver2.
+
+Upstream patch:
+https://github.com/jjjake/internetarchive/commit/4e4120f07c98ea98c61791293835df2797bfee61
+
+Debian Bug: #950289
+---
+ internetarchive/utils.py | 6 ++++--
+ 1 file changed, 4 insertions(+), 2 deletions(-)
+
+diff --git a/internetarchive/utils.py b/internetarchive/utils.py
+index db8412a..2f3e04e 100644
+--- a/internetarchive/utils.py
++++ b/internetarchive/utils.py
+@@ -235,14 +235,16 @@ def recursive_file_count(files, item=None, checksum=False):
+                 is_dir = False
+         if is_dir:
+             for x, _ in iter_directory(f):
+-                lmd5 = get_md5(open(x, 'rb'))
++                with open(x, 'rb') as f_:
++                    lmd5 = get_md5(f_)
+                 if lmd5 in md5s:
+                     continue
+                 else:
+                     total_files += 1
+         else:
+             try:
+-                lmd5 = get_md5(open(f, 'rb'))
++                with open(f, 'rb') as f_:
++                    lmd5 = get_md5(f_)
+             except TypeError:
+                 # Support file-like objects.
+                 lmd5 = get_md5(f)
+-- 
+2.20.1
+
diff -Nru python-internetarchive-1.8.1/debian/patches/series python-internetarchive-1.8.1/debian/patches/series
--- python-internetarchive-1.8.1/debian/patches/series 2018-09-24 23:08:05.000000000 -0400
+++ python-internetarchive-1.8.1/debian/patches/series 2020-01-31 15:00:57.000000000 -0500
@@ -1 +1,2 @@
 0001-v1.8.1.patch
+0001-close-file-after-getting-md5.patch
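
For reviewers who want to see the idea behind the hotfix in isolation, here is a minimal sketch of the same pattern: open each file inside a "with" block so that its descriptor is released before the next file is opened. The helpers md5_of_path and md5_of_tree are hypothetical names used only for this illustration; they are not part of the internetarchive package, whose own get_md5 and iter_directory helpers are not reproduced here.

```python
import hashlib
import os


def md5_of_path(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of the file at `path` (hypothetical helper).

    The `with` block closes the file as soon as hashing is done, so the
    number of open descriptors does not grow with the number of files.
    """
    digest = hashlib.md5()
    with open(path, 'rb') as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_tree(directory):
    """Map every file under `directory` to its MD5 (hypothetical helper).

    Only one file is open at any given time, regardless of tree size.
    """
    checksums = {}
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            checksums[path] = md5_of_path(path)
    return checksums
```

With this shape, a directory holding more files than "ulimit -n" allows should not trigger "Too many open files" during checksumming, which is the behaviour the hotfix is meant to restore in recursive_file_count.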
diff -Nru python-internetarchive-1.8.1/debian/changelog python-internetarchive-1.8.5/debian/changelog --- python-internetarchive-1.8.1/debian/changelog 2018-09-24 23:08:05.000000000 -0400 +++ python-internetarchive-1.8.5/debian/changelog 2020-01-31 15:00:57.000000000 -0500 @@ -1,3 +1,20 @@ +python-internetarchive (1.8.5-1+deb10u1) buster; urgency=medium + + * hotfix: close file after getting md5 (Closes: #950289) + + -- Antoine Beaupré <[email protected]> Fri, 31 Jan 2020 15:00:57 -0500 + +python-internetarchive (1.8.5-1) unstable; urgency=medium + + [ Ondřej Nový ] + * Use debhelper-compat instead of debian/compat. + + [ Antoine Beaupré] + * new upstream release (Closes: #922357) + * remove patches merged upstream + + -- Antoine Beaupré <[email protected]> Tue, 15 Oct 2019 20:29:12 -0400 + python-internetarchive (1.8.1-1) unstable; urgency=low * Package internetarchive library for Debian (Closes: #909550) diff -Nru python-internetarchive-1.8.1/debian/compat python-internetarchive-1.8.5/debian/compat --- python-internetarchive-1.8.1/debian/compat 2018-09-24 23:08:05.000000000 -0400 +++ python-internetarchive-1.8.5/debian/compat 1969-12-31 19:00:00.000000000 -0500 @@ -1 +0,0 @@ -11 diff -Nru python-internetarchive-1.8.1/debian/control python-internetarchive-1.8.5/debian/control --- python-internetarchive-1.8.1/debian/control 2018-09-24 23:08:05.000000000 -0400 +++ python-internetarchive-1.8.5/debian/control 2020-01-31 15:00:57.000000000 -0500 @@ -2,7 +2,7 @@ Section: python Priority: optional Maintainer: Antoine Beaupré <[email protected]> -Build-Depends: debhelper (>= 11~), dh-python, +Build-Depends: debhelper-compat (= 11), dh-python, python3-all, python3-clint, python3-docopt, diff -Nru python-internetarchive-1.8.1/debian/patches/0001-close-file-after-getting-md5.patch python-internetarchive-1.8.5/debian/patches/0001-close-file-after-getting-md5.patch --- python-internetarchive-1.8.1/debian/patches/0001-close-file-after-getting-md5.patch 1969-12-31 
19:00:00.000000000 -0500 +++ python-internetarchive-1.8.5/debian/patches/0001-close-file-after-getting-md5.patch 2020-01-31 15:00:57.000000000 -0500 @@ -0,0 +1,55 @@ +From 086e2e65fc840fd827b02e1022fad084ee700d7c Mon Sep 17 00:00:00 2001 +From: kpcyrd <[email protected]> +Date: Fri, 31 Jan 2020 14:53:05 -0500 +Subject: [PATCH] close file after getting md5 + +I've tried to upload to archive.org and noticed ia crashes on +large folders. + + $ ulimit -n + 1024 + $ ia upload asdf ./folder-with-more-than-1024-files/ + [...] + OSError: [Errno 24] Too many open files + [...] + $ + +The bug is present in src:python-internetarchive, I found a patch that +resolves the issue from 2018 that was never applied. You can find a +patch that cleanly applies to the current debian/sid below. The original +author is github.com/Arkiver2. + +Upstream patch: +https://github.com/jjjake/internetarchive/commit/4e4120f07c98ea98c61791293835df2797bfee61 + +Debian Bug: #950289 +--- + internetarchive/utils.py | 6 ++++-- + 1 file changed, 4 insertions(+), 2 deletions(-) + +diff --git a/internetarchive/utils.py b/internetarchive/utils.py +index db8412a..2f3e04e 100644 +--- a/internetarchive/utils.py ++++ b/internetarchive/utils.py +@@ -235,14 +235,16 @@ def recursive_file_count(files, item=None, checksum=False): + is_dir = False + if is_dir: + for x, _ in iter_directory(f): +- lmd5 = get_md5(open(x, 'rb')) ++ with open(x, 'rb') as f_: ++ lmd5 = get_md5(f_) + if lmd5 in md5s: + continue + else: + total_files += 1 + else: + try: +- lmd5 = get_md5(open(f, 'rb')) ++ with open(f, 'rb') as f_: ++ lmd5 = get_md5(f_) + except TypeError: + # Support file-like objects. 
+ lmd5 = get_md5(f) +-- +2.20.1 + diff -Nru python-internetarchive-1.8.1/debian/patches/0001-v1.8.1.patch python-internetarchive-1.8.5/debian/patches/0001-v1.8.1.patch --- python-internetarchive-1.8.1/debian/patches/0001-v1.8.1.patch 2018-09-24 23:08:05.000000000 -0400 +++ python-internetarchive-1.8.5/debian/patches/0001-v1.8.1.patch 1969-12-31 19:00:00.000000000 -0500 @@ -1,46 +0,0 @@ -Forwarded: https://github.com/jjjake/internetarchive/issues/271 -Origin: upstream -From eb4d1d7821b20368ac3e43836062a74ad960baf9 Mon Sep 17 00:00:00 2001 -From: jake <[email protected]> -Date: Mon, 2 Jul 2018 11:01:11 -0700 -Subject: [PATCH] v1.8.1 - ---- - HISTORY.rst | 7 +++++++ - internetarchive/__init__.py | 2 +- - 2 files changed, 8 insertions(+), 1 deletion(-) - -diff --git a/HISTORY.rst b/HISTORY.rst -index c064b68..415ceb1 100644 ---- a/HISTORY.rst -+++ b/HISTORY.rst -@@ -3,6 +3,13 @@ - Release History - --------------- - -+1.8.1 (2018-06-28) -+++++++++++++++++++ -+ -+**Bugfixes** -+ -+- Fixed bug in ``ia tasks --get-task-log`` that was returning an unable to parse JSON error. -+ - 1.8.0 (2018-06-28) - ++++++++++++++++++ - -diff --git a/internetarchive/__init__.py b/internetarchive/__init__.py -index 4aa5d2d..0b81dd5 100644 ---- a/internetarchive/__init__.py -+++ b/internetarchive/__init__.py -@@ -37,7 +37,7 @@ - from __future__ import absolute_import - - __title__ = 'internetarchive' --__version__ = '1.8.0' -+__version__ = '1.8.1' - __author__ = 'Jacob M. 
Johnson' - __license__ = 'AGPL 3' - __copyright__ = 'Copyright (C) 2012-2017 Internet Archive' --- -2.19.0 - diff -Nru python-internetarchive-1.8.1/debian/patches/series python-internetarchive-1.8.5/debian/patches/series --- python-internetarchive-1.8.1/debian/patches/series 2018-09-24 23:08:05.000000000 -0400 +++ python-internetarchive-1.8.5/debian/patches/series 2020-01-31 15:00:57.000000000 -0500 @@ -1 +1 @@ -0001-v1.8.1.patch +0001-close-file-after-getting-md5.patch diff -Nru python-internetarchive-1.8.1/docs/source/api.rst python-internetarchive-1.8.5/docs/source/api.rst --- python-internetarchive-1.8.1/docs/source/api.rst 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/docs/source/api.rst 2019-06-07 17:28:42.000000000 -0400 @@ -117,7 +117,7 @@ Item Objects ------------ -:class:`Item` objects represent `Internet Archive items <items.html>`_. +:class:`Item` objects represent `Internet Archive items <//archive.org/services/docs/api/items.html>`_. From the :class:`Item` object you can create new items, upload files to existing items, read and write metadata, and download or delete files. .. autofunction:: get_item @@ -137,7 +137,7 @@ The item will automatically be created if it does not exist. -Refer to `archive.org Identifiers <metadata.html#archive-org-identifiers>`_ for more information on creating valid archive.org identifiers. +Refer to `archive.org Identifiers <//archive.org/services/docs/api/metadata-schema/index.html#archive-org-identifiers>`_ for more information on creating valid archive.org identifiers. 
Setting Remote Filenames ^^^^^^^^^^^^^^^^^^^^^^^^ diff -Nru python-internetarchive-1.8.1/docs/source/cli.rst python-internetarchive-1.8.5/docs/source/cli.rst --- python-internetarchive-1.8.1/docs/source/cli.rst 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/docs/source/cli.rst 2019-06-07 17:28:42.000000000 -0400 @@ -64,7 +64,7 @@ Modifying Metadata ^^^^^^^^^^^^^^^^^^ -Once ``ia`` has been `configured <quickstart.html#configuring>`_, you can modify metadata: +Once ``ia`` has been `configured <quickstart.html#configuring>`_, you can modify `metadata <//archive.org/services/docs/api/metadata-schema>`_: .. code:: bash @@ -115,7 +115,7 @@ This would remove ``another subject`` from the items subject field, regardless of whether or not the field is a single or multi-value field. -Refer to `Internet Archive Metadata <metadata.html>`_ for more specific details regarding metadata and archive.org. +Refer to `Internet Archive Metadata <//archive.org/services/docs/api/metadata-schema/index.html>`_ for more specific details regarding metadata and archive.org. Modifying Metadata in Bulk @@ -142,7 +142,7 @@ $ ia upload <identifier> file1 file2 --metadata="mediatype:texts" --metadata="blah:arg" -Please note that, unless specified otherwise, items will be uploaded with a ``data`` mediatype. **This cannot be changed afterwards.** Therefore, you should specify a mediatype when uploading, eg. ``--metadata="mediatype:movies"`` +.. warning:: Please note that, unless specified otherwise, items will be uploaded with a ``data`` mediatype. **This cannot be changed afterwards.** Therefore, you should specify a mediatype when uploading, eg. ``--metadata="mediatype:movies"``. Similarly, if you want your upload to end up somewhere else than the default collection (currently `community texts <//archive.org/details/opensource>`_), you should also specify a collection with ``--metadata="collection:foo"``. 
See `metadata documentation <//archive.org/services/docs/api/metadata-schema>`_ for more information. You can upload files from ``stdin``: @@ -163,8 +163,8 @@ These files can be deleted like normal files. You can also prevent the backup from happening on clobbers by adding ``-H x-archive-keep-old-version:0`` to your command. -Refer to `archive.org Identifiers <metadata.html#archive-org-identifiers>`_ for more information on creating valid archive.org identifiers. -Please also read the `Internet Archive Items <items.html>`_ page before getting started. +Refer to `archive.org Identifiers <//archive.org/services/docs/api/metadata-schema/index.html#archive-org-identifiers>`_ for more information on creating valid archive.org identifiers. +Please also read the `Internet Archive Items <//archive.org/services/docs/api/items.html>`_ page before getting started. Bulk Uploading ^^^^^^^^^^^^^^ diff -Nru python-internetarchive-1.8.1/docs/source/index.rst python-internetarchive-1.8.5/docs/source/index.rst --- python-internetarchive-1.8.1/docs/source/index.rst 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/docs/source/index.rst 2019-06-07 17:28:42.000000000 -0400 @@ -28,8 +28,6 @@ installation quickstart cli - items - metadata api updates troubleshooting diff -Nru python-internetarchive-1.8.1/docs/source/items.rst python-internetarchive-1.8.5/docs/source/items.rst --- python-internetarchive-1.8.1/docs/source/items.rst 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/docs/source/items.rst 1969-12-31 19:00:00.000000000 -0500 @@ -1,74 +0,0 @@ -Internet Archive Items -====================== - -What Is an Item? ----------------- - -Archive.org is made up of "items". -An item is a logical "thing" that we represent on one web page on archive.org. -An item can be considered as a group of files that deserve their own metadata. -If the files in an item have separate metadata, the files should probably be in different items. 
-An item can be a book, a song, an album, a dataset, a movie, an image or set of images, etc. -Every item has an `identifier <metadata.html#archive-org-identifiers>`_ that is unique across archive.org. - -How Items Are Structured ------------------------- - -An item is just a directory of files and possibly subdirectories. -Every item has at least two files named in the following format (see `metadata page <metadata.html#archive-org-identifiers>`_ for more context on what an identifier is): - - - ``<identifier>_files.xml`` - - ``<identifier>_meta.xml`` - -The ``_meta.xml`` file is an XML file containing all of the `metadata describing the item <metadata.html>`_. -The ``_files.xml`` file is an XML file containing all of the file-level metadata. -There can only be one ``_meta.xml`` file and one ``_files.xml`` file per item. - -Alongside these metadata files and the original files uploaded to the item, the item may also contain `derivative files automatically generated by archive.org <https://archive.org/help/derivatives.php>`_. - -Item Limitations ----------------- - -As a rule of thumb, items should: - - - **not** be over 100GB - - **not** contain more than 10,000 files. - -Collections ------------ - -All items must be part of a collection. -A collection is simply an item with special characteristics. -Besides an image file for the collection logo, files should **never** be uploaded directly to a collection item. -Items can be assigned to a collection at the time of creation, or after the item has been created by modifying the ``collection`` element in an item's metadata to contain the identifier for the given collection (i.e. ``ia metadata <identifier> -m collection:<collection-identifier>``. -Currently collections can only be created by archive.org staff. -Please contact `[email protected] <mailto:[email protected]>`_ if you need a collection. 
- -Archival URLs -------------- - -An item's "details" page will always be available at:: - - https://archive.org/details/<identifier> - -The item directory is always available at:: - - https://archive.org/download/<identifier> - -A particular file can always be downloaded from:: - - https://archive.org/download/<identifier>/<filename> - -**Note**: Archival URLs may redirect to an actual server that contains the content. -The resultant URL is **not** a permalink. -For example, the archival URL:: - - https://archive.org/download/popeye_taxi-turvey/popeye_taxi-turvey_meta.xml - -currently redirects to:: - - https://ia802304.us.archive.org/30/items/popeye_taxi-turvey/popeye_taxi-turvey_meta.xml - -**DO NOT LINK** to any archive.org URL that begins with numbers like this. -This refers to the particular machine that we're serving the file from right now, but we move items to new servers all the time. -If you link to this sort of URL, instead of the archival URL, your link **WILL** break at some point. diff -Nru python-internetarchive-1.8.1/docs/source/jq.rst python-internetarchive-1.8.5/docs/source/jq.rst --- python-internetarchive-1.8.1/docs/source/jq.rst 1969-12-31 19:00:00.000000000 -0500 +++ python-internetarchive-1.8.5/docs/source/jq.rst 2019-06-07 17:28:42.000000000 -0400 @@ -0,0 +1,344 @@ +.. _jq: + +Using jq with ia +================ + +`jq <https://stedolan.github.io/jq/>`_ is a lightweight and flexible command-line JSON processor. +It's a great tool for processing the JSON output of ``ia``. +This document will go over how to install or download ``jq`` and how to use it with ``ia``. + +If you have a tip you'd like to add to this page, please email `[email protected] <mailto:[email protected]>`_ or send a pull request. +If you're unable to figure out a ``jq`` command to do what you need and don't see it on this page, please email `[email protected] <mailto:[email protected]>`_ for help. 
+ +Installation +------------ + +Downloading a binary +^^^^^^^^^^^^^^^^^^^^ + +The easiest way to get started with ``jq`` is to download a binary. +Binaries for Linux, OS X, and Windows are available at `https://stedolan.github.io/jq/download/ <https://stedolan.github.io/jq/download/>`_. +Once you find the binary for your OS, you could right-click the hypertext and copy the link to the binary. +Then you could paste it into your terminal and download it like so: + +.. code:: bash + + $ curl -Ls https://github.com/stedolan/jq/releases/download/jq-1.5/jq-osx-amd64 > jq + $ chmod +x jq # make it executable + +To confirm it's working, simply run the following. +You should see the help page. + +.. code:: bash + + $ ./jq + jq - commandline JSON processor [version 1.5] + Usage: ./jq [options] <jq filter> [file...] + + jq is a tool for processing JSON inputs, applying the + given filter to its JSON text inputs and producing the + filter's results as JSON on standard output. + The simplest filter is ., which is the identity filter, + copying jq's input to its output unmodified (except for + formatting). + For more advanced filters see the jq(1) manpage ("man jq") + and/or https://stedolan.github.io/jq + + Some of the options include: + -c compact instead of pretty-printed output; + -n use `null` as the single input value; + -e set the exit status code based on the output; + -s read (slurp) all inputs into an array; apply filter to it; + -r output raw strings, not JSON texts; + -R read raw strings, not JSON texts; + -C colorize JSON; + -M monochrome (don't colorize JSON); + -S sort keys of objects on output; + --tab use tabs for indentation; + --arg a v set variable $a to value <v>; + --argjson a v set variable $a to JSON value <v>; + --slurpfile a f set variable $a to an array of JSON texts read from <f>; + See the manpage for more options. + +Just like the ``ia`` binary, downloading the ``jq`` binary does not install it to your system. +It's simply an executable binary. 
+To use it, you'll have to use either a relative or absolute path. For example: + +.. code:: bash + + $ ~/jq --help + $ ./jq --help + $ /Users/jake/jq --help + +Installing with a package manager +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``jq`` can also be installed with most popular package managers: + +.. code:: bash + + # Linux + $ sudo apt-get install jq + + # OS X + $ brew install jq + + # FreeBSD + $ pkg install jq + + # Solaris + $ pkgutil -i jq + + # Windows + $ chocolately install jq + +Please refer to `https://stedolan.github.io/jq/download/ <https://stedolan.github.io/jq/download/>`_ for more details. + + + +Getting started +--------------- + +``jq`` can seem a bit overwhelming at first, so let's get started with some basic examples. +A good way to make sense of how you can access a specific metadata field is to use ``jq 'keys'``. +This will show you the top-level keys that exist in the JSON document. + +.. code:: bash + + $ ia metadata nasa | jq 'keys' + [ + "created", + "d1", + "d2", + "dir", + "files", + "files_count", + "is_collection", + "item_size", + "metadata", + "reviews", + "server", + "uniq", + "workable_servers" + ] + +To access the value of a given key, you can simply do: + +.. code:: bash + + $ ia metadata nasa | jq '.files_count' + 8 + +As you can see, the command above returns the value for the ``files_count`` key. +There are 8 files in the item. + +When working with ``ia metadata`` the ``metadata`` and ``files`` keys are likely to be the targets you'll want to access most. +Let's take a look at ``metadata``: + +.. 
code:: bash + + $ ia metadata | jq '.metadata | keys' + [ + "addeddate", + "backup_location", + "collection", + "description", + "hidden", + "homepage", + "identifier", + "mediatype", + "num_recent_reviews", + "num_subcollections", + "num_top_dl", + "publicdate", + "related_collection", + "rights", + "show_browse_by_date", + "show_hidden_subcollections", + "show_search_by_year", + "spotlight_identifier", + "title", + "updatedate", + "updater", + "uploader" + ] + +As you might notice, this is all of the item-level metadata (i.e. the JSON equivalent of an item's ``<identifier>_meta.xml`` file). +We can decend deeper into the JSON document like so: + +.. code:: bash + + $ ia metadata nasa | jq '.metadata.title' + "NASA Images" + +``jq`` returns JSON by default. +In this case, a quoted string. +To access the raw value, you can use the ``-r`` option: + +.. code:: bash + + $ ia metadata nasa | jq -r '.metadata.title' + NASA Images + +Search +------ + +``ia search`` outputs JSONL. +JSONL is series of JSON documents separated by a newline. +In this case, one JSON document is returned per search document reutrned. + + +Converting search results to CSV and other formats +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``jq`` can be used to parse the JSON returned by ``ia search`` into CSV or TSV files: + +.. code:: bash + + $ ia search 'identifier:nasa OR identifier:stairs' --field title,date,subject | jq -r '[.identifier, .title, .date, .subject] | @csv' + "nasa","NASA Images",, + "stairs","stairs where i worked","2004-01-01T00:00:00Z","test" + +If you'd prefer a tab-separated spreadsheet, you can replace ``@csv`` with ``@tsv`` in the command above. +More options can be found in the *Format strings and escaping* section in the `jq manual <https://stedolan.github.io/jq/manual/>`_. + +Catalog +------- + +Get info on all of your IA-S3 tasks: + +.. 
code:: bash + + $ ia tasks --json | jq 'select(.args.comment == "s3-put")' + +Or, output a link to the tasklog for each S3 task you currently have queued or running: + +.. code:: bash + + $ ia tasks nasa --json \ + | jq -r 'select(.args.comment == "s3-put") | "https://archive.org/log/\(.task_id)"' + https://archive.org/log/469558161 + https://archive.org/log/400818482 + +Get the identifiers for all of your redrows: + +.. code:: bash + + $ ia tasks --json | jq -r 'select(.row_type == "red").identifier' + +TODO +____ + +Recipes to document, work in progress... + + +Select files of a specific format +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code:: bash + + $ ia metadata nasa | jq '.files[] | select(.format == "JPEG")' + { + "name": "globe_west_540.jpg", + "source": "original", + "size": "66065", + "format": "JPEG", + "mtime": "1245274910", + "md5": "9366a4b09386bf673c447e33d806d904", + "crc32": "2283b5fd", + "sha1": "3e20a009994405f535cdf07cdc2974cef2fce8f2", + "rotation": "0" + } + +Select a file by name +^^^^^^^^^^^^^^^^^^^^^ + +.. code:: bash + + $ ia metadata nasa | jq '.files[] | select(.name == "nasa_meta.xml")' + { + "name": "nasa_meta.xml", + "source": "metadata", + "size": "7968", + "format": "Metadata", + "mtime": "1530756295", + "md5": "06cd95343d60df0f10fb8518b349a795", + "crc32": "6b9c6e24", + "sha1": "c0dc994eeba245671ef53e2f6c52612722bf51d3" + } + + +Get the size of a collection +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code:: bash + + » ia search 'collection:georgeblood' -f item_size | jq '.item_size' | paste -sd+ - | bc + 51677834206186 + +Getting checksums for all files in an item +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. 
code:: bash + + $ ia metadata nasa | jq -r '.metadata.identifier as $id | .files[] | [$id, .name, .md5] | @tsv' + nasa NASAarchiveLogo.jpg 64dcc1092df36142eb4aab7cc255a4a6 + nasa __ia_thumb.jpg c354f821954f80516d163c23135e7dd7 + nasa globe_west_540.jpg 9366a4b09386bf673c447e33d806d904 + nasa globe_west_540_thumb.jpg d3dab682c56058c8af0df5a2073b1dd1 + nasa nasa_archive.torrent 70a7b2b44c318bac381c25febca3b2ca + nasa nasa_files.xml 5b8a61ea930ce04d093deebe260fd5f8 + nasa nasa_meta.xml 06cd95343d60df0f10fb8518b349a795 + nasa nasa_reviews.xml 711ba65d49383a25657640716c45e840 + +Creating histograms +^^^^^^^^^^^^^^^^^^^ + +This example creates a histogram of publisher's grouped by item_size. + +.. code:: bash + + » ia search 'collection:georgeblood' -f publisher,item_size \ + | jq -r '"\(.publisher) \(.item_size)"' \ + | awk '{arr[$1]+=$2} END {for (i in arr) {print i,arr[i]}}' \ + | sort -rn -k2 \ + | head + Decca 9518737758200 + Victor 8067854677756 + Columbia 7221975357654 + Capitol 1944338651172 + Brunswick 1574280922547 + Bluebird 1058465142211 + Mercury 1003001910967 + MGM 898067089555 + Okeh 808308437878 + Vocalion 608766709327 + +Get total imagecount of a collection +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code:: bash + + $ ia search 'scanningcenter:uoft AND shiptracking:ace54704' -f imagecount | jq '.imagecount' | paste -sd+ - | bc + 8172 + +Selecting files based on filesize +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Get the filenames of every file in ``goodytwoshoes00newyiala`` that is larger than 3000 bytes: + +.. code:: bash + + $ ia metadata goodytwoshoes00newyiala \ + | jq -r '.files[] | select(.name | endswith(".pdf")) | select((.size | tonumber) > 3000) | .name' + goodytwoshoes00newyiala.pdf + goodytwoshoes00newyiala_bw.pdf + +You can also include the identifier in the output like so: + +.. 
code:: bash + + $ ia metadata goodytwoshoes00newyiala \ + | jq -r '.metadata.identifier as $i | .files[] | select(.name | endswith(".pdf")) | select((.size | tonumber) > 3000) | "\($i)/\(.name)"' + goodytwoshoes00newyiala/goodytwoshoes00newyiala.pdf + goodytwoshoes00newyiala/goodytwoshoes00newyiala_bw.pdf diff -Nru python-internetarchive-1.8.1/docs/source/metadata.rst python-internetarchive-1.8.5/docs/source/metadata.rst --- python-internetarchive-1.8.1/docs/source/metadata.rst 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/docs/source/metadata.rst 2019-06-07 17:28:42.000000000 -0400 @@ -63,8 +63,15 @@ Please `contact Internet Archive <mailto:[email protected]?subject=[Collection Creation Request]>`_ if you need a collection created. All items **should** belong to a collection. -If a collection is not specified at the time of upload, it will be added to the ``opensource`` collection. -For testing purposes, you may upload to the ``test_collection`` collection. +If a collection is not specified at the time of upload, it will be added to the `Community texts <https://archive.org/details/opensource>`_ collection. +For testing purposes, you may upload to the ``test_collection`` collection. 
The following collections are also available to the public at the time of writing: + + * `Community Audio <https://archive.org/details/opensource_audio>`_ + * `Community Media <https://archive.org/details/opensource_media>`_ + * `Community Software <https://archive.org/details/open_source_software>`_ + * `Community Texts <https://archive.org/details/opensource>`_ (default collection) + * `Community Video <https://archive.org/details/opensource_movies>`_ + * `Test collection <https://archive.org/details/test_collection>`_ contributor ^^^^^^^^^^^ diff -Nru python-internetarchive-1.8.1/docs/source/quickstart.rst python-internetarchive-1.8.5/docs/source/quickstart.rst --- python-internetarchive-1.8.1/docs/source/quickstart.rst 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/docs/source/quickstart.rst 2019-06-07 17:28:42.000000000 -0400 @@ -27,7 +27,7 @@ Uploading --------- -Creating a new `item on archive.org <items.html>`_ and uploading files to it is as easy as:: +Creating a new `item on archive.org <//archive.org/services/docs/api/items.html>`_ and uploading files to it is as easy as:: >>> from internetarchive import upload >>> md = dict(collection='test_collection', title='My New Item', mediatype='movies') @@ -67,9 +67,9 @@ You can access all of an item's metadata via the :class:`Item <internetarchive.Item>` object:: >>> from internetarchive import get_item - >>> item = get_item('iacli-test-item301') + >>> item = get_item('nasa') >>> item.item_metadata['metadata']['title'] - 'My Title' + 'NASA Images' :func:`get_item <internetarchive.get_item>` retrieves all of an item's metadata via the `Internet Archive Metadata API <http://blog.archive.org/2013/07/04/metadata-api/>`_. 
This metadata can be accessed via the ``Item.item_metadata`` attribute:: @@ -79,13 +79,13 @@ All of the top-level keys in ``item.item_metadata`` are available as attributes:: >>> item.server - 'ia801507.us.archive.org' + 'ia802606.us.archive.org' >>> item.item_size - 161752024 + 126586 >>> item.files[0]['name'] - 'blank.txt' + 'NASAarchiveLogo.jpg' >>> item.metadata['identifier'] - 'iacli-test-item301' + 'nasa' Writing Metadata @@ -120,7 +120,7 @@ >>> f.title 'My File Title' -Refer to `Internet Archive Metadata <metadata.html>`_ for more specific details regarding metadata and archive.org. +Refer to `Internet Archive Metadata <//archive.org/services/docs/api/metadata-schema/index.html>`_ for more specific details regarding metadata and archive.org. Downloading diff -Nru python-internetarchive-1.8.1/.gitignore python-internetarchive-1.8.5/.gitignore --- python-internetarchive-1.8.1/.gitignore 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/.gitignore 2019-06-07 17:28:42.000000000 -0400 @@ -7,13 +7,20 @@ itemlist.txt .tox TAGS -*csv +*.csv htmlcov -*log +*.log *.pex +pex/ wheelhouse *gz .venv* .cache .vagrant -.idea \ Pas de fin de ligne à la fin du fichier +.idea +v2/ +v3.6/ +v3.7/ +.pytest_cache/ +.python-version +trash/ diff -Nru python-internetarchive-1.8.1/HISTORY.rst python-internetarchive-1.8.5/HISTORY.rst --- python-internetarchive-1.8.1/HISTORY.rst 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/HISTORY.rst 2019-06-07 17:28:42.000000000 -0400 @@ -3,10 +3,65 @@ Release History --------------- +1.8.5 (2019-06-07) +++++++++++++++++++ + +**Features and Improvements** + +- Improved timeout logging and exceptions. +- Added support for arbitrary targets to metadata write. +- IA-S3 keys now supported for auth in download. +- Authoraization (i.e. ``ia configure``) now uses the archive.org xauthn endpoint. 
+ +**Bugfixes** + +- Fixed encoding error in --get-task-log +- Fixed bug in upload where connections were not being closed in upload. + +1.8.4 (2019-04-11) +++++++++++++++++++ + +**Features and Improvements** + +- It's now possible to retrieve task logs, given a task id, without first retrieving the items task history. +- Added examples to ``ia tasks`` help. + +1.8.3 (2019-03-29) +++++++++++++++++++ + +**Features and Improvements** + +- Increased search timeout from 24 to 300 seconds. + +**Bugfixes** + +- Fixed bug in setup.py where backports.csv wasn't being installed when installing from pypi. + +1.8.2 (2019-03-21) +++++++++++++++++++ + +**Features and Improvements** + +- Documnetation updates. +- Added support for write-many to modify_metadata. + +**Bugfixes** + +- Fixed bug in ``ia tasks --task-id`` where no task was being returned. +- Fixed bug in ``internetarchive.get_tasks()`` where it was not possible to query by ``task_id``. +- Fixed TypeError bug in upload when uploading with checksum=True. + +1.8.1 (2018-06-28) +++++++++++++++++++ + +**Bugfixes** + +- Fixed bug in ``ia tasks --get-task-log`` that was returning an unable to parse JSON error. + 1.8.0 (2018-06-28) ++++++++++++++++++ -**Feautres and Improvements** +**Features and Improvements** - Only use backports.csv for python2 in support of FreeBDS port. - Added a nicer error message to ``ia search`` for authentication errors. @@ -26,7 +81,7 @@ 1.7.7 (2018-03-05) ++++++++++++++++++ -**Feautres and Improvements** +**Features and Improvements** - Added support for downloading on-the-fly archive_marc.xml files. @@ -39,7 +94,7 @@ 1.7.6 (2018-01-05) ++++++++++++++++++ -**Feautres and Improvements** +**Features and Improvements** - Added ability to set the remote-name for a directory in ``ia upload`` (previously you could only do this for single files). 
@@ -50,7 +105,7 @@ 1.7.5 (2017-12-07) ++++++++++++++++++ -**Feautres and Improvements** +**Features and Improvements** - Turned on ``x-archive-keep-old-version`` S3 header by default for all ``ia upload``, ``ia delete``, ``ia copy``, and ``ia move`` commands. This means that any ``ia`` command that clobbers or deletes a command, will save a version of the file in ``<identifier>/history/files/$key.~N~``. @@ -60,7 +115,7 @@ 1.7.4 (2017-11-06) ++++++++++++++++++ -**Feautres and Improvements** +**Features and Improvements** - Increased timeout in search from 12 seconds to 24. - Added ability to set the ``max_retries`` in :func:`internetarchive.search_items`. @@ -83,7 +138,7 @@ 1.7.2 (2017-09-11) ++++++++++++++++++ -**Feautres and Improvements** +**Features and Improvements** - Added support for adding custom headers to ``ia search``. @@ -114,7 +169,7 @@ 1.7.0 (2017-07-25) ++++++++++++++++++ -**Feautres and Improvements** +**Features and Improvements** - Loosened up ``jsonpatch`` requirements, as the metadata API now supports more recent versions of the JSON Patch standard. - Added support for building "snap" packages (https://snapcraft.io/). diff -Nru python-internetarchive-1.8.1/internetarchive/api.py python-internetarchive-1.8.5/internetarchive/api.py --- python-internetarchive-1.8.1/internetarchive/api.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/api.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2017 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as @@ -23,7 +23,7 @@ This module implements the Internetarchive API. -:copyright: (C) 2012-2017 by Internet Archive. +:copyright: (C) 2012-2019 by Internet Archive. :license: AGPL 3, see LICENSE for more details. 
""" from __future__ import absolute_import @@ -447,7 +447,7 @@ def get_tasks(identifier=None, - task_ids=None, + task_id=None, task_type=None, params=None, config=None, @@ -464,8 +464,8 @@ :param identifier: (optional) The Archive.org identifier for which to retrieve tasks for. - :type task_ids: int or str - :param task_ids: (optional) The task_ids to retrieve from the Archive.org catalog. + :type task_id: int or str + :param task_is: (optional) The task_id to retrieve from the Archive.org catalog. :type task_type: str :param task_type: (optional) The type of tasks to retrieve from the Archive.org @@ -489,7 +489,7 @@ if not archive_session: archive_session = get_session(config, config_file, http_adapter_kwargs) return archive_session.get_tasks(identifier=identifier, - task_ids=task_ids, + task_id=task_id, params=params, config=config, verbose=verbose, diff -Nru python-internetarchive-1.8.1/internetarchive/auth.py python-internetarchive-1.8.5/internetarchive/auth.py --- python-internetarchive-1.8.1/internetarchive/auth.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/auth.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2017 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as @@ -23,7 +23,7 @@ This module contains the Archive.org authentication handlers for Requests. -:copyright: (C) 2012-2017 by Internet Archive. +:copyright: (C) 2012-2019 by Internet Archive. :license: AGPL 3, see LICENSE for more details. 
""" from requests.auth import AuthBase diff -Nru python-internetarchive-1.8.1/internetarchive/catalog.py python-internetarchive-1.8.5/internetarchive/catalog.py --- python-internetarchive-1.8.1/internetarchive/catalog.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/catalog.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2017 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as @@ -23,7 +23,7 @@ This module contains objects for interacting with the Archive.org catalog. -:copyright: (C) 2012-2017 by Internet Archive. +:copyright: (C) 2012-2019 by Internet Archive. :license: AGPL 3, see LICENSE for more details. """ from __future__ import absolute_import @@ -128,6 +128,8 @@ self.params['justme'] = 1 if task_id: + if isinstance(task_id, list): + task_id = task_id[0] task_id = str(task_id) self.params.update(dict( search_task_id=task_id, @@ -217,9 +219,31 @@ """ if self.task_id is None: raise ValueError('task_id is None') - url = '{0}//catalogd.archive.org/log/{1}'.format(self.session.protocol, - self.task_id) + return self.get_task_log(self.task_id, self.session, self.request_kwargs) + + @staticmethod + def get_task_log(task_id, session, request_kwargs=None): + """Static method for getting a task log, given a task_id. + + This method exists so a task log can be retrieved without + retrieving the items task history first. + + :type task_id: str or int + :param task_id: The task id for the task log you'd like to fetch. + + :type archive_session: :class:`ArchiveSession <ArchiveSession>` + + :type request_kwargs: dict + :param request_kwargs: (optional) Keyword arguments that + :py:class:`requests.Request` takes. + + :rtype: str + :returns: The task log as a string. 
+ + """ + request_kwargs = request_kwargs if request_kwargs else dict() + url = '{0}//catalogd.archive.org/log/{1}'.format(session.protocol, task_id) p = dict(full=1) - r = self.session.get(url, params=p, **self.request_kwargs) + r = session.get(url, params=p, **request_kwargs) r.raise_for_status() return r.content.decode('utf-8') diff -Nru python-internetarchive-1.8.1/internetarchive/cli/argparser.py python-internetarchive-1.8.5/internetarchive/cli/argparser.py --- python-internetarchive-1.8.1/internetarchive/cli/argparser.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/cli/argparser.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2016 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as @@ -21,7 +21,7 @@ internetarchive.cli.argparser ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -:copyright: (C) 2012-2016 by Internet Archive. +:copyright: (C) 2012-2019 by Internet Archive. :license: AGPL 3, see LICENSE for more details. 
""" from collections import defaultdict @@ -54,6 +54,18 @@ return metadata +def get_args_dict_many_write(metadata): + changes = defaultdict(list) + for key in metadata: + target = '/'.join(key.split('/')[:-1]) + field = key.split('/')[-1] + if not changes[target]: + changes[target] = {field: metadata[key]} + else: + changes[target][field] = metadata[key] + return changes + + def convert_str_list_to_unicode(str_list): unicode_list = list() for x in str_list: diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_configure.py python-internetarchive-1.8.5/internetarchive/cli/ia_configure.py --- python-internetarchive-1.8.1/internetarchive/cli/ia_configure.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/cli/ia_configure.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2016 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_copy.py python-internetarchive-1.8.5/internetarchive/cli/ia_copy.py --- python-internetarchive-1.8.1/internetarchive/cli/ia_copy.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/cli/ia_copy.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2016 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as @@ -69,12 +69,12 @@ s = Schema({ str: Use(bool), '<src-identifier>/<src-file>': And(str, And(And(str, lambda x: '/' in x, - error='Destiantion not formatted correctly. 
See usage example.'), + error='Destination not formatted correctly. See usage example.'), assert_src_file_exists, error=( 'https://archive.org/download/{} does not exist. ' 'Please check the identifier and filepath and retry.'.format(src_path)))), '<dest-identifier>/<dest-file>': And(str, lambda x: '/' in x, - error='Destiantion not formatted correctly. See usage example.'), + error='Destination not formatted correctly. See usage example.'), '--metadata': Or(None, And(Use(get_args_dict), dict), error='--metadata must be formatted as --metadata="key:value"'), '--header': Or(None, And(Use(get_args_dict), dict), diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_delete.py python-internetarchive-1.8.5/internetarchive/cli/ia_delete.py --- python-internetarchive-1.8.1/internetarchive/cli/ia_delete.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/cli/ia_delete.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2016 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_download.py python-internetarchive-1.8.5/internetarchive/cli/ia_download.py --- python-internetarchive-1.8.1/internetarchive/cli/ia_download.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/cli/ia_download.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. 
# -# Copyright (C) 2012-2016 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as @@ -37,7 +37,7 @@ -I, --itemlist=<file> Download items from a specified file. Itemlists should be a plain text file with one identifier per line. -S, --search=<query> Download items returned from a specified search query. - -p, --search-parameters=<key:value>... Download items returned from a specified search query. + -P, --search-parameters=<key:value>... Download items returned from a specified search query. -g, --glob=<pattern> Only download files whose filename matches the given glob pattern. -f, --format=<format>... Only download files of the specified format(s). @@ -57,6 +57,7 @@ -s, --stdout Write file contents to stdout. --no-change-timestamp Don't change the timestamp of downloaded files to reflect the source material. + -p, --parameters=<key:value>... Parameters to send with your query (e.g. `cnt=0`). """ from __future__ import print_function, absolute_import import os @@ -96,7 +97,8 @@ '--retries': Use(lambda x: x[0]), '--search-parameters': Use(lambda x: get_args_dict(x, query_string=True)), '--on-the-fly': Use(bool), - '--no-change-timestamp': Use(bool) + '--no-change-timestamp': Use(bool), + '--parameters': Use(lambda x: get_args_dict(x, query_string=True)), }) # Filenames should be unicode literals. Support PY2 and PY3. 
@@ -166,7 +168,9 @@ stdout_buf = sys.stdout else: stdout_buf = sys.stdout.buffer - f[0].download(retries=args['--retries'], fileobj=stdout_buf) + f[0].download(retries=args['--retries'], + fileobj=stdout_buf, + params=args['--parameters']) sys.exit(0) try: identifier = identifier.strip() @@ -204,7 +208,8 @@ item_index=item_index, ignore_errors=True, on_the_fly=args['--on-the-fly'], - no_change_timestamp=args['--no-change-timestamp'] + no_change_timestamp=args['--no-change-timestamp'], + params=args['--parameters'] ) if _errors: errors.append(_errors) diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_list.py python-internetarchive-1.8.5/internetarchive/cli/ia_list.py --- python-internetarchive-1.8.1/internetarchive/cli/ia_list.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/cli/ia_list.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2016 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_metadata.py python-internetarchive-1.8.5/internetarchive/cli/ia_metadata.py --- python-internetarchive-1.8.1/internetarchive/cli/ia_metadata.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/cli/ia_metadata.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2016 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as @@ -25,7 +25,7 @@ [--priority=<priority>] ia metadata <identifier>... --remove=<key:value>... 
[--priority=<priority>] ia metadata <identifier>... [--append=<key:value>... | --append-list=<key:value>...] - [--priority=<priority>] + [--priority=<priority>] [--target=<target>] ia metadata --spreadsheet=<metadata.csv> [--priority=<priority>] [--modify=<key:value>...] ia metadata --help @@ -60,7 +60,7 @@ from schema import Schema, SchemaError, Or, And, Use import six -from internetarchive.cli.argparser import get_args_dict +from internetarchive.cli.argparser import get_args_dict, get_args_dict_many_write # Only import backports.csv for Python2 (in support of FreeBSD port). PY2 = sys.version_info[0] == 2 @@ -188,6 +188,8 @@ metadata_args = args['--remove'] try: metadata = get_args_dict(metadata_args) + if any('/' in k for k in metadata): + metadata = get_args_dict_many_write(metadata) except ValueError: print("error: The value of --modify, --remove, --append or --append-list " "is invalid. It must be formatted as: --modify=key:value", diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_move.py python-internetarchive-1.8.5/internetarchive/cli/ia_move.py --- python-internetarchive-1.8.1/internetarchive/cli/ia_move.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/cli/ia_move.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2016 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as @@ -53,7 +53,7 @@ '<src-identifier>/<src-file>': And(str, lambda x: '/' in x, error='Source not formatted correctly. See usage example.'), '<dest-identifier>/<dest-file>': And(str, lambda x: '/' in x, - error='Destiantion not formatted correctly. See usage example.'), + error='Destination not formatted correctly. 
See usage example.'), }) try: args = s.validate(args) @@ -63,7 +63,7 @@ # Add keep-old-version by default. if 'x-archive-keep-old-version' not in args['--header']: - headers['x-archive-keep-old-version'] = '1' + args['--header']['x-archive-keep-old-version'] = '1' # First we use ia_copy, prep argv for ia_copy. argv.pop(0) diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia.py python-internetarchive-1.8.5/internetarchive/cli/ia.py --- python-internetarchive-1.8.1/internetarchive/cli/ia.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/cli/ia.py 2019-06-07 17:28:42.000000000 -0400 @@ -3,7 +3,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2016 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_search.py python-internetarchive-1.8.5/internetarchive/cli/ia_search.py --- python-internetarchive-1.8.1/internetarchive/cli/ia_search.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/cli/ia_search.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2016 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as @@ -33,7 +33,7 @@ -i, --itemlist Output identifiers only. -f, --field=<field>... Metadata fields to return. -n, --num-found Print the number of results to stdout. - -t, --timeout=<seconds> Set the timeout in seconds [default: 24]. + -t, --timeout=<seconds> Set the timeout in seconds [default: 300]. 
""" from __future__ import absolute_import, print_function, unicode_literals import sys diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_tasks.py python-internetarchive-1.8.5/internetarchive/cli/ia_tasks.py --- python-internetarchive-1.8.1/internetarchive/cli/ia_tasks.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/cli/ia_tasks.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2016 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as @@ -36,15 +36,22 @@ -g, --green-rows Return information about tasks that have not run. -b, --blue-rows Return information about running tasks. -r, --red-rows Return information about tasks that have failed. - -p, --parameter=<k:v>... Return tasks matching the given parameter. + -p, --parameter=<k:v>... URL parameters passed to catalog.php. -j, --json Output detailed information in JSON. 
+examples: + ia tasks nasa + ia tasks nasa -p cmds:derive.php # only return derive.php tasks + ia tasks -p mode:s3 # return all S3 tasks + ia tasks --get-task-log 1178878475 # get a task log for a specific task """ from __future__ import absolute_import, print_function import sys import json from docopt import docopt +from requests.exceptions import HTTPError +import six from internetarchive.cli.argparser import get_args_dict @@ -76,13 +83,16 @@ task_type=task_type, params=params) elif args['--get-task-log']: - task = session.get_tasks(task_id=args['--get-task-log'], params=params) - if task: - log = task[0].task_log() - sys.exit(print(log)) - else: + try: + log = session.get_task_log(args['--get-task-log'], params) + if six.PY2: + print(log.encode('utf-8')) + else: + print(log) + sys.exit(0) + except HTTPError: print('error retrieving task-log ' - 'for {0}\n'.format(args['--get-task-log']), file=sys.stderr) + 'for {0}'.format(args['--get-task-log']), file=sys.stderr) sys.exit(1) elif args['--task']: tasks = session.get_tasks(task_id=args['--task'], params=params) diff -Nru python-internetarchive-1.8.1/internetarchive/cli/ia_upload.py python-internetarchive-1.8.5/internetarchive/cli/ia_upload.py --- python-internetarchive-1.8.1/internetarchive/cli/ia_upload.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/cli/ia_upload.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2016 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as @@ -102,7 +102,7 @@ # Format error message for any non 200 responses that # we haven't caught yet,and write to stderr. 
- if responses and responses[-1] and responses[-1].status_code != 200: + if responses and responses[-1].status_code and responses[-1].status_code != 200: if not responses[-1].status_code: return responses filename = responses[-1].request.url.split('/')[-1] @@ -110,7 +110,6 @@ msg = get_s3_xml_text(responses[-1].content) except: msg = responses[-1].content - print(' error uploading {0}: {2}'.format(filename, msg), file=sys.stderr) return responses @@ -225,7 +224,7 @@ for _r in _upload_files(item, files, upload_kwargs): if args['--debug']: break - if (not _r) or (not _r.ok): + if (not _r.status_code) or (not _r.ok): ERRORS = True # Bulk upload using spreadsheet. diff -Nru python-internetarchive-1.8.1/internetarchive/cli/__init__.py python-internetarchive-1.8.5/internetarchive/cli/__init__.py --- python-internetarchive-1.8.1/internetarchive/cli/__init__.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/cli/__init__.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2016 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as @@ -21,7 +21,7 @@ internetarchive.cli ~~~~~~~~~~~~~~~~~~~ -:copyright: (C) 2012-2016 by Internet Archive. +:copyright: (C) 2012-2019 by Internet Archive. :license: AGPL 3, see LICENSE for more details. 
""" from internetarchive.cli import ia, ia_configure, ia_delete, ia_download, ia_list, \ diff -Nru python-internetarchive-1.8.1/internetarchive/config.py python-internetarchive-1.8.5/internetarchive/config.py --- python-internetarchive-1.8.1/internetarchive/config.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/config.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2017 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as @@ -21,7 +21,7 @@ internetarchive.config ~~~~~~~~~~~~~~~~~~~~~~ -:copyright: (C) 2012-2017 by Internet Archive. +:copyright: (C) 2012-2019 by Internet Archive. :license: AGPL 3, see LICENSE for more details. """ from __future__ import absolute_import @@ -37,57 +37,37 @@ from internetarchive import auth -def get_auth_config(username, password): - payload = dict( - username=username, - password=password, - remember='CHECKED', - action='login', - ) - - with requests.Session() as s: - # Attache logged-in-* cookies to Session. - u = 'https://archive.org/account/login.php' - r = s.post(u, data=payload, cookies={'test-cookie': '1'}) - if 'logged-in-sig' not in s.cookies: - raise AuthenticationError('Authentication failed. ' - 'Please check your credentials and try again.') - - # Get S3 keys. - u = 'https://archive.org/account/s3.php' - p = dict(output_json=1) - r = s.get(u, params=p) - j = r.json() - access_key = j['key']['s3accesskey'] - secret_key = j['key']['s3secretkey'] - if not j or not j.get('key'): - raise AuthenticationError('Authentication failed. ' - 'Please check your credentials and try again.') - - # Get user info (screenname). 
- u = 'https://s3.us.archive.org' - p = dict(check_auth=1) - r = requests.get(u, params=p, auth=auth.S3Auth(access_key, secret_key)) - r.raise_for_status() - j = r.json() - if j.get('error'): - raise AuthenticationError(j.get('error')) - user_info = j['screenname'] - - auth_config = { - 's3': { - 'access': access_key, - 'secret': secret_key, - }, - 'cookies': { - 'logged-in-user': s.cookies['logged-in-user'], - 'logged-in-sig': s.cookies['logged-in-sig'], - }, - 'general': { - 'screenname': user_info, - } +def get_auth_config(email, password): + u = 'https://archive.org/services/xauthn/' + p = dict(op='login') + d = dict(email=email, password=password) + r = requests.post(u, params=p, data=d) + j = r.json() + if not j.get('success'): + try: + msg = j['values']['reason'] + except KeyError: + msg = j['error'] + if msg == 'account_not_found': + msg = 'Account not found, check your email and try again.' + elif msg == 'account_bad_password': + msg = 'Incorrect password, try again.' + else: + msg = 'Authentication failed: {}'.format(msg) + raise AuthenticationError(msg) + auth_config = { + 's3': { + 'access': j['values']['s3']['access'], + 'secret': j['values']['s3']['secret'], + }, + 'cookies': { + 'logged-in-user': j['values']['cookies']['logged-in-user'], + 'logged-in-sig': j['values']['cookies']['logged-in-sig'], + }, + 'general': { + 'screenname': j['values']['screenname'], } - + } return auth_config diff -Nru python-internetarchive-1.8.1/internetarchive/exceptions.py python-internetarchive-1.8.5/internetarchive/exceptions.py --- python-internetarchive-1.8.1/internetarchive/exceptions.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/exceptions.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. 
# -# Copyright (C) 2012-2017 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as @@ -21,7 +21,7 @@ internetarchive.exceptions ~~~~~~~~~~~~~~~~~~~~~~~~~~ -:copyright: (C) 2012-2017 by Internet Archive. +:copyright: (C) 2012-2019 by Internet Archive. :license: AGPL 3, see LICENSE for more details. """ diff -Nru python-internetarchive-1.8.1/internetarchive/files.py python-internetarchive-1.8.5/internetarchive/files.py --- python-internetarchive-1.8.1/internetarchive/files.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/files.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2017 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as @@ -21,7 +21,7 @@ internetarchive.files ~~~~~~~~~~~~~~~~~~~~~ -:copyright: (C) 2012-2017 by Internet Archive. +:copyright: (C) 2012-2019 by Internet Archive. :license: AGPL 3, see LICENSE for more details. 
""" from __future__ import absolute_import, unicode_literals, print_function @@ -35,7 +35,7 @@ from requests.exceptions import HTTPError, RetryError, ConnectTimeout, \ ConnectionError, ReadTimeout -from internetarchive import iarequest, utils +from internetarchive import iarequest, utils, auth log = logging.getLogger(__name__) @@ -117,6 +117,11 @@ name=urllib.parse.quote(name.encode('utf-8')), ) self.url = '{protocol}//archive.org/download/{id}/{name}'.format(**url_parts) + if self.item.session.access_key and self.item.session.secret_key: + self.auth = auth.S3Auth(self.item.session.access_key, + self.item.session.secret_key) + else: + self.auth = None def __repr__(self): return ('File(identifier={identifier!r}, ' @@ -126,7 +131,8 @@ def download(self, file_path=None, verbose=None, silent=None, ignore_existing=None, checksum=None, destdir=None, retries=None, ignore_errors=None, - fileobj=None, return_responses=None, no_change_timestamp=None): + fileobj=None, return_responses=None, no_change_timestamp=None, + params=None): """Download the file into the current working directory. :type file_path: str @@ -169,6 +175,10 @@ current time instead of changing it to that given in the original archive. + :type params: dict + :param params: (optional) URL parameters to send with + download request (e.g. `cnt=0`). + :rtype: bool :returns: True if file was successfully downloaded. 
""" @@ -179,6 +189,7 @@ ignore_errors = False if not ignore_errors else ignore_errors return_responses = False if not return_responses else return_responses no_change_timestamp = False if not no_change_timestamp else no_change_timestamp + params = None if not params else params if (fileobj and silent is None) or silent is not False: silent = True @@ -240,7 +251,11 @@ os.makedirs(parent_dir) try: - response = self.item.session.get(self.url, stream=True, timeout=12) + response = self.item.session.get(self.url, + stream=True, + timeout=12, + auth=self.auth, + params=params) response.raise_for_status() if return_responses: return response diff -Nru python-internetarchive-1.8.1/internetarchive/iarequest.py python-internetarchive-1.8.5/internetarchive/iarequest.py --- python-internetarchive-1.8.1/internetarchive/iarequest.py 2018-06-28 19:18:10.000000000 -0400 +++ python-internetarchive-1.8.5/internetarchive/iarequest.py 2019-06-07 17:28:42.000000000 -0400 @@ -2,7 +2,7 @@ # # The internetarchive module is a Python/CLI interface to Archive.org. # -# Copyright (C) 2012-2017 Internet Archive +# Copyright (C) 2012-2019 Internet Archive # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU Affero General Public License as @@ -21,7 +21,7 @@ internetarchive.iarequest ~~~~~~~~~~~~~~~~~~~~~~~~~ -:copyright: (C) 2012-2017 by Internet Archive. +:copyright: (C) 2012-2019 by Internet Archive. :license: AGPL 3, see LICENSE for more details. 
 """
 from __future__ import absolute_import
@@ -32,6 +32,7 @@
 import json
 import re
 import copy
+import logging
 
 from six.moves import urllib
 import requests.models
@@ -40,7 +41,10 @@
 import six
 
 from internetarchive import auth, __version__
-from internetarchive.utils import needs_quote
+from internetarchive.utils import needs_quote, delete_items_from_dict
+
+
+logger = logging.getLogger(__name__)
 
 
 class S3Request(requests.models.Request):
@@ -232,40 +236,120 @@
 
         if not source_metadata:
             r = requests.get(self.url)
-            source_metadata = r.json().get(target.split('/')[0], {})
-        if 'metadata' in target:
-            destination_metadata = source_metadata.copy()
-            prepared_metadata = prepare_metadata(metadata, source_metadata, append,
-                                                 append_list)
-            destination_metadata.update(prepared_metadata)
-        elif 'files' in target:
-            filename = '/'.join(target.split('/')[1:])
-            for f in source_metadata:
-                if f.get('name') == filename:
-                    source_metadata = f
-                    break
-            destination_metadata = source_metadata.copy()
-            prepared_metadata = prepare_metadata(metadata, source_metadata, append)
-            destination_metadata.update(prepared_metadata)
+            source_metadata = r.json()
+
+        # Write to many targets
+        if isinstance(metadata, list) \
+                or any('/' in k for k in metadata) \
+                or all(isinstance(k, dict) for k in metadata.values()):
+            changes = list()
+
+            if any(not k for k in metadata):
+                raise ValueError('Invalid metadata provided, '
+                                 'check your input and try again')
+
+            if target:
+                metadata = {target: metadata}
+            for key in metadata:
+                if key == 'metadata':
+                    patch = prepare_patch(metadata[key],
+                                          source_metadata['metadata'],
+                                          append,
+                                          append_list)
+                elif key.startswith('files'):
+                    patch = prepare_files_patch(metadata[key],
+                                                source_metadata['files'],
+                                                append,
+                                                key,
+                                                append_list)
+                else:
+                    key = key.split('/')[0]
+                    patch = prepare_target_patch(metadata, source_metadata, append,
+                                                 target, append_list, key)
+                changes.append({'target': key, 'patch': patch})
+            self.data = {
+                '-changes':
json.dumps(changes),
+                'priority': priority,
+            }
+            logger.debug('submitting metadata request: {}'.format(self.data))
+        # Write to single target
         else:
-            destination_metadata = source_metadata.copy()
-            prepared_metadata = prepare_metadata(metadata, source_metadata, append)
-            destination_metadata.update(prepared_metadata)
-
-        # Delete metadata items where value is REMOVE_TAG.
-        destination_metadata = dict(
-            (k, v) for (k, v) in destination_metadata.items() if v != 'REMOVE_TAG'
-        )
+            if not target or 'metadata' in target:
+                target = 'metadata'
+                patch = prepare_patch(metadata, source_metadata['metadata'], append,
+                                      append_list)
+            elif 'files' in target:
+                patch = prepare_files_patch(metadata, source_metadata['files'], append,
+                                            target, append_list)
+            else:
+                metadata = {target: metadata}
+                patch = prepare_target_patch(metadata, source_metadata, append,
+                                             target, append_list, target)
+            self.data = {
+                '-patch': json.dumps(patch),
+                '-target': target,
+                'priority': priority,
+            }
+            logger.debug('submitting metadata request: {}'.format(self.data))
+        super(MetadataPreparedRequest, self).prepare_body(self.data, None)
 
-        patch = json.dumps(make_patch(source_metadata, destination_metadata).patch)
-        self.data = {
-            '-patch': patch,
-            '-target': target,
-            'priority': priority,
-        }
 
+def prepare_patch(metadata, source_metadata, append, append_list=None):
+    destination_metadata = source_metadata.copy()
+    if isinstance(metadata, list):
+        prepared_metadata = metadata
+        if not destination_metadata:
+            destination_metadata = list()
+    else:
+        prepared_metadata = prepare_metadata(metadata, source_metadata, append,
+                                             append_list)
+    if isinstance(destination_metadata, dict):
+        destination_metadata.update(prepared_metadata)
+    elif isinstance(metadata, list) and not destination_metadata:
+        destination_metadata = metadata
+    else:
+        if isinstance(prepared_metadata, list):
+            if append_list:
+                destination_metadata += prepared_metadata
+            else:
+                destination_metadata = prepared_metadata
+        else:
+
destination_metadata.append(prepared_metadata)
+    # Delete metadata items where value is REMOVE_TAG.
+    destination_metadata = delete_items_from_dict(destination_metadata, 'REMOVE_TAG')
+    patch = make_patch(source_metadata, destination_metadata).patch
+    return patch
+
+
+def prepare_target_patch(metadata, source_metadata, append, target, append_list, key):
+
+    def dictify(lst, key=None, value=None):
+        if not lst:
+            return value
+        sub_dict = dictify(lst[1:], key, value)
+        for i, v in enumerate(lst):
+            md = {v: copy.deepcopy(sub_dict)}
+            return md
+
+    for _k in metadata:
+        metadata = dictify(_k.split('/')[1:], _k.split('/')[-1], metadata[_k])
+    for i, _k in enumerate(key.split('/')):
+        if i == 0:
+            source_metadata = source_metadata.get(_k, dict())
+        else:
+            source_metadata[_k] = source_metadata.get(_k, dict()).get(_k, dict())
+    patch = prepare_patch(metadata, source_metadata, append, append_list)
+    return patch
 
-        super(MetadataPreparedRequest, self).prepare_body(self.data, None)
+
+def prepare_files_patch(metadata, source_metadata, append, target, append_list):
+    filename = '/'.join(target.split('/')[1:])
+    for f in source_metadata:
+        if f.get('name') == filename:
+            source_metadata = f
+            break
+    patch = prepare_patch(metadata, source_metadata, append, append_list)
+    return patch
 
 
 def prepare_metadata(metadata, source_metadata=None, append=False, append_list=False):
@@ -338,8 +422,12 @@
             if not isinstance(metadata[key], list):
                 metadata[key] = [metadata[key]]
             for v in metadata[key]:
-                if v in source_metadata[key]:
-                    continue
+                if not isinstance(source_metadata[key], list):
+                    if v in [source_metadata[key]]:
+                        continue
+                else:
+                    if v in source_metadata[key]:
+                        continue
                 if not isinstance(source_metadata[key], list):
                     prepared_metadata[key] = [source_metadata[key]]
                 else:
diff -Nru python-internetarchive-1.8.1/internetarchive/__init__.py python-internetarchive-1.8.5/internetarchive/__init__.py
--- python-internetarchive-1.8.1/internetarchive/__init__.py 2018-06-28
19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/__init__.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
 #
 # The internetarchive module is a Python/CLI interface to Archive.org.
 #
-# Copyright (C) 2012-2017 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
 #
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU Affero General Public License as
@@ -30,17 +30,17 @@
     >>> item.exists
     True
 
-:copyright: (C) 2012-2017 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
 :license: AGPL 3, see LICENSE for more details.
 """
 
 from __future__ import absolute_import
 
 __title__ = 'internetarchive'
-__version__ = '1.8.0'
+__version__ = '1.8.5'
 __author__ = 'Jacob M. Johnson'
 __license__ = 'AGPL 3'
-__copyright__ = 'Copyright (C) 2012-2017 Internet Archive'
+__copyright__ = 'Copyright (C) 2012-2019 Internet Archive'
 
 from internetarchive.item import Item
 from internetarchive.files import File
diff -Nru python-internetarchive-1.8.1/internetarchive/item.py python-internetarchive-1.8.5/internetarchive/item.py
--- python-internetarchive-1.8.1/internetarchive/item.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/item.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
 #
 # The internetarchive module is a Python/CLI interface to Archive.org.
 #
-# Copyright (C) 2012-2017 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
 #
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU Affero General Public License as
@@ -21,7 +21,7 @@
 internetarchive.item
 ~~~~~~~~~~~~~~~~~~~~
 
-:copyright: (C) 2012-2017 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
 :license: AGPL 3, see LICENSE for more details.
 """
 from __future__ import absolute_import, unicode_literals, print_function
@@ -283,7 +283,8 @@
                  ignore_errors=None,
                  on_the_fly=None,
                  return_responses=None,
-                 no_change_timestamp=None):
+                 no_change_timestamp=None,
+                 params=None):
         """Download files from an item.
 
         :param files: (optional) Only download files matching given file names.
@@ -345,6 +346,10 @@
                                     current time instead of changing it to that given in
                                     the original archive.
 
+        :type params: dict
+        :param params: (optional) URL parameters to send with
+                       download request (e.g. `cnt=0`).
+
         :rtype: bool
         :returns: True if if all files have been downloaded successfully.
         """
@@ -357,6 +362,7 @@
         no_directory = False if no_directory is None else no_directory
         return_responses = False if not return_responses else True
         no_change_timestamp = False if not no_change_timestamp else no_change_timestamp
+        params = None if not params else params
 
         if not dry_run:
             if item_index and verbose is True:
@@ -415,7 +421,7 @@
                 continue
             r = f.download(path, verbose, silent, ignore_existing, checksum, destdir,
                            retries, ignore_errors, None, return_responses,
-                           no_change_timestamp)
+                           no_change_timestamp, params)
             if return_responses:
                 responses.append(r)
             if r is False:
@@ -473,7 +479,6 @@
         :returns: A dictionary containing the status_code and response
                   returned from the Metadata API.
         """
-        target = 'metadata' if target is None else target
         append = False if append is None else append
         access_key = self.session.access_key if not access_key else access_key
         secret_key = self.session.secret_key if not secret_key else secret_key
@@ -483,12 +488,15 @@
         url = '{protocol}//archive.org/metadata/{identifier}'.format(
             protocol=self.session.protocol,
             identifier=self.identifier)
+        # TODO: currently files and metadata targets do not support dict's,
+        # but they might someday?? refactor this check.
+        source_metadata = self.item_metadata
         request = MetadataRequest(
             method='POST',
             url=url,
             metadata=metadata,
             headers=self.session.headers,
-            source_metadata=self.item_metadata.get(target.split('/')[0], {}),
+            source_metadata=source_metadata,
             target=target,
             priority=priority,
             access_key=access_key,
@@ -729,6 +737,7 @@
                             body.close()
                             os.remove(filename)
                     body.close()
+                    response.close()
                     return response
                 except HTTPError as exc:
                     body.close()
@@ -758,12 +767,10 @@
         """Upload files to an item. The item will be created if it
         does not exist.
 
-        :type files: list
+        :type files: str, file, list, tuple, dict
         :param files: The filepaths or file-like objects to upload.
 
-        :type kwargs: dict
-        :param kwargs: The keyword arguments from the call to
-                       upload_file().
+        :param \*\*kwargs: Optional arguments that :func:`Item.upload_file()` takes.
 
         Usage::
 
@@ -771,10 +778,32 @@
             >>> item = internetarchive.Item('identifier')
             >>> md = dict(mediatype='image', creator='Jake Johnson')
             >>> item.upload('/path/to/image.jpg', metadata=md, queue_derive=False)
-            True
+            [<Response [200]>]
+
+        Uploading multiple files::
+
+            >>> r = item.upload(['file1.txt', 'file2.txt'])
+            >>> r = item.upload([fileobj, fileobj2])
+            >>> r = item.upload(('file1.txt', 'file2.txt'))
+
+        Uploading file objects:
+
+            >>> import io
+            >>> f = io.BytesIO(b"some initial binary data: \\x00\\x01")
+            >>> r = item.upload({'remote-name.txt': f})
+            >>> f = io.BytesIO(b"some more binary data: \\x00\\x01")
+            >>> f.name = 'remote-name.txt'
+            >>> r = item.upload(f)
+
+        *Note: file objects must either have a name attribute, or be uploaded in a
+        dict where the key is the remote-name*
+
+        Setting the remote filename with a dict::
+
+            >>> r = item.upload({'remote-name.txt': '/path/to/local/file.txt'})
 
         :rtype: list
-        :returns: A list of requests.Response objects.
+        :returns: A list of :class:`requests.Response` objects.
         """
         queue_derive = True if queue_derive is None else queue_derive
         remote_dir_name = None
diff -Nru python-internetarchive-1.8.1/internetarchive/search.py python-internetarchive-1.8.5/internetarchive/search.py
--- python-internetarchive-1.8.1/internetarchive/search.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/search.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
 #
 # The internetarchive module is a Python/CLI interface to Archive.org.
 #
-# Copyright (C) 2012-2017 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
 #
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU Affero General Public License as
@@ -24,7 +24,7 @@
 This module provides objects for interacting with the Archive.org search
 engine.
 
-:copyright: (C) 2012-2017 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
 :license: AGPL 3, see LICENSE for more details.
 """
 from __future__ import absolute_import, unicode_literals
@@ -92,7 +92,7 @@
 
         # Set timeout.
         if 'timeout' not in self.request_kwargs:
-            self.request_kwargs['timeout'] = 24
+            self.request_kwargs['timeout'] = 300
 
         # Set retries.
         self.session.mount_http_adapter(max_retries=self.max_retries)
diff -Nru python-internetarchive-1.8.1/internetarchive/session.py python-internetarchive-1.8.5/internetarchive/session.py
--- python-internetarchive-1.8.1/internetarchive/session.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/session.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
 #
 # The internetarchive module is a Python/CLI interface to Archive.org.
 #
-# Copyright (C) 2012-2017 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
 #
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU Affero General Public License as
@@ -24,7 +24,7 @@
 This module provides an ArchiveSession object to manage and persist
 settings across the internetarchive package.
 
-:copyright: (C) 2012-2017 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
 :license: AGPL 3, see LICENSE for more details.
 """
 from __future__ import absolute_import, unicode_literals
@@ -40,12 +40,14 @@
 from requests.utils import default_headers
 from requests.adapters import HTTPAdapter
 from requests.packages.urllib3 import Retry
+from six.moves.urllib.parse import urlparse
 
 from internetarchive import __version__
 from internetarchive.config import get_config
 from internetarchive.item import Item, Collection
 from internetarchive.search import Search
-from internetarchive.catalog import Catalog
+from internetarchive.catalog import Catalog, CatalogTask
+from internetarchive.utils import reraise_modify
 
 
 logger = logging.getLogger(__name__)
@@ -118,7 +120,7 @@
         if debug or (logger.level <= 10):
             self.set_file_logger(logging_config.get('level', 'NOTSET'),
                                  logging_config.get('file', 'internetarchive.log'),
-                                 'requests.packages.urllib3')
+                                 'urllib3')
 
     def _get_user_agent_string(self):
         """Generate a User-Agent string to be sent with every request."""
@@ -131,6 +133,14 @@
         return 'internetarchive/{0} ({1} {2}; N; {3}; {4}) Python/{5}'.format(
             __version__, uname[0], uname[-1], lang, self.access_key, py_version)
 
+    def rebuild_auth(self, prepared_request, response):
+        """Never rebuild auth for archive.org URLs.
+        """
+        u = urlparse(prepared_request.url)
+        if u.netloc.endswith('archive.org'):
+            return
+        super(ArchiveSession, self).rebuild_auth(prepared_request, response)
+
     def mount_http_adapter(self, protocol=None, max_retries=None,
                            status_forcelist=None, host=None):
         """Mount an HTTP adapter to the
@@ -287,6 +297,9 @@
                       request_kwargs=request_kwargs,
                       max_retries=max_retries)
 
+    def get_task_log(self, task_id, request_kwargs=None):
+        return CatalogTask.get_task_log(task_id, self, request_kwargs)
+
     def get_tasks(self,
                   identifier=None,
                   task_id=None,
@@ -367,7 +380,14 @@
         insecure = False
         with warnings.catch_warnings(record=True) as w:
             warnings.filterwarnings('always')
-            r = super(ArchiveSession, self).send(request, **kwargs)
+            try:
+                r = super(ArchiveSession, self).send(request, **kwargs)
+            except Exception as e:
+                try:
+                    reraise_modify(e, e.request.url, prepend=False)
+                except:
+                    logger.error(e)
+                    raise e
             if self.protocol == 'http:':
                 return r
             insecure_warnings = ['SNIMissingWarning', 'InsecurePlatformWarning']
diff -Nru python-internetarchive-1.8.1/internetarchive/utils.py python-internetarchive-1.8.5/internetarchive/utils.py
--- python-internetarchive-1.8.1/internetarchive/utils.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/internetarchive/utils.py 2019-06-07 17:28:42.000000000 -0400
@@ -2,7 +2,7 @@
 #
 # The internetarchive module is a Python/CLI interface to Archive.org.
 #
-# Copyright (C) 2012-2017 Internet Archive
+# Copyright (C) 2012-2019 Internet Archive
 #
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU Affero General Public License as
@@ -23,7 +23,7 @@
 
 This module provides utility functions for the internetarchive library.
 
-:copyright: (C) 2012-2017 by Internet Archive.
+:copyright: (C) 2012-2019 by Internet Archive.
 :license: AGPL 3, see LICENSE for more details.
 """
 import hashlib
@@ -260,3 +260,85 @@
         return os.path.isdir(obj)
     except TypeError as exc:
         return False
+
+
+def reraise_modify(caught_exc, append_msg, prepend=False):
+    """Append message to exception while preserving attributes.
+
+    Preserves exception class, and exception traceback.
+
+    Note:
+        This function needs to be called inside an except because
+        `sys.exc_info()` requires the exception context.
+
+    Args:
+        caught_exc(Exception): The caught exception object
+        append_msg(str): The message to append to the caught exception
+        prepend(bool): If True prepend the message to args instead of appending
+
+    Returns:
+        None
+
+    Side Effects:
+        Re-raises the exception with the preserved data / trace but
+        modified message
+    """
+    ExceptClass = type(caught_exc)
+    # Keep old traceback
+    traceback = sys.exc_info()[2]
+    if not caught_exc.args:
+        # If no args, create our own tuple
+        arg_list = [append_msg]
+    else:
+        # Take the last arg
+        # If it is a string
+        # append your message.
+        # Otherwise append it to the
+        # arg list(Not as pretty)
+        arg_list = list(caught_exc.args[:-1])
+        last_arg = caught_exc.args[-1]
+        if isinstance(last_arg, str):
+            if prepend:
+                arg_list.append(append_msg + last_arg)
+            else:
+                arg_list.append(last_arg + append_msg)
+        else:
+            arg_list += [last_arg, append_msg]
+    caught_exc.args = tuple(arg_list)
+    six.reraise(ExceptClass,
+                caught_exc,
+                traceback)
+
+
+def remove_none(obj):
+    if isinstance(obj, (list, tuple, set)):
+        l = type(obj)(remove_none(x) for x in obj if x)
+        try:
+            return [dict(t) for t in {tuple(sorted(d.items())) for d in l}]
+        except (AttributeError, TypeError):
+            return l
+    elif isinstance(obj, dict):
+        return type(obj)((remove_none(k), remove_none(v))
+                         for k, v in obj.items() if k is not None and v)
+    else:
+        return obj
+
+
+def delete_items_from_dict(d, to_delete):
+    """Recursively deletes items from a dict,
+    if the item's value(s) is in ``to_delete``.
+    """
+    if not isinstance(to_delete, list):
+        to_delete = [to_delete]
+    if isinstance(d, dict):
+        for single_to_delete in set(to_delete):
+            if single_to_delete in d.values():
+                for k, v in d.copy().items():
+                    if v == single_to_delete:
+                        del d[k]
+        for k, v in d.items():
+            delete_items_from_dict(v, to_delete)
+    elif isinstance(d, list):
+        for i in d:
+            delete_items_from_dict(i, to_delete)
+    return remove_none(d)
diff -Nru python-internetarchive-1.8.1/Makefile python-internetarchive-1.8.5/Makefile
--- python-internetarchive-1.8.1/Makefile 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/Makefile 2019-06-07 17:28:42.000000000 -0400
@@ -31,7 +31,9 @@
 
 binary:
 	# This requires using https://github.com/jjjake/pex which has been hacked for multi-platform support.
-	pex . --python python3.6 --python python2 --python-shebang='/usr/bin/env python' -e internetarchive.cli.ia:main -o ia-$(VERSION)-py2.py3-none-any.pex
+	pex . --python python3.7 --python python2 --python-shebang='/usr/bin/env python' --platform=linux-x86_64 --platform=macosx_10_11 -e internetarchive.cli.ia:main -o ia-$(VERSION)-py2.py3-none-any.pex -r pex-requirements.txt # make with py2???
+	# Use pex==1.4.0
+	#pex .
--python python3 --python /usr/bin/python --python-shebang='/usr/bin/env python' --platform=linux-x86_64 --platform=macosx_10_11 -e internetarchive.cli.ia:main -o ia-$(VERSION)-py2.py3-none-any.pex -f wheelhouse/ --no-pypi
 
 publish-binary:
 	./ia-$(VERSION)-py2.py3-none-any.pex upload ia-pex ia-$(VERSION)-py2.py3-none-any.pex --no-derive
diff -Nru python-internetarchive-1.8.1/pex-requirements.txt python-internetarchive-1.8.5/pex-requirements.txt
--- python-internetarchive-1.8.1/pex-requirements.txt 1969-12-31 19:00:00.000000000 -0500
+++ python-internetarchive-1.8.5/pex-requirements.txt 2019-06-07 17:28:42.000000000 -0400
@@ -0,0 +1,8 @@
+requests>=2.9.1,<3.0.0
+jsonpatch>=0.4
+docopt>=0.6.0,<0.7.0
+clint>=0.4.0,<0.6.0
+six>=1.0.0,<2.0.0
+schema>=0.4.0
+total-ordering
+backports.csv
diff -Nru python-internetarchive-1.8.1/README.rst python-internetarchive-1.8.5/README.rst
--- python-internetarchive-1.8.1/README.rst 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/README.rst 2019-06-07 17:28:42.000000000 -0400
@@ -11,7 +11,7 @@
 
 This package installs a command-line tool named ``ia`` for using Archive.org from the command-line.
 It also installs the ``internetarchive`` Python module for programatic access to archive.org.
-Please report all bugs and issues on `Github <https://github.com/jjjake/ia-wrapper/issues>`__.
+Please report all bugs and issues on `Github <https://github.com/jjjake/internetarchive/issues>`__.
 
 
 Installation
@@ -35,10 +35,10 @@
 Documentation
 -------------
 
-Documentation is available at `https://internetarchive.readthedocs.io <https://internetarchive.readthedocs.io>`_.
+Documentation is available at `https://archive.org/services/docs/api/internetarchive <https://archive.org/services/docs/api/internetarchive>`_.
 
 
 Contributing
 ------------
 
-All contributions are welcome and appreciated.
Please see `https://internetarchive.readthedocs.io/en/latest/contributing.html <https://internetarchive.readthedocs.io/en/latest/contributing.html>`_ for more details.
+All contributions are welcome and appreciated. Please see `https://archive.org/services/docs/api/internetarchive/contributing.html <https://archive.org/services/docs/api/internetarchive/contributing.html>`_ for more details.
diff -Nru python-internetarchive-1.8.1/setup.py python-internetarchive-1.8.5/setup.py
--- python-internetarchive-1.8.1/setup.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/setup.py 2019-06-07 17:28:42.000000000 -0400
@@ -43,8 +43,12 @@
         'clint>=0.4.0,<0.6.0',
         'six>=1.0.0,<2.0.0',
         'schema>=0.4.0',
-    ] + (['total-ordering'] if sys.version_info < (2, 7) else []) +
-    (['backports.csv'] if sys.version_info < (3, 0) else []),
+        'backports.csv < 1.07;python_version<"2.7"',
+        'backports.csv < 1.07;python_version<"3.4"',
+        'backports.csv;python_version>="2.7"',
+        'backports.csv;python_version>="3.4"',
+        'total-ordering;python_version<"2.7"',
+    ],
     classifiers=[
         'Development Status :: 5 - Production/Stable',
         'Intended Audience :: Developers',
diff -Nru python-internetarchive-1.8.1/tests/cli/test_ia_download.py python-internetarchive-1.8.5/tests/cli/test_ia_download.py
--- python-internetarchive-1.8.1/tests/cli/test_ia_download.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/tests/cli/test_ia_download.py 2019-06-07 17:28:42.000000000 -0400
@@ -34,7 +34,8 @@
     expected_files = set([
         'globe_west_540.jpg',
         'NASAarchiveLogo.jpg',
-        'globe_west_540_thumb.jpg'
+        'globe_west_540_thumb.jpg',
+        '__ia_thumb.jpg',
     ])
 
     call_cmd('ia --insecure download --glob="*jpg" nasa')
diff -Nru python-internetarchive-1.8.1/tests/cli/test_ia_metadata.py python-internetarchive-1.8.5/tests/cli/test_ia_metadata.py
--- python-internetarchive-1.8.1/tests/cli/test_ia_metadata.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/tests/cli/test_ia_metadata.py
2019-06-07 17:28:42.000000000 -0400
@@ -9,9 +9,10 @@
 def test_ia_metadata_exists(capsys):
     with IaRequestsMock() as rsps:
         rsps.add_metadata_mock('nasa')
-        ia_call(['ia', 'metadata', '--exists', 'nasa'])
+        ia_call(['ia', 'metadata', '--exists', 'nasa'], expected_exit_code=0)
         out, err = capsys.readouterr()
         assert out == 'nasa exists\n'
+        rsps.reset()
         rsps.add_metadata_mock('nasa', '{}')
         sys.argv = ['ia', 'metadata', '--exists', 'nasa']
         ia_call(['ia', 'metadata', '--exists', 'nasa'], expected_exit_code=1)
diff -Nru python-internetarchive-1.8.1/tests/cli/test_ia_upload.py python-internetarchive-1.8.5/tests/cli/test_ia_upload.py
--- python-internetarchive-1.8.1/tests/cli/test_ia_upload.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/tests/cli/test_ia_upload.py 2019-06-07 17:28:42.000000000 -0400
@@ -36,6 +36,7 @@
 
         j = json.loads(STATUS_CHECK_RESPONSE)
         j['over_limit'] = 1
+        rsps.reset()
         rsps.add(responses.GET, '{0}//s3.us.archive.org'.format(PROTOCOL),
                  body=json.dumps(j),
                  content_type='application/json')
diff -Nru python-internetarchive-1.8.1/tests/conftest.py python-internetarchive-1.8.5/tests/conftest.py
--- python-internetarchive-1.8.1/tests/conftest.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/tests/conftest.py 2019-06-07 17:28:42.000000000 -0400
@@ -38,7 +38,8 @@
     'nasa_meta.xml',
     'nasa_reviews.xml',
     'NASAarchiveLogo.jpg',
-    'globe_west_540_thumb.jpg'
+    'globe_west_540_thumb.jpg',
+    '__ia_thumb.jpg',
 ])
 
 
diff -Nru python-internetarchive-1.8.1/tests/requirements.txt python-internetarchive-1.8.5/tests/requirements.txt
--- python-internetarchive-1.8.1/tests/requirements.txt 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/tests/requirements.txt 2019-06-07 17:28:42.000000000 -0400
@@ -1,3 +1,3 @@
 pytest>=3.3.1
 pytest-pep8
-responses==0.5.0
+responses==0.10.6
diff -Nru python-internetarchive-1.8.1/tests/test_api.py python-internetarchive-1.8.5/tests/test_api.py
---
python-internetarchive-1.8.1/tests/test_api.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/tests/test_api.py 2019-06-07 17:28:42.000000000 -0400
@@ -178,12 +178,12 @@
 
 def test_modify_metadata():
     with IaRequestsMock(assert_all_requests_are_fired=False) as rsps:
-        rsps.add(responses.GET, '{0}//archive.org/metadata/test'.format(PROTOCOL),
-                 body='{}')
-        rsps.add(responses.POST, '{0}//archive.org/metadata/test'.format(PROTOCOL),
+        rsps.add(responses.GET, '{0}//archive.org/metadata/nasa'.format(PROTOCOL),
+                 body='{"metadata":{"title":"foo"}}')
+        rsps.add(responses.POST, '{0}//archive.org/metadata/nasa'.format(PROTOCOL),
                  body=('{"success":true,"task_id":423444944,'
                        '"log":"https://catalogd.archive.org/log/423444944"}'))
-        r = modify_metadata('test', dict(foo=1))
+        r = modify_metadata('nasa', dict(foo=1))
         assert r.status_code == 200
         assert r.json() == {
             'task_id': 423444944,
diff -Nru python-internetarchive-1.8.1/tests/test_config.py python-internetarchive-1.8.5/tests/test_config.py
--- python-internetarchive-1.8.1/tests/test_config.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/tests/test_config.py 2019-06-07 17:28:42.000000000 -0400
@@ -17,62 +17,40 @@
 
 @responses.activate
 def test_get_auth_config():
-    headers = {'set-cookie': '[email protected]',
-               'set-cookie2': 'logged-in-sig=test-sig; version=0'}
-    # set-cookie2: Ugly hack to workaround responses lack of support for multiple headers
-    responses.add(responses.POST, 'https://archive.org/account/login.php',
-                  adding_headers=headers)
-
     test_body = """{
-        "key": {
-            "s3secretkey": "test-secret",
-            "s3accesskey": "test-access"
+        "success": true,
+        "values": {
+            "cookies": {
+                "logged-in-sig": "foo-sig",
+                "logged-in-user": "foo%40example.com"
+            },
+            "email": "[email protected]",
+            "itemname": "@jakej",
+            "s3": {
+                "access": "Ac3ssK3y",
+                "secret": "S3cretK3y"
+            },
+            "screenname":"jakej"
         },
-        "screenname": "foo",
-        "success": 1
-    }"""
-    responses.add(responses.GET,
'https://archive.org/account/s3.php',
-                  body=test_body, adding_headers=headers,
-                  content_type='application/json')
-    responses.add(responses.GET, 'https://s3.us.archive.org',
-                  body=test_body, adding_headers=headers,
-                  content_type='application/json')
-
-    class UglyHack(httplib.HTTPResponse):
-        def __init__(self, headers):
-            self.fp = True
-            if six.PY2:
-                self.msg = httplib.HTTPMessage(StringIO())
-            else:
-                self.msg = httplib.HTTPMessage()
-            for (k, v) in headers.items():
-                self.msg[k] = v
-
-    original_func = requests.adapters.HTTPAdapter.build_response
-
-    def ugly_hack_build_response(self, req, resp):
-        resp._original_response = UglyHack(resp.getheaders())
-        response = original_func(self, req, resp)
-        return response
-
-    ugly_hack = mock.patch('requests.adapters.HTTPAdapter.build_response',
-                           ugly_hack_build_response)
-    ugly_hack.start()
+        "version": 1}"""
+    responses.add(responses.POST, 'https://archive.org/services/xauthn/',
+                  body=test_body)
     r = internetarchive.config.get_auth_config('[email protected]', 'password1')
-    ugly_hack.stop()
-    assert r['s3']['access'] == 'test-access'
-    assert r['s3']['secret'] == 'test-secret'
-    assert r['cookies']['logged-in-user'] == '[email protected]'
-    assert r['cookies']['logged-in-sig'] == 'test-sig'
+    assert r['s3']['access'] == 'Ac3ssK3y'
+    assert r['s3']['secret'] == 'S3cretK3y'
+    assert r['cookies']['logged-in-user'] == 'foo%40example.com'
+    assert r['cookies']['logged-in-sig'] == 'foo-sig'
 
 
 @responses.activate
 def test_get_auth_config_auth_fail():
     # No logged-in-sig cookie set raises AuthenticationError.
-    responses.add(responses.POST, 'https://archive.org/account/login.php')
+    responses.add(responses.POST, 'https://archive.org/services/xauthn/',
+                  body='{"error": "failed"}')
     try:
-        internetarchive.config.get_auth_config('[email protected]', 'password1')
+        r = internetarchive.config.get_auth_config('[email protected]', 'password1')
     except AuthenticationError as exc:
+        return
         assert str(exc) == ('Authentication failed.
Please check your credentials '
                             'and try again.')
diff -Nru python-internetarchive-1.8.1/tests/test_item.py python-internetarchive-1.8.5/tests/test_item.py
--- python-internetarchive-1.8.1/tests/test_item.py 2018-06-28 19:18:10.000000000 -0400
+++ python-internetarchive-1.8.5/tests/test_item.py 2019-06-07 17:28:42.000000000 -0400
@@ -147,6 +147,7 @@
     with IaRequestsMock() as rsps:
         rsps.add(responses.GET, DOWNLOAD_URL_RE, body='test content')
         nasa_item.download(files='nasa_meta.xml')
+        rsps.reset()
         with pytest.raises(ConnectionError):
             nasa_item.download(files='nasa_meta.xml')
@@ -179,6 +180,7 @@
         rsps.add(responses.GET, DOWNLOAD_URL_RE, body='test content')
         nasa_item.download(files='nasa_meta.xml')
 
+        rsps.reset()
         rsps.add(responses.GET, DOWNLOAD_URL_RE, body='new test content')
         nasa_item.download(files='nasa_meta.xml')
         load_file('nasa/nasa_meta.xml') == 'new test content'
@@ -200,6 +202,7 @@
         assert load_file('nasa/nasa_meta.xml') == 'overwrite based on md5'
 
         # test no overwrite based on checksum.
+        rsps.reset()
         rsps.add(responses.GET, DOWNLOAD_URL_RE,
                  body=load_test_data_file('nasa_meta.xml'))
         nasa_item.download(files='nasa_meta.xml', checksum=True)
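
Note for reviewers: in the 1.8.5 debdiff, the session.py hunk wraps `send()` so that a failing request carries its URL in the exception message, via the new `reraise_modify()` helper in utils.py. Below is a minimal Python 3 sketch of that idea, using only the standard library. The names `TransportError`, `append_to_exception`, `fetch` and `fetch_with_context` are hypothetical and not part of internetarchive, and the sketch adds a separator around the URL that the upstream helper does not. The upstream code also goes through `six.reraise()` to stay Python 2 compatible, where this sketch uses a bare `raise`.

```python
class TransportError(Exception):
    """Hypothetical stand-in for a requests connection error."""


def append_to_exception(exc, msg):
    """Append msg to the last string argument of exc, in place.

    Simplified illustration of what reraise_modify() does to the
    exception args; not the upstream implementation.
    """
    args = list(exc.args)
    if args and isinstance(args[-1], str):
        args[-1] = args[-1] + msg
    else:
        args.append(msg)
    exc.args = tuple(args)
    return exc


def fetch(url):
    """Hypothetical network call that always fails."""
    raise TransportError("connection refused")


def fetch_with_context(url):
    try:
        return fetch(url)
    except Exception as exc:
        # Only the message changes; the bare `raise` keeps the
        # original exception class and traceback.
        append_to_exception(exc, " ({})".format(url))
        raise
```

With this in place, a caller that catches the error sees which URL failed without having to inspect the request object, which seems to be the point of the upstream change.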

