Script 'mail_helper' called by obssrc Hello community, here is the log from the commit of package urlwatch for openSUSE:Factory checked in at 2021-01-30 13:57:11 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Comparing /work/SRC/openSUSE:Factory/urlwatch (Old) and /work/SRC/openSUSE:Factory/.urlwatch.new.28504 (New) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "urlwatch" Sat Jan 30 13:57:11 2021 rev:20 rq:867912 version:2.22 Changes: -------- --- /work/SRC/openSUSE:Factory/urlwatch/urlwatch.changes 2020-08-20 22:33:17.076106363 +0200 +++ /work/SRC/openSUSE:Factory/.urlwatch.new.28504/urlwatch.changes 2021-01-30 13:58:04.326427885 +0100 @@ -1,0 +2,52 @@ +Mon Jan 4 10:59:57 UTC 2021 - Michael Vetter <[email protected]> + +- Update to 2.22: + Added: + * Added 'wait_until' option to browser jobs to configure how long + the headless browser will wait for pages to load. + * Jobs now have an optional treat_new_as_changed (default false) + key that can be set, and will treat newly-found pages as changed, + and display a diff from the empty string (useful for diff_tool + or diff_filter with side effects) + * New reporters: discord, mattermost + * New key user_visible_url for URL jobs that can be used to show a + different URL in reports (useful if the watched URL is a REST API + endpoint, but the report should link to the corresponding web page) + * The Markdown reporter now supports limiting the report length via + the max_length parameter of the submit method. The length limiting + logic is smart in the sense that it will try trimming the details first, + followed by omitting them completely, followed by omitting the summary. + If a part of the report is omitted, a note about this is added to the + report. (PR#572, by Denis Kasak) + Changed: + * Diff output is now generated more uniformly, independent of whether + the input data has a trailing newline or not; if this behavior is not + intended, use an external diff_tool (PR#550, by Adam Goldsmith) + * The --test-diff-filter output now properly reports timestamps from the + history entry instead of the current date and time (Fixes #573) + * Unique GUIDs for jobs are now enforced at load time, append "#1", + "#2", ... to the URLs to make them unique if you have multiple different + jobs that share the same request URL (Fixes #586) + * When a config, urls file or hooks file does not exist and should be + edited or inited, its parent folders will be created (previously only + the urlwatch configuration folder was created; Fixes #594) + * Auto-matched filters now always get None supplied as subfilter; any + custom filters must accept a subfilter parameter after the existing + data parameter + * Drop support for Python 3.5 + Fixed: + * Make imports thread-safe: This might increase startup times a bit, + as dependencies are imported on bootup instead of when first used. + Importing in Python is not (yet) thread-safe, so we cannot import + new modules from the worker threads reliably (Fixes #559, #601) + * The Matrix reporter was improved in several ways (PR#572, by Denis Kasak): + - The maximum length of the report was increase from 4096 to 16384. + - The report length limiting is now implemented via the new length + limiting functionality of the Markdown reporter. Previously, the + report was simply trimmed at the end which could break the diff + blocks and make them render incorrectly. + - The diff code blocks are now tagged as diffs which will allow the + diffs to be syntax highlighted as such. This doesn't yet work in + Element, pending on the resolution of trentm/python-markdown2#370. 
+ +------------------------------------------------------------------- Old: ---- urlwatch-2.21.tar.gz New: ---- urlwatch-2.22.tar.gz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ urlwatch.spec ++++++ --- /var/tmp/diff_new_pack.cScTJr/_old 2021-01-30 13:58:04.942428924 +0100 +++ /var/tmp/diff_new_pack.cScTJr/_new 2021-01-30 13:58:04.946428930 +0100 @@ -1,7 +1,7 @@ # # spec file for package urlwatch # -# Copyright (c) 2020 SUSE LLC +# Copyright (c) 2021 SUSE LLC # # All modifications and additions to the file contributed by third parties # remain the property of their copyright owners, unless otherwise agreed @@ -17,14 +17,14 @@ Name: urlwatch -Version: 2.21 +Version: 2.22 Release: 0 Summary: A tool for monitoring webpages for updates License: BSD-3-Clause Group: Productivity/Networking/Web/Utilities URL: https://thp.io/2008/urlwatch/ Source0: https://github.com/thp/%{name}/archive/%{version}.tar.gz#/%{name}-%{version}.tar.gz -BuildRequires: python3-devel >= 3.5 +BuildRequires: python3-devel >= 3.6 BuildRequires: python3-setuptools Requires: python3-PyYAML Requires: python3-appdirs ++++++ urlwatch-2.21.tar.gz -> urlwatch-2.22.tar.gz ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/.gitignore new/urlwatch-2.22/.gitignore --- old/urlwatch-2.21/.gitignore 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/.gitignore 2020-12-19 12:27:43.000000000 +0100 @@ -1,3 +1,4 @@ __pycache__ .idea -build \ No newline at end of file +build +*.egg-info/ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/.travis.yml new/urlwatch-2.22/.travis.yml --- old/urlwatch-2.21/.travis.yml 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/.travis.yml 2020-12-19 12:27:43.000000000 +0100 @@ -1,7 +1,6 @@ language: python cache: pip python: - - "3.5" - "3.6" - "3.7" - "3.8" diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/CHANGELOG.md new/urlwatch-2.22/CHANGELOG.md --- old/urlwatch-2.21/CHANGELOG.md 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/CHANGELOG.md 2020-12-19 12:27:43.000000000 +0100 @@ -4,6 +4,63 @@ The format mostly follows [Keep a Changelog](http://keepachangelog.com/en/1.0.0/). +## [2.22] -- 2020-12-19 + +### Added + +- Added 'wait_until' option to browser jobs to configure how long + the headless browser will wait for pages to load. +- Jobs now have an optional `treat_new_as_changed` (default `false`) + key that can be set, and will treat newly-found pages as changed, + and display a diff from the empty string (useful for `diff_tool` + or `diff_filter` with side effects) +- New reporters: `discord`, `mattermost` +- New key `user_visible_url` for URL jobs that can be used to show + a different URL in reports (useful if the watched URL is a REST API + endpoint, but the report should link to the corresponding web page) +- The Markdown reporter now supports limiting the report length via the + `max_length` parameter of the `submit` method. The length limiting logic is + smart in the sense that it will try trimming the details first, followed by + omitting them completely, followed by omitting the summary. If a part of the + report is omitted, a note about this is added to the report. 
(PR#572, by + Denis Kasak) + +### Changed + +- Diff output is now generated more uniformly, independent of whether + the input data has a trailing newline or not; if this behavior is not + intended, use an external `diff_tool` (PR#550, by Adam Goldsmith) +- The `--test-diff-filter` output now properly reports timestamps from + the history entry instead of the current date and time (Fixes #573) +- Unique GUIDs for jobs are now enforced at load time; append "#1", + "#2", ... to the URLs to make them unique if you have multiple + different jobs that share the same request URL (Fixes #586) +- When a config, urls file or hooks file does not exist and should be + edited or initialized, its parent folders will be created (previously + only the urlwatch configuration folder was created; Fixes #594) +- Auto-matched filters now always get `None` supplied as subfilter; + any custom filters must accept a `subfilter` parameter after the + existing `data` parameter +- Drop support for Python 3.5 + +### Fixed + +- Make imports thread-safe: This might increase startup times a bit, + as dependencies are imported on bootup instead of when first used. + Importing in Python is not (yet) thread-safe, so we cannot import + new modules from the worker threads reliably (Fixes #559, #601) + +- The Matrix reporter was improved in several ways (PR#572, by Denis Kasak): + + - The maximum length of the report was increased from 4096 to 16384. + - The report length limiting is now implemented via the new length limiting + functionality of the Markdown reporter. Previously, the report was simply + trimmed at the end, which could break the diff blocks and make them render + incorrectly. + - The diff code blocks are now tagged as diffs, which will allow the diffs to + be syntax highlighted as such. This doesn't yet work in Element, pending + the resolution of trentm/python-markdown2#370. + ## [2.21] -- 2020-07-31 ### Added @@ -191,7 +248,7 @@ ### Added - Support for Mailgun regions (by Daniel Peukert, PR#280) -- CLI: Allow multiple occurences of 'filter' when adding jobs (PR#278) +- CLI: Allow multiple occurrences of 'filter' when adding jobs (PR#278) ### Changed - Fixed incorrect name for chat_id config in the default config (by Robin B, PR#276) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/Dockerfile new/urlwatch-2.22/Dockerfile --- old/urlwatch-2.21/Dockerfile 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/Dockerfile 2020-12-19 12:27:43.000000000 +0100 @@ -1,6 +1,6 @@ FROM python:3.8.2 -RUN python3 -m pip install pyyaml minidb requests keyring appdirs lxml cssselect beautifulsoup4 jsbeautifier cssbeautifier aioxmpp +RUN python3 -m pip install --no-cache-dir pyyaml minidb requests keyring appdirs lxml cssselect beautifulsoup4 jsbeautifier cssbeautifier aioxmpp WORKDIR /opt/urlwatch diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/advanced.rst new/urlwatch-2.22/docs/source/advanced.rst --- old/urlwatch-2.21/docs/source/advanced.rst 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/advanced.rst 2020-12-19 12:27:43.000000000 +0100 @@ -258,3 +258,57 @@ select a region of a web page. It then generates a configuration for ``urlwatch`` to run ``pyvisualcompare`` and generate a hash for the screen contents.
+ + +Configuring how long browser jobs wait for pages to load +-------------------------------------------------------- + +For browser jobs, you can configure how long the headless browser will wait +before a page is considered loaded by using the ``wait_until`` option. It can take one of four values: + + - ``load`` will wait until the ``load`` browser event is fired (default). + - ``domcontentloaded`` will wait until the ``DOMContentLoaded`` browser event is fired. + - ``networkidle0`` will wait until there are no more than 0 network connections for at least 500 ms. + - ``networkidle2`` will wait until there are no more than 2 network connections for at least 500 ms. + + +Treating ``NEW`` jobs as ``CHANGED`` +------------------------------------ + +In some cases (e.g. when the ``diff_tool`` or ``diff_filter`` executes some +external command as a side effect that should also run for the initial page +state), you can set the ``treat_new_as_changed`` key to ``true``, which will make +the job report as ``CHANGED`` instead of ``NEW`` the first time it is retrieved +(and the diff will be reported, too). + +.. code-block:: yaml + + url: http://example.com/initialpage.html + treat_new_as_changed: true + +This option will also change the behavior of ``--test-diff-filter``, and allow +testing the diff filter if only a single version of the page has been +retrieved. + + +Monitoring the same URL in multiple jobs +---------------------------------------- + +Because urlwatch uses the ``url``/``navigate`` (for URL/Browser jobs) and/or +the ``command`` (for Shell jobs) key as unique identifier, each URL can only +appear in a single job. If you want to monitor the same URL multiple times, +you can append ``#1``, ``#2``, ... (or anything that makes them unique) to +the URLs, like this: + +..
code-block:: yaml + + name: "Looking for Thing A" + url: http://example.com/#1 + filter: + - grep: "Thing A" + --- + name: "Looking for Thing B" + url: http://example.com/#2 + filter: + - grep: "Thing B" + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/conf.py new/urlwatch-2.22/docs/source/conf.py --- old/urlwatch-2.21/docs/source/conf.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/conf.py 2020-12-19 12:27:43.000000000 +0100 @@ -22,7 +22,7 @@ author = 'Thomas Perl' # The full version, including alpha/beta/rc tags -release = '2.21' +release = '2.22' # -- General configuration --------------------------------------------------- diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/dependencies.rst new/urlwatch-2.22/docs/source/dependencies.rst --- old/urlwatch-2.21/docs/source/dependencies.rst 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/dependencies.rst 2020-12-19 12:27:43.000000000 +0100 @@ -10,7 +10,7 @@ Mandatory Packages ------------------ -- Python 3.5 or newer +- Python 3.6 or newer - `PyYAML <http://pyyaml.org/>`__ - `minidb <https://thp.io/2010/minidb/>`__ - `requests <http://python-requests.org/>`__ @@ -52,8 +52,8 @@ +-------------------------+---------------------------------------------------------------------+ | Unit testing | `pycodestyle <http://pycodestyle.pycqa.org/en/latest/>`__, | | | `docutils <https://docutils.sourceforge.io>`__, | -| | `Pygments <https://pygments.org>`__ and | -| | dependencies for other features as needed | ++-------------------------+---------------------------------------------------------------------+ +| Documentation build | `Sphinx <https://www.sphinx-doc.org/>`__ | +-------------------------+---------------------------------------------------------------------+ | `beautify` filter | `beautifulsoup4 <https://pypi.org/project/beautifulsoup4/>`__; | | | optional dependencies (for ``<script>`` and ``<style>`` tags): | diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/deprecated.rst new/urlwatch-2.22/docs/source/deprecated.rst --- old/urlwatch-2.21/docs/source/deprecated.rst 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/deprecated.rst 2020-12-19 12:27:43.000000000 +0100 @@ -4,6 +4,40 @@ This page lists the features that are deprecated and steps to update your configuration to use the replacements (if any). + +Filters without subfilters (UNRELEASED) +--------------------------------------- + +In older urlwatch versions, it was possible to write custom +filters that do not take a ``subfilter`` as an argument. + +If you have written your own filter code like this: + +.. code:: python + + class CustomFilter(filters.FilterBase): + """My old custom filter""" + + __kind__ = 'foo' + + def filter(self, data): + ... + +You have to update your filter to take an optional ``subfilter`` +argument (if the filter configuration does not have a subfilter +defined, the value of ``subfilter`` will be ``None``): + +.. code:: python + + class CustomFilter(filters.FilterBase): + """My new custom filter""" + + __kind__ = 'foo' + + def filter(self, data, subfilter): + ...
+ + string-based filter definitions (since 2.19) -------------------------------------------- diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/filters.rst new/urlwatch-2.22/docs/source/filters.rst --- old/urlwatch-2.21/docs/source/filters.rst 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/filters.rst 2020-12-19 12:27:43.000000000 +0100 @@ -195,7 +195,7 @@ To match an element in an `XML namespace <https://www.w3.org/TR/xml-names/>`__, use a namespace prefix -before the tag name. Use a ``:`` to seperate the namespace prefix and +before the tag name. Use a ``:`` to separate the namespace prefix and the tag name in an XPath expression, and use a ``|`` in a CSS selector. .. code:: yaml @@ -490,7 +490,7 @@ Within the ``shellpipe`` script, two environment variables will be set for further customization (this can be useful if you have -a external shell script file that is used as filter for multiple +an external shell script file that is used as filter for multiple jobs, but needs to treat each job in a slightly different way): +----------------------------+------------------------------------------------------+ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/introduction.rst new/urlwatch-2.22/docs/source/introduction.rst --- old/urlwatch-2.21/docs/source/introduction.rst 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/introduction.rst 2020-12-19 12:27:43.000000000 +0100 @@ -13,7 +13,7 @@ :ref:`Jobs` ----------- -Each website or shell command to be monitored consitutes a "job". +Each website or shell command to be monitored constitutes a "job". The instructions for each such job are contained in a config file in the `YAML format`_, accessible with the ``urlwatch --edit`` command. If you get an error, set your ``$EDITOR`` (or ``$VISUAL``) environment @@ -74,6 +74,7 @@ - ``email`` (using SMTP) - email using ``mailgun`` - ``slack`` +- ``discord`` - ``pushbullet`` - ``telegram`` - ``matrix`` diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/jobs.rst new/urlwatch-2.22/docs/source/jobs.rst --- old/urlwatch-2.21/docs/source/jobs.rst 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/jobs.rst 2020-12-19 12:27:43.000000000 +0100 @@ -3,7 +3,7 @@ Jobs ==== -Jobs are the kind of things that `urlwatch` can monitor. +Jobs are the kind of things that `urlwatch` can monitor. The list of jobs to run are contained in the configuration file ``urls.yaml``, accessed with the command ``urlwatch --edit``, each separated by a line @@ -46,6 +46,7 @@ - ``ignore_http_error_codes``: List of HTTP errors to ignore (see :ref:`advanced_topics`) - ``ignore_timeout_errors``: Do not report errors when the timeout is hit - ``ignore_too_many_redirects``: Ignore redirect loops (see :ref:`advanced_topics`) +- ``user_visible_url``: Different URL to show in reports (e.g. 
when watched URL is a REST API URL, and you want to show a webpage) (Note: ``url`` implies ``kind: url``) @@ -80,7 +81,8 @@ Job-specific optional keys: -- none +- ``wait_until``: Either ``load``, ``domcontentloaded``, ``networkidle0``, or ``networkidle2`` (see :ref:`advanced_topics`) + As this job uses `Pyppeteer <https://github.com/pyppeteer/pyppeteer>`__ to render the page in a headless Chromium instance, it requires massively @@ -98,7 +100,7 @@ ----- This job type allows you to watch the output of arbitrary shell commands, -which is useful for e.g. monitoring a FTP uploader folder, output of +which is useful for e.g. monitoring an FTP uploader folder, output of scripts that query external devices (RPi GPIO), etc... .. code-block:: yaml @@ -125,6 +127,7 @@ - ``max_tries``: Number of times to retry fetching the resource - ``diff_tool``: Command to a custom tool for generating diff text - ``diff_filter``: :ref:`filters` (if any) to apply to the diff result (can be tested with ``--test-diff-filter``) +- ``treat_new_as_changed``: Will treat jobs that don't have any historic data as ``CHANGED`` instead of ``NEW`` (and create a diff for new jobs) - ``compared_versions``: Number of versions to compare for similarity - ``kind`` (redundant): Either ``url``, ``shell`` or ``browser``. Automatically derived from the unique key (``url``, ``command`` or ``navigate``) of the job type diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/migration.rst new/urlwatch-2.22/docs/source/migration.rst --- old/urlwatch-2.21/docs/source/migration.rst 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/migration.rst 2020-12-19 12:27:43.000000000 +0100 @@ -8,7 +8,7 @@ specifying names for jobs, different job kinds, directly applying filters, selecting the HTTP request method, specifying POST data as dictionary and much more -- The cache directory has been replaced with a SQLite 3 database file +- The cache directory has been replaced with an SQLite 3 database file “cache.db” in `minidb`_ format, storing all change history (use ``--gc-cache`` to remove old changes if you don’t need them anymore) for further analysis diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/reporters.rst new/urlwatch-2.22/docs/source/reporters.rst --- old/urlwatch-2.21/docs/source/reporters.rst 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/reporters.rst 2020-12-19 12:27:43.000000000 +0100 @@ -45,13 +45,16 @@ - **stdout**: Print summary on stdout (the console) - **email**: Send summary via e-mail / SMTP -- **mailgun**: Custom email reporter that uses Mailgun -- **matrix**: Custom Matrix reporter +- **mailgun**: Send e-mail via the Mailgun service +- **matrix**: Send a message to a room using the Matrix protocol +- **mattermost**: Send a message to a Mattermost channel - **pushbullet**: Send summary via pushbullet.com - **pushover**: Send summary via pushover.net -- **slack**: Custom Slack reporter -- **telegram**: Custom Telegram reporter +- **slack**: Send a message to a Slack channel +- **discord**: Send a message to a Discord channel +- **telegram**: Send a message using Telegram - **ifttt**: Send summary via IFTTT +- **xmpp**: Send a message using the XMPP Protocol .. To convert the "urlwatch --features" output, use: sed -e 's/^ \* \(.*\) - \(.*\)$/- **\1**: \2/' @@ -141,6 +144,39 @@ “Incoming Webhooks”
on a channel, you’ll get a webhook URL; copy it into the configuration as seen above. +Mattermost +---------- + +Mattermost notifications are set up the same way as Slack notifications; +only the webhook URL is different: + +.. code:: yaml + + mattermost: + webhook_url: 'http://{your-mattermost-site}/hooks/XXXXXXXXXXXXXXXXXXXXXX' + enabled: true + +See `Incoming Webhooks <https://developers.mattermost.com/integrate/incoming-webhooks/>`__ +in the Mattermost documentation for details. + +Discord +------- + +Discord notifications are configured using “Discord Incoming Webhooks”. Here +is a sample configuration: + +.. code:: yaml + + discord: + webhook_url: 'https://discordapp.com/api/webhooks/11111XXXXXXXXXXX/BBBBYYYYYYYYYYYYYYYYYYYYYYYyyyYYYYYYYYYYYYYY' + enabled: true + embed: true + subject: '{count} changes: {jobs}' + +To set up Discord, open your Discord server settings, select Integrations, create a "New Webhook", give the webhook a name to post under, select a channel, press "Copy Webhook URL", and paste the URL into the configuration as seen above. + +Embedded content might be easier to read and makes it easier to identify individual reports. ``subject`` precedes the embedded report and is only used when ``embed`` is ``true``. + IFTTT ----- diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/__init__.py new/urlwatch-2.22/lib/urlwatch/__init__.py --- old/urlwatch-2.21/lib/urlwatch/__init__.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/__init__.py 2020-12-19 12:27:43.000000000 +0100 @@ -12,5 +12,5 @@ __author__ = 'Thomas Perl <[email protected]>' __license__ = 'BSD' __url__ = 'https://thp.io/2008/urlwatch/' -__version__ = '2.21' +__version__ = '2.22' __user_agent__ = '%s/%s (+https://thp.io/2008/urlwatch/info.html)' % (pkgname, __version__) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/browser.py new/urlwatch-2.22/lib/urlwatch/browser.py --- old/urlwatch-2.21/lib/urlwatch/browser.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/browser.py 2020-12-19 12:27:43.000000000 +0100 @@ -54,16 +54,20 @@ return browser @asyncio.coroutine - def _get_content(self, url): + def _get_content(self, url, wait_until=None): context = yield from self._browser.createIncognitoBrowserContext() page = yield from context.newPage() - yield from page.goto(url) + opts = {} + if wait_until is not None: + opts['waitUntil'] = wait_until + yield from page.goto(url, opts) content = yield from page.content() yield from context.close() return content - def process(self, url): - return asyncio.run_coroutine_threadsafe(self._get_content(url), self._event_loop).result() + def process(self, url, wait_until=None): + coroutine = self._get_content(url, wait_until=wait_until) + return asyncio.run_coroutine_threadsafe(coroutine, self._event_loop).result() def destroy(self): self._event_loop.call_soon_threadsafe(self._event_loop.stop) @@ -86,8 +90,8 @@ BrowserContext._BROWSER_LOOP = BrowserLoop() BrowserContext._BROWSER_REFCNT += 1 - def process(self, url): - return BrowserContext._BROWSER_LOOP.process(url) + def process(self, url, wait_until=None): + return BrowserContext._BROWSER_LOOP.process(url, wait_until=wait_until) def close(self): with BrowserContext._BROWSER_LOCK: @@ -104,13 +108,18 @@ parser = argparse.ArgumentParser(description='Browser handler') parser.add_argument('url', help='URL to retrieve') parser.add_argument('-v', '--verbose', action='store_true',
help='show debug output') + parser.add_argument('-w', + '--wait-until', + dest='wait_until', + choices=['load', 'domcontentloaded', 'networkidle0', 'networkidle2'], + help='When to consider a pageload finished') args = parser.parse_args() setup_logger(args.verbose) try: ctx = BrowserContext() - print(ctx.process(args.url)) + print(ctx.process(args.url, wait_until=args.wait_until)) finally: ctx.close() diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/command.py new/urlwatch-2.22/lib/urlwatch/command.py --- old/urlwatch-2.21/lib/urlwatch/command.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/command.py 2020-12-19 12:27:43.000000000 +0100 @@ -60,6 +60,7 @@ shutil.copy(self.urlwatch_config.hooks, hooks_edit) elif self.urlwatch_config.hooks_py_example is not None and os.path.exists( self.urlwatch_config.hooks_py_example): + os.makedirs(os.path.dirname(hooks_edit) or '.', exist_ok=True) shutil.copy(self.urlwatch_config.hooks_py_example, hooks_edit) edit_file(hooks_edit) import_module_from_source('hooks', hooks_edit) @@ -144,7 +145,12 @@ job = self._get_job(id) history_data = self.urlwatcher.cache_storage.get_history_data(job.get_guid(), 10) - history_data = [key for key, value in sorted(history_data.items(), key=lambda kv: kv[1])] + history_data = sorted(history_data.items(), key=lambda kv: kv[1]) + + if len(history_data) and getattr(job, 'treat_new_as_changed', False): + # Insert empty history entry, so first snapshot is diffed against the empty string + _, first_timestamp = history_data[0] + history_data.insert(0, ('', first_timestamp)) if len(history_data) < 2: print('Not enough historic data available (need at least 2 different snapshots)') @@ -152,8 +158,8 @@ for i in range(len(history_data) - 1): with JobState(self.urlwatcher.cache_storage, job) as job_state: - job_state.old_data = history_data[i] - job_state.new_data = history_data[i + 1] + job_state.old_data, job_state.timestamp = history_data[i] + job_state.new_data, job_state.current_timestamp = history_data[i + 1] print('=== Filtered diff between state {} and state {} ==='.format(i, i + 1)) print(job_state.get_diff()) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/filters.py new/urlwatch-2.22/lib/urlwatch/filters.py --- old/urlwatch-2.21/lib/urlwatch/filters.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/filters.py 2020-12-19 12:27:43.000000000 +0100 @@ -46,6 +46,39 @@ from .util import TrackSubClasses, import_module_from_source +from .html2txt import html2text +from .ical2txt import ical2text + +try: + from bs4 import BeautifulSoup +except ImportError: + BeautifulSoup = None + +try: + import jsbeautifier +except ImportError: + jsbeautifier = None + +try: + import cssbeautifier +except ImportError: + cssbeautifier = None + +try: + import pdftotext +except ImportError: + pdftotext = None + +try: + import pytesseract +except ImportError: + pytesseract = None + +try: + from PIL import Image +except ImportError: + Image = None + logger = logging.getLogger(__name__) @@ -82,7 +115,8 @@ filter_instance = filtercls(state.job, state) if filter_instance.match(): logger.info('Auto-applying filter %r to %s', filter_instance, state.job.get_location()) - data = filter_instance.filter(data) + # filters require a subfilter argument + data = filter_instance.filter(data, None) return data @@ -216,7 +250,10 @@ def match(self): return self.hooks is not None - 
def filter(self, data): + def filter(self, data, subfilter): + if subfilter is not None: + logger.warning('Legacy hooks filter does not have any subfilter -- ignored') + try: result = self.hooks.filter(self.job.get_location(), data) if result is None: @@ -235,14 +272,10 @@ __no_subfilter__ = True def filter(self, data, subfilter): - from bs4 import BeautifulSoup as bs - soup = bs(data, features="lxml") + if BeautifulSoup is None: + raise ImportError('Please install BeautifulSoup') - try: - import jsbeautifier - except ImportError: - logger.info('"jsbeautifier" is not installed, will not beautify <script> tags') - jsbeautifier = None + soup = BeautifulSoup(data, features="lxml") if jsbeautifier is not None: scripts = soup.find_all('script') @@ -250,12 +283,8 @@ if script.string is not None: beautified_js = jsbeautifier.beautify(script.string) script.string = beautified_js - - try: - import cssbeautifier - except ImportError: - logger.info('"cssbeautifier" is not installed, will not beautify <style> tags') - cssbeautifier = None + else: + logger.info('"jsbeautifier" is not installed, will not beautify <script> tags') if cssbeautifier is not None: styles = soup.find_all('style') @@ -263,6 +292,8 @@ if style.string is not None: beautified_css = cssbeautifier.beautify(style.string) style.string = beautified_css + else: + logger.info('"cssbeautifier" is not installed, will not beautify <style> tags') return soup.prettify() @@ -288,7 +319,6 @@ method = 're' options = {} - from .html2txt import html2text return html2text(data, baseurl=getattr(self.job, 'url', getattr(self.job, 'navigate', '')), method=method, options=options) @@ -312,7 +342,9 @@ if not isinstance(data, bytes): raise ValueError('The pdf2text filter needs bytes input (is it the first filter?)') - import pdftotext + if pdftotext is None: + raise ImportError('Please install pdftotext') + return '\n\n'.join(pdftotext.PDF(io.BytesIO(data), password=subfilter.get('password', ''))) @@ -324,7 +356,6 @@ __no_subfilter__ = True def filter(self, data, subfilter): - from .ical2txt import ical2text return ical2text(data) @@ -817,6 +848,10 @@ language = subfilter.get('language', None) timeout = int(subfilter.get('timeout', 10)) - import pytesseract - from PIL import Image + if pytesseract is None: + raise ImportError('Please install pytesseract') + + if Image is None: + raise ImportError('Please install Pillow/PIL') + return pytesseract.image_to_string(Image.open(io.BytesIO(data)), lang=language, timeout=timeout) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/handler.py new/urlwatch-2.22/lib/urlwatch/handler.py --- old/urlwatch-2.21/lib/urlwatch/handler.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/handler.py 2020-12-19 12:27:43.000000000 +0100 @@ -55,6 +55,7 @@ self.new_data = None self.history_data = {} self.timestamp = None + self.current_timestamp = None self.exception = None self.traceback = None self.tries = 0 @@ -103,6 +104,12 @@ try: try: self.load() + + if self.old_data is None and getattr(self.job, 'treat_new_as_changed', False): + # Force creation of a diff for "NEW"ly found items by pretending we had an empty page before + self.old_data = '' + self.timestamp = None + data = self.job.retrieve(self) # Apply automatic filters first @@ -160,10 +167,10 @@ raise subprocess.CalledProcessError(proc.returncode, cmdline) timestamp_old = email.utils.formatdate(self.timestamp, localtime=True) - timestamp_new = email.utils.formatdate(time.time(), 
localtime=True) - return ''.join(difflib.unified_diff(self.old_data.splitlines(keepends=True), - self.new_data.splitlines(keepends=True), - '@', '@', timestamp_old, timestamp_new)) + timestamp_new = email.utils.formatdate(self.current_timestamp or time.time(), localtime=True) + return '\n'.join(difflib.unified_diff(self.old_data.splitlines(), + self.new_data.splitlines(), + '@', '@', timestamp_old, timestamp_new, lineterm='')) class Report(object): diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/html2txt.py new/urlwatch-2.22/lib/urlwatch/html2txt.py --- old/urlwatch-2.21/lib/urlwatch/html2txt.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/html2txt.py 2020-12-19 12:27:43.000000000 +0100 @@ -35,6 +35,16 @@ logger = logging.getLogger(__name__) +try: + import html2text as pyhtml2text +except ImportError: + pyhtml2text = None + +try: + from bs4 import BeautifulSoup +except ImportError: + BeautifulSoup = None + def html2text(data, baseurl, method, options): """ @@ -59,8 +69,10 @@ return d if method == 'pyhtml2text': - import html2text - parser = html2text.HTML2Text() + if pyhtml2text is None: + raise ImportError('Please install pyhtml2text') + + parser = pyhtml2text.HTML2Text() parser.baseurl = baseurl for k, v in options.items(): setattr(parser, k.lower(), v) @@ -68,7 +80,8 @@ return d if method == 'bs4': - from bs4 import BeautifulSoup + if BeautifulSoup is None: + raise ImportError('Please install BeautifulSoup') parser = options.pop('parser', 'lxml') soup = BeautifulSoup(data, parser) d = soup.get_text(strip=True) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/ical2txt.py new/urlwatch-2.22/lib/urlwatch/ical2txt.py --- old/urlwatch-2.21/lib/urlwatch/ical2txt.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/ical2txt.py 2020-12-19 12:27:43.000000000 +0100 @@ -28,8 +28,16 @@ # THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-def ical2text(ical_string): +try: import vobject +except ImportError: + vobject = None + + +def ical2text(ical_string): + if vobject is None: + raise ImportError('Please install vobject') + result = [] if isinstance(ical_string, str): parsedCal = vobject.readOne(ical_string) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/jobs.py new/urlwatch-2.22/lib/urlwatch/jobs.py --- old/urlwatch-2.21/lib/urlwatch/jobs.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/jobs.py 2020-12-19 12:27:43.000000000 +0100 @@ -180,7 +180,7 @@ class Job(JobBase): __required__ = () - __optional__ = ('name', 'filter', 'max_tries', 'diff_tool', 'compared_versions', 'diff_filter') + __optional__ = ('name', 'filter', 'max_tries', 'diff_tool', 'compared_versions', 'diff_filter', 'treat_new_as_changed') # determine if hyperlink "a" tag is used in HtmlReporter LOCATION_IS_URL = False @@ -221,13 +221,13 @@ __required__ = ('url',) __optional__ = ('cookies', 'data', 'method', 'ssl_no_verify', 'ignore_cached', 'http_proxy', 'https_proxy', 'headers', 'ignore_connection_errors', 'ignore_http_error_codes', 'encoding', 'timeout', - 'ignore_timeout_errors', 'ignore_too_many_redirects') + 'ignore_timeout_errors', 'ignore_too_many_redirects', 'user_visible_url') LOCATION_IS_URL = True CHARSET_RE = re.compile('text/(html|plain); charset=([^;]*)') def get_location(self): - return self.url + return self.user_visible_url or self.url def retrieve(self, job_state): headers = { @@ -364,6 +364,8 @@ __required__ = ('navigate',) + __optional__ = ('wait_until',) + LOCATION_IS_URL = True def get_location(self): @@ -377,4 +379,4 @@ self.ctx.close() def retrieve(self, job_state): - return self.ctx.process(self.navigate) + return self.ctx.process(self.navigate, wait_until=self.wait_until) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/main.py new/urlwatch-2.22/lib/urlwatch/main.py --- old/urlwatch-2.21/lib/urlwatch/main.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/main.py 2020-12-19 12:27:43.000000000 +0100 @@ -69,8 +69,6 @@ self.urlwatch_config.migrate_cache(self) def check_directories(self): - if not os.path.isdir(self.urlwatch_config.urlwatch_dir): - os.makedirs(self.urlwatch_config.urlwatch_dir) if not os.path.exists(self.urlwatch_config.config): self.config_storage.write_default_config(self.urlwatch_config.config) print(""" diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/reporters.py new/urlwatch-2.22/lib/urlwatch/reporters.py --- old/urlwatch-2.21/lib/urlwatch/reporters.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/reporters.py 2020-12-19 12:27:43.000000000 +0100 @@ -67,6 +67,11 @@ except ImportError: Markdown = None +try: + from colorama import AnsiToWin32 +except ImportError: + AnsiToWin32 = None + logger = logging.getLogger(__name__) # Regular expressions that match the added/removed markers of GNU wdiff output @@ -203,7 +208,11 @@ elif line.startswith('-'): yield SafeHtml('<span class="unified_sub">{line}</span>').format(line=line) else: - yield SafeHtml('<span class="unified_nor">{line}</span>').format(line=line) + # Basic colorization for wdiff-style differences + line = SafeHtml('<span class="unified_nor">{line}</span>').format(line=line) + line = re.sub(WDIFF_ADDED_RE, lambda x: '<span class="diff_add">' + x.group(0) + '</span>', line) + 
line = re.sub(WDIFF_REMOVED_RE, lambda x: '<span class="diff_sub">' + x.group(0) + '</span>', line) + yield line def _format_content(self, job_state, difftype): if job_state.verb == 'error': @@ -336,8 +345,7 @@ return self._incolor(4, s) def _get_print(self): - if sys.platform == 'win32' and self._has_color: - from colorama import AnsiToWin32 + if sys.platform == 'win32' and self._has_color and AnsiToWin32 is not None: return functools.partial(print, file=AnsiToWin32(sys.stdout).stream) return print @@ -610,44 +618,113 @@ class SlackReporter(TextReporter): """Send a message to a Slack channel""" - MAX_LENGTH = 40000 __kind__ = 'slack' + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + self.max_length = self.config.get('max_message_length', 40000) + def submit(self): webhook_url = self.config['webhook_url'] text = '\n'.join(super().submit()) if not text: - logger.debug('Not calling slack API (no changes)') + logger.debug('Not calling {} API (no changes)'.format(self.__kind__)) return result = None - for chunk in chunkstring(text, self.MAX_LENGTH, numbering=True): - res = self.submit_to_slack(webhook_url, chunk) + for chunk in chunkstring(text, self.max_length, numbering=True): + res = self.submit_chunk(webhook_url, chunk) if res.status_code != requests.codes.ok or res is None: result = res return result - def submit_to_slack(self, webhook_url, text): - logger.debug("Sending slack request with text:{0}".format(text)) + def submit_chunk(self, webhook_url, text): + logger.debug("Sending {} request with text: {}".format(self.__kind__, text)) post_data = {"text": text} result = requests.post(webhook_url, json=post_data) try: if result.status_code == requests.codes.ok: - logger.info("Slack response: ok") + logger.info("{} response: ok".format(self.__kind__)) else: - logger.error("Slack error: {0}".format(result.text)) + logger.error("{} error: {}".format(self.__kind__, result.text)) except ValueError: logger.error( - "Failed to parse slack response. HTTP status code: {0}, content: {1}".format(result.status_code, - result.content)) + "Failed to parse {} response. 
HTTP status code: {}, content: {}".format(self.__kind__, + result.status_code, + result.content)) return result -class MarkdownReporter(ReporterBase): +class MattermostReporter(SlackReporter): + """Send a message to a Mattermost channel""" + + __kind__ = 'mattermost' + + +class DiscordReporter(TextReporter): + """Send a message to a Discord channel""" + + __kind__ = 'discord' + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + self.max_length = self.config.get('max_message_length', 2000) + def submit(self): + webhook_url = self.config['webhook_url'] + text = '\n'.join(super().submit()) + + if not text: + logger.debug('Not calling Discord API (no changes)') + return + + result = None + for chunk in chunkstring(text, self.max_length, numbering=True): + res = self.submit_to_discord(webhook_url, chunk) + if res.status_code != requests.codes.ok or res is None: + result = res + + return result + + def submit_to_discord(self, webhook_url, text): + if self.config.get('embed', False): + filtered_job_states = list(self.report.get_filtered_job_states(self.job_states)) + + subject_args = { + 'count': len(filtered_job_states), + 'jobs': ', '.join(job_state.job.pretty_name() for job_state in filtered_job_states), + } + + subject = self.config['subject'].format(**subject_args) + + post_data = { + 'content': subject, + 'embeds': [{ + 'type': 'rich', + 'description': text, + }] + } + else: + post_data = {"content": text} + + logger.debug("Sending Discord request with post_data: {0}".format(post_data)) + + result = requests.post(webhook_url, json=post_data) + try: + if result.status_code in (requests.codes.ok, requests.codes.no_content): + logger.info("Discord response: ok") + else: + logger.error("Discord error: {0}".format(result.text)) + except ValueError: + logger.error("Failed to parse Discord response. HTTP status code: {0}, content: {1}".format(result.status_code, result.content)) + return result + + +class MarkdownReporter(ReporterBase): + def submit(self, max_length=None): cfg = self.report.config['report']['markdown'] show_details = cfg['details'] show_footer = cfg['footer'] @@ -668,18 +745,144 @@ summary.extend(summary_part) details.extend(details_part) + if summary and show_footer: + footer = ('--- ', + '%s %s, %s ' % (urlwatch.pkgname, urlwatch.__version__, urlwatch.__copyright__), + 'Website: %s ' % (urlwatch.__url__,), + 'watched %d URLs in %d seconds' % (len(self.job_states), self.duration.seconds)) + else: + footer = None + + if not show_details: + details = None + + trimmed_msg = "*Parts of the report were omitted due to message length.*\n" + max_length -= len(trimmed_msg) + + trimmed, summary, details, footer = MarkdownReporter._render( + max_length, summary, details, footer + ) + if summary: - yield from ('%d. %s' % (idx + 1, line) for idx, line in enumerate(summary)) + yield from summary yield '' if show_details: - yield from details + for header, body in details: + yield header + yield body + yield '' + + if trimmed: + yield trimmed_msg if summary and show_footer: - yield from ('--- ', - '%s %s, %s ' % (urlwatch.pkgname, urlwatch.__version__, urlwatch.__copyright__), - 'Website: %s ' % (urlwatch.__url__,), - 'watched %d URLs in %d seconds' % (len(self.job_states), self.duration.seconds)) + yield from footer + + @classmethod + def _render(cls, max_length, summary=None, details=None, footer=None): + """Render the report components, trimming them if the available length is insufficient. + + Returns a tuple (trimmed, summary, details, footer). 
+ + The first element of the tuple indicates whether any part of the report + was omitted due to message length. The other elements are the + potentially trimmed report components. + """ + + # The footer/summary lengths are the sum of the length of their parts + # plus the space taken up by newlines. + if summary: + summary = ['%d. %s' % (idx + 1, line) for idx, line in enumerate(summary)] + summary_len = sum(len(part) for part in summary) + len(summary) - 1 + else: + summary_len = 0 + + if footer: + footer_len = sum(len(part) for part in footer) + len(footer) - 1 + else: + footer_len = 0 + + if max_length is None: + return (False, summary, details, footer) + else: + if summary_len > max_length: + return (True, [], [], "") + elif footer_len > max_length - summary_len: + return (True, summary, [], footer[:max_length - summary_len]) + elif not details: + return (False, summary, [], footer) + else: + # Determine the space remaining after taking into account + # summary and footer. + remaining_len = max_length - summary_len - footer_len + headers_len = sum(len(header) for header, _ in details) + + details_trimmed = False + + # First ensure we can show all the headers. + if headers_len > remaining_len: + return (True, summary, [], footer) + else: + remaining_len -= headers_len + + # Calculate approximate available length per item, shared + # equally between all details components. + body_len_per_details = remaining_len // len(details) + + trimmed_details = [] + unprocessed = len(details) + + for header, body in details: + # Calculate the available length for the body and render it + avail_length = body_len_per_details - 1 + + body_trimmed, body = cls._format_details_body(body, avail_length) + + if body_trimmed: + details_trimmed = True + + if len(body) <= body_len_per_details: + trimmed_details.append((header, body)) + else: + trimmed_details.append((header, "")) + + # If the current item's body did not use all of its + # allocated space, distribute the unused space into + # subsequent items, unless we're at the last item + # already. + unused = body_len_per_details - len(body) + remaining_len -= body_len_per_details + remaining_len += unused + unprocessed -= 1 + + if unprocessed > 0: + body_len_per_details = remaining_len // unprocessed + + return (details_trimmed, summary, trimmed_details, footer) + + @staticmethod + def _format_details_body(s, max_length): + wrapper_length = len("```diff\n\n```") + + # Message to print when the diff is too long. + trim_message = "*diff trimmed*" + trim_message_length = len(trim_message) + + if max_length is None or len(s) + wrapper_length <= max_length: + return False, "```diff\n{}\n```".format(s) + else: + target_max_length = max_length - trim_message_length - wrapper_length + pos = s.rfind("\n", 0, target_max_length) + + if pos == -1: + # Just a single long line, so cut it short. + s = s[0:target_max_length] + else: + # Multiple lines, cut off extra lines. 
+ s = s[0:pos] + + return True, "{}\n```diff\n{}\n```".format(trim_message, s) def _format_content(self, job_state): if job_state.verb == 'error': @@ -708,17 +911,15 @@ summary_part.append(pretty_summary) - details_part.append('### ' + summary) if content is not None: - details_part.extend(('', '```', content, '```', '')) - details_part.extend(('', '')) + details_part.append(('### ' + summary, content)) return summary_part, details_part class MatrixReporter(MarkdownReporter): """Send a message to a room using the Matrix protocol""" - MAX_LENGTH = 4096 + MAX_LENGTH = 16384 __kind__ = 'matrix' @@ -730,19 +931,16 @@ access_token = self.config['access_token'] room_id = self.config['room_id'] - body_markdown = '\n'.join(super().submit()) + body_markdown = '\n'.join(super().submit(MatrixReporter.MAX_LENGTH)) if not body_markdown: logger.debug('Not calling Matrix API (no changes)') return - if len(body_markdown) > self.MAX_LENGTH: - body_markdown = body_markdown[:self.MAX_LENGTH] - client_api = matrix_client.api.MatrixHttpApi(homeserver_url, access_token) if Markdown is not None: - body_html = Markdown().convert(body_markdown) + body_html = Markdown(extras=["fenced-code-blocks", "highlightjs-lang"]).convert(body_markdown) client_api.send_message_event( room_id, diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/storage.py new/urlwatch-2.22/lib/urlwatch/storage.py --- old/urlwatch-2.21/lib/urlwatch/storage.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/storage.py 2020-12-19 12:27:43.000000000 +0100 @@ -32,6 +32,7 @@ import stat import copy import platform +import collections from abc import ABCMeta, abstractmethod import shutil @@ -49,6 +50,11 @@ except ImportError: redis = None +try: + import pwd +except ImportError: + pwd = None + from .util import atomic_rename, edit_file from .jobs import JobBase, UrlJob, ShellJob from .filters import FilterBase @@ -124,6 +130,19 @@ 'slack': { 'enabled': False, 'webhook_url': '', + 'max_message_length': 40000, + }, + 'mattermost': { + 'enabled': False, + 'webhook_url': '', + 'max_message_length': 40000, + }, + 'discord': { + 'enabled': False, + 'embed': False, + 'subject': '{count} changes: {jobs}', + 'webhook_url': '', + 'max_message_length': 2000, }, 'matrix': { 'enabled': False, @@ -182,7 +201,6 @@ # If there is no controlling terminal, because urlwatch is launched by # cron, or by a systemd.service for example, os.getlogin() fails with: # OSError: [Errno 25] Inappropriate ioctl for device - import pwd return pwd.getpwuid(os.getuid()).pw_name @@ -219,6 +237,7 @@ if os.path.exists(self.filename): shutil.copy(self.filename, file_edit) elif example_file is not None and os.path.exists(example_file): + os.makedirs(os.path.dirname(file_edit) or '.', exist_ok=True) shutil.copy(example_file, file_edit) while True: @@ -249,6 +268,7 @@ @classmethod def write_default_config(cls, filename): + os.makedirs(os.path.dirname(filename) or '.', exist_ok=True) config_storage = cls(None) config_storage.filename = filename config_storage.save() @@ -349,14 +369,31 @@ class UrlsYaml(BaseYamlFileStorage, UrlsBaseFileStorage): + @classmethod + def _parse(cls, fp): + jobs = [JobBase.unserialize(job) for job in yaml.load_all(fp, Loader=yaml.SafeLoader) + if job is not None] + jobs_by_guid = collections.defaultdict(list) + for job in jobs: + jobs_by_guid[job.get_guid()].append(job) + + conflicting_jobs = [] + for guid, guid_jobs in jobs_by_guid.items(): + if len(guid_jobs) != 1: + 
conflicting_jobs.append(guid_jobs[0].get_location()) + + if conflicting_jobs: + raise ValueError('\n '.join(['Each job must have a unique URL, append #1, #2, ... to make them unique:'] + + conflicting_jobs)) + + return jobs @classmethod def parse(cls, *args): filename = args[0] if filename is not None and os.path.exists(filename): with open(filename) as fp: - return [JobBase.unserialize(job) for job in yaml.load_all(fp, Loader=yaml.SafeLoader) - if job is not None] + return cls._parse(fp) def save(self, *args): jobs = args[0] @@ -367,7 +404,7 @@ def load(self, *args): with open(self.filename) as fp: - return [JobBase.unserialize(job) for job in yaml.load_all(fp, Loader=yaml.SafeLoader) if job is not None] + return self._parse(fp) class UrlsTxt(BaseTxtFileStorage, UrlsBaseFileStorage): diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/tests/test_handler.py new/urlwatch-2.22/lib/urlwatch/tests/test_handler.py --- old/urlwatch-2.21/lib/urlwatch/tests/test_handler.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/tests/test_handler.py 2020-12-19 12:27:43.000000000 +0100 @@ -1,7 +1,6 @@ import sys from glob import glob -import pycodestyle as pycodestyle from urlwatch.jobs import UrlJob, JobBase, ShellJob from urlwatch.storage import UrlsYaml, UrlsTxt @@ -80,6 +79,7 @@ def test_pep8_conformance(): """Test that we conform to PEP-8.""" + import pycodestyle style = pycodestyle.StyleGuide(ignore=['E501', 'E402', 'W503']) py_files = [y for x in os.walk(os.path.abspath('.')) for y in glob(os.path.join(x[0], '*.py'))] diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/setup.py new/urlwatch-2.22/setup.py --- old/urlwatch-2.21/setup.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/setup.py 2020-12-19 12:27:43.000000000 +0100 @@ -10,8 +10,8 @@ m = dict(re.findall("\n__([a-z]+)__ = '([^']+)'", main_py)) docs = re.findall('"""(.*?)"""', main_py, re.DOTALL) -if sys.version_info < (3, 3): - sys.exit('urlwatch requires Python 3.3 or newer') +if sys.version_info < (3, 6): + sys.exit('urlwatch requires Python 3.6 or newer') m['name'] = 'urlwatch' m['author'], m['author_email'] = re.match(r'(.*) <(.*)>', m['author']).groups() @@ -22,7 +22,7 @@ m['entry_points'] = {"console_scripts": ["urlwatch=urlwatch.cli:main"]} m['package_dir'] = {'': 'lib'} m['packages'] = ['urlwatch'] -m['python_requires'] = '>=3.5' +m['python_requires'] = '>=3.6' m['data_files'] = [ ('share/man/man1', ['share/man/man1/urlwatch.1']), ('share/urlwatch/examples', [ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/share/urlwatch/examples/hooks.py.example new/urlwatch-2.22/share/urlwatch/examples/hooks.py.example --- old/urlwatch-2.21/share/urlwatch/examples/hooks.py.example 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/share/urlwatch/examples/hooks.py.example 2020-12-19 12:27:43.000000000 +0100 @@ -49,7 +49,7 @@ # # __kind__ = 'case' # -# def filter(self, data, subfilter=None): +# def filter(self, data, subfilter): # # The subfilter is specified using a colon, for example the "case" # # filter here can be specified as "case:upper" and "case:lower" # @@ -69,7 +69,7 @@ # # __kind__ = 'indent' # -# def filter(self, data, subfilter=None): +# def filter(self, data, subfilter): # # The subfilter here is a number of characters to indent # # if subfilter is None: @@ -87,7 +87,7 @@ MATCH = {'url': 'http://example.org/'} # An 
auto-match filter does not have any subfilters - def filter(self, data): + def filter(self, data, subfilter): return data.replace('foo', 'bar') class CustomRegexMatchUrlFilter(filters.RegexMatchFilter): @@ -95,7 +95,7 @@ MATCH = {'url': re.compile('http://example.org/.*')} # An auto-match filter does not have any subfilters - def filter(self, data): + def filter(self, data, subfilter): return data.replace('foo', 'bar') diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/share/urlwatch/examples/urls.yaml.example new/urlwatch-2.22/share/urlwatch/examples/urls.yaml.example --- old/urlwatch-2.21/share/urlwatch/examples/urls.yaml.example 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/share/urlwatch/examples/urls.yaml.example 2020-12-19 12:27:43.000000000 +0100 @@ -38,7 +38,9 @@ --- # You can do POST requests by providing data parameter. # POST data can be a URL-encoded string (see last example) or a dict. -url: "http://example.com/search.cgi" +# If you are using the URL multiple times, you need to append "#something" for +# each different job, so that the URL string still uniquely identifies the job +url: "http://example.com/search.cgi#alternative" data: button: Search q: something
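Following on from the POST example above, a second job submitting to the same endpoint would need its own fragment so that both jobs keep unique GUIDs; a minimal sketch, with the "#other" fragment and the form fields being illustrative only:

.. code-block:: yaml

    ---
    # Same endpoint as the job above; the fragment keeps this job's GUID unique
    url: "http://example.com/search.cgi#other"
    data:
      button: Search
      q: something-else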
