Script 'mail_helper' called by obssrc Hello community, here is the log from the commit of package urlwatch for openSUSE:Factory checked in at 2021-01-30 13:57:11 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Comparing /work/SRC/openSUSE:Factory/urlwatch (Old) and /work/SRC/openSUSE:Factory/.urlwatch.new.28504 (New) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "urlwatch" Sat Jan 30 13:57:11 2021 rev:20 rq:867912 version:2.22 Changes: -------- --- /work/SRC/openSUSE:Factory/urlwatch/urlwatch.changes 2020-08-20 22:33:17.076106363 +0200 +++ /work/SRC/openSUSE:Factory/.urlwatch.new.28504/urlwatch.changes 2021-01-30 13:58:04.326427885 +0100 @@ -1,0 +2,52 @@ +Mon Jan 4 10:59:57 UTC 2021 - Michael Vetter <[email protected]> + +- Update to 2.22: + Added: + * Added 'wait_until' option to browser jobs to configure how long + the headless browser will wait for pages to load. + * Jobs now have an optional treat_new_as_changed (default false) + key that can be set, and will treat newly-found pages as changed, + and display a diff from the empty string (useful for diff_tool + or diff_filter with side effects) + * New reporters: discord, mattermost + * New key user_visible_url for URL jobs that can be used to show a + different URL in reports (useful if the watched URL is a REST API + endpoint, but the report should link to the corresponding web page) + * The Markdown reporter now supports limiting the report length via + the max_length parameter of the submit method. The length limiting + logic is smart in the sense that it will try trimming the details first, + followed by omitting them completely, followed by omitting the summary. + If a part of the report is omitted, a note about this is added to the + report. (PR#572, by Denis Kasak) + Changed: + * Diff output is now generated more uniformly, independent of whether + the input data has a trailing newline or not; if this behavior is not + intended, use an external diff_tool (PR#550, by Adam Goldsmith) + * The --test-diff-filter output now properly reports timestamps from the + history entry instead of the current date and time (Fixes #573) + * Unique GUIDs for jobs are now enforced at load time, append "#1", + "#2", ... to the URLs to make them unique if you have multiple different + jobs that share the same request URL (Fixes #586) + * When a config, urls file or hooks file does not exist and should be + edited or inited, its parent folders will be created (previously only + the urlwatch configuration folder was created; Fixes #594) + * Auto-matched filters now always get None supplied as subfilter; any + custom filters must accept a subfilter parameter after the existing + data parameter + * Drop support for Python 3.5 + Fixed: + * Make imports thread-safe: This might increase startup times a bit, + as dependencies are imported on bootup instead of when first used. + Importing in Python is not (yet) thread-safe, so we cannot import + new modules from the worker threads reliably (Fixes #559, #601) + * The Matrix reporter was improved in several ways (PR#572, by Denis Kasak): + - The maximum length of the report was increase from 4096 to 16384. + - The report length limiting is now implemented via the new length + limiting functionality of the Markdown reporter. Previously, the + report was simply trimmed at the end which could break the diff + blocks and make them render incorrectly. + - The diff code blocks are now tagged as diffs which will allow the + diffs to be syntax highlighted as such. This doesn't yet work in + Element, pending on the resolution of trentm/python-markdown2#370. 
+ +------------------------------------------------------------------- Old: ---- urlwatch-2.21.tar.gz New: ---- urlwatch-2.22.tar.gz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ urlwatch.spec ++++++ --- /var/tmp/diff_new_pack.cScTJr/_old 2021-01-30 13:58:04.942428924 +0100 +++ /var/tmp/diff_new_pack.cScTJr/_new 2021-01-30 13:58:04.946428930 +0100 @@ -1,7 +1,7 @@ # # spec file for package urlwatch # -# Copyright (c) 2020 SUSE LLC +# Copyright (c) 2021 SUSE LLC # # All modifications and additions to the file contributed by third parties # remain the property of their copyright owners, unless otherwise agreed @@ -17,14 +17,14 @@ Name: urlwatch -Version: 2.21 +Version: 2.22 Release: 0 Summary: A tool for monitoring webpages for updates License: BSD-3-Clause Group: Productivity/Networking/Web/Utilities URL: https://thp.io/2008/urlwatch/ Source0: https://github.com/thp/%{name}/archive/%{version}.tar.gz#/%{name}-%{version}.tar.gz -BuildRequires: python3-devel >= 3.5 +BuildRequires: python3-devel >= 3.6 BuildRequires: python3-setuptools Requires: python3-PyYAML Requires: python3-appdirs ++++++ urlwatch-2.21.tar.gz -> urlwatch-2.22.tar.gz ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/.gitignore new/urlwatch-2.22/.gitignore --- old/urlwatch-2.21/.gitignore 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/.gitignore 2020-12-19 12:27:43.000000000 +0100 @@ -1,3 +1,4 @@ __pycache__ .idea -build \ No newline at end of file +build +*.egg-info/ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/.travis.yml new/urlwatch-2.22/.travis.yml --- old/urlwatch-2.21/.travis.yml 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/.travis.yml 2020-12-19 12:27:43.000000000 +0100 @@ -1,7 +1,6 @@ language: python cache: pip python: - - "3.5" - "3.6" - "3.7" - "3.8" diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/CHANGELOG.md new/urlwatch-2.22/CHANGELOG.md --- old/urlwatch-2.21/CHANGELOG.md 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/CHANGELOG.md 2020-12-19 12:27:43.000000000 +0100 @@ -4,6 +4,63 @@ The format mostly follows [Keep a Changelog](http://keepachangelog.com/en/1.0.0/). +## [2.22] -- 2020-12-19 + +### Added + +- Added 'wait_until' option to browser jobs to configure how long + the headless browser will wait for pages to load. +- Jobs now have an optional `treat_new_as_changed` (default `false`) + key that can be set, and will treat newly-found pages as changed, + and display a diff from the empty string (useful for `diff_tool` + or `diff_filter` with side effects) +- New reporters: `discord`, `mattermost` +- New key `user_visible_url` for URL jobs that can be used to show + a different URL in reports (useful if the watched URL is a REST API + endpoint, but the report should link to the corresponding web page) +- The Markdown reporter now supports limiting the report length via the + `max_length` parameter of the `submit` method. The length limiting logic is + smart in the sense that it will try trimming the details first, followed by + omitting them completely, followed by omitting the summary. If a part of the + report is omitted, a note about this is added to the report. 
(PR#572, by + Denis Kasak) + +### Changed + +- Diff output is now generated more uniformly, independent of whether + the input data has a trailing newline or not; if this behavior is not + intended, use an external `diff_tool` (PR#550, by Adam Goldsmith) +- The `--test-diff-filter` output now properly reports timestamps from + the history entry instead of the current date and time (Fixes #573) +- Unique GUIDs for jobs are now enforced at load time; append "#1", + "#2", ... to the URLs to make them unique if you have multiple + different jobs that share the same request URL (Fixes #586) +- When a config, urls file or hooks file does not exist and should be + edited or initialized, its parent folders will be created (previously + only the urlwatch configuration folder was created; Fixes #594) +- Auto-matched filters now always get `None` supplied as subfilter; + any custom filters must accept a `subfilter` parameter after the + existing `data` parameter +- Drop support for Python 3.5 + +### Fixed + +- Make imports thread-safe: This might increase startup times a bit, + as dependencies are imported on bootup instead of when first used. + Importing in Python is not (yet) thread-safe, so we cannot import + new modules from the worker threads reliably (Fixes #559, #601) + +- The Matrix reporter was improved in several ways (PR#572, by Denis Kasak): + + - The maximum length of the report was increased from 4096 to 16384. + - The report length limiting is now implemented via the new length limiting + functionality of the Markdown reporter. Previously, the report was simply + trimmed at the end, which could break the diff blocks and make them render + incorrectly. + - The diff code blocks are now tagged as diffs, which will allow the diffs to + be syntax highlighted as such. This doesn't yet work in Element, pending + the resolution of trentm/python-markdown2#370. + ## [2.21] -- 2020-07-31 ### Added @@ -191,7 +248,7 @@ ### Added - Support for Mailgun regions (by Daniel Peukert, PR#280) -- CLI: Allow multiple occurences of 'filter' when adding jobs (PR#278) +- CLI: Allow multiple occurrences of 'filter' when adding jobs (PR#278) ### Changed - Fixed incorrect name for chat_id config in the default config (by Robin B, PR#276) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/Dockerfile new/urlwatch-2.22/Dockerfile --- old/urlwatch-2.21/Dockerfile 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/Dockerfile 2020-12-19 12:27:43.000000000 +0100 @@ -1,6 +1,6 @@ FROM python:3.8.2 -RUN python3 -m pip install pyyaml minidb requests keyring appdirs lxml cssselect beautifulsoup4 jsbeautifier cssbeautifier aioxmpp +RUN python3 -m pip install --no-cache-dir pyyaml minidb requests keyring appdirs lxml cssselect beautifulsoup4 jsbeautifier cssbeautifier aioxmpp WORKDIR /opt/urlwatch diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/advanced.rst new/urlwatch-2.22/docs/source/advanced.rst --- old/urlwatch-2.21/docs/source/advanced.rst 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/advanced.rst 2020-12-19 12:27:43.000000000 +0100 @@ -258,3 +258,57 @@ select a region of a web page. It then generates a configuration for ``urlwatch`` to run ``pyvisualcompare`` and generate a hash for the screen contents.
+ + +Configuring how long browser jobs wait for pages to load +-------------------------------------------------------- + +For browser jobs, you can configure how long the headless browser will wait +before a page is considered loaded by using the ``wait_until`` option. It can take one of four values: + + - ``load`` will wait until the ``load`` browser event is fired (default). + - ``domcontentloaded`` will wait until the ``DOMContentLoaded`` browser event is fired. + - ``networkidle0`` will wait until there are no more than 0 network connections for at least 500 ms. + - ``networkidle2`` will wait until there are no more than 2 network connections for at least 500 ms. + + +Treating ``NEW`` jobs as ``CHANGED`` +------------------------------------ + +In some cases (e.g. when the ``diff_tool`` or ``diff_filter`` executes some +external command as a side effect that should also run for the initial page +state), you can set the ``treat_new_as_changed`` key to ``true``, which will make +the job report as ``CHANGED`` instead of ``NEW`` the first time it is retrieved +(and the diff will be reported, too). + +.. code-block:: yaml + + url: http://example.com/initialpage.html + treat_new_as_changed: true + +This option will also change the behavior of ``--test-diff-filter``, and allow +testing the diff filter if only a single version of the page has been +retrieved. + + +Monitoring the same URL in multiple jobs +---------------------------------------- + +Because urlwatch uses the ``url``/``navigate`` (for URL/Browser jobs) and/or +the ``command`` (for Shell jobs) key as unique identifier, each URL can only +appear in a single job. If you want to monitor the same URL multiple times, +you can append ``#1``, ``#2``, ... (or anything that makes them unique) to +the URLs, like this: + +..
code-block:: yaml + + name: "Looking for Thing A" + url: http://example.com/#1 + filter: + - grep: "Thing A" + --- + name: "Looking for Thing B" + url: http://example.com/#2 + filter: + - grep: "Thing B" + diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/conf.py new/urlwatch-2.22/docs/source/conf.py --- old/urlwatch-2.21/docs/source/conf.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/conf.py 2020-12-19 12:27:43.000000000 +0100 @@ -22,7 +22,7 @@ author = 'Thomas Perl' # The full version, including alpha/beta/rc tags -release = '2.21' +release = '2.22' # -- General configuration --------------------------------------------------- diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/dependencies.rst new/urlwatch-2.22/docs/source/dependencies.rst --- old/urlwatch-2.21/docs/source/dependencies.rst 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/dependencies.rst 2020-12-19 12:27:43.000000000 +0100 @@ -10,7 +10,7 @@ Mandatory Packages ------------------ -- Python 3.5 or newer +- Python 3.6 or newer - `PyYAML <http://pyyaml.org/>`__ - `minidb <https://thp.io/2010/minidb/>`__ - `requests <http://python-requests.org/>`__ @@ -52,8 +52,8 @@ +-------------------------+---------------------------------------------------------------------+ | Unit testing | `pycodestyle <http://pycodestyle.pycqa.org/en/latest/>`__, | | | `docutils <https://docutils.sourceforge.io>`__, | -| | `Pygments <https://pygments.org>`__ and | -| | dependencies for other features as needed | ++-------------------------+---------------------------------------------------------------------+ +| Documentation build | `Sphinx <https://www.sphinx-doc.org/>`__ | +-------------------------+---------------------------------------------------------------------+ | `beautify` filter | `beautifulsoup4 <https://pypi.org/project/beautifulsoup4/>`__; | | | optional dependencies (for ``<script>`` and ``<style>`` tags): | diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/deprecated.rst new/urlwatch-2.22/docs/source/deprecated.rst --- old/urlwatch-2.21/docs/source/deprecated.rst 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/deprecated.rst 2020-12-19 12:27:43.000000000 +0100 @@ -4,6 +4,40 @@ This page lists the features that are deprecated and steps to update your configuration to use the replacements (if any). + +Filters without subfilters (UNRELEASED) +--------------------------------------- + +In older urlwatch versions, it was possible to write custom +filters that do not take a ``subfilter`` as an argument. + +If you have written your own filter code like this: + +.. code:: python + + class CustomFilter(filters.FilterBase): + """My old custom filter""" + + __kind__ = 'foo' + + def filter(self, data): + ... + +You have to update your filter to take an optional ``subfilter`` +argument (if the filter configuration does not have a subfilter +defined, the value of ``subfilter`` will be ``None``): + +.. code:: python + + class CustomFilter(filters.FilterBase): + """My new custom filter""" + + __kind__ = 'foo' + + def filter(self, data, subfilter): + ...
+ + string-based filter definitions (since 2.19) -------------------------------------------- diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/filters.rst new/urlwatch-2.22/docs/source/filters.rst --- old/urlwatch-2.21/docs/source/filters.rst 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/filters.rst 2020-12-19 12:27:43.000000000 +0100 @@ -195,7 +195,7 @@ To match an element in an `XML namespace <https://www.w3.org/TR/xml-names/>`__, use a namespace prefix -before the tag name. Use a ``:`` to seperate the namespace prefix and +before the tag name. Use a ``:`` to separate the namespace prefix and the tag name in an XPath expression, and use a ``|`` in a CSS selector. .. code:: yaml @@ -490,7 +490,7 @@ Within the ``shellpipe`` script, two environment variables will be set for further customization (this can be useful if you have -a external shell script file that is used as filter for multiple +an external shell script file that is used as filter for multiple jobs, but needs to treat each job in a slightly different way): +----------------------------+------------------------------------------------------+ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/introduction.rst new/urlwatch-2.22/docs/source/introduction.rst --- old/urlwatch-2.21/docs/source/introduction.rst 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/introduction.rst 2020-12-19 12:27:43.000000000 +0100 @@ -13,7 +13,7 @@ :ref:`Jobs` ----------- -Each website or shell command to be monitored consitutes a "job". +Each website or shell command to be monitored constitutes a "job". The instructions for each such job are contained in a config file in the `YAML format`_, accessible with the ``urlwatch --edit`` command. If you get an error, set your ``$EDITOR`` (or ``$VISUAL``) environment @@ -74,6 +74,7 @@ - ``email`` (using SMTP) - email using ``mailgun`` - ``slack`` +- ``discord`` - ``pushbullet`` - ``telegram`` - ``matrix`` diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/jobs.rst new/urlwatch-2.22/docs/source/jobs.rst --- old/urlwatch-2.21/docs/source/jobs.rst 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/jobs.rst 2020-12-19 12:27:43.000000000 +0100 @@ -3,7 +3,7 @@ Jobs ==== -Jobs are the kind of things that `urlwatch` can monitor. +Jobs are the kind of things that `urlwatch` can monitor. The list of jobs to run are contained in the configuration file ``urls.yaml``, accessed with the command ``urlwatch --edit``, each separated by a line @@ -46,6 +46,7 @@ - ``ignore_http_error_codes``: List of HTTP errors to ignore (see :ref:`advanced_topics`) - ``ignore_timeout_errors``: Do not report errors when the timeout is hit - ``ignore_too_many_redirects``: Ignore redirect loops (see :ref:`advanced_topics`) +- ``user_visible_url``: Different URL to show in reports (e.g. 
when watched URL is a REST API URL, and you want to show a webpage) (Note: ``url`` implies ``kind: url``) @@ -80,7 +81,8 @@ Job-specific optional keys: -- none +- ``wait_until``: Either ``load``, ``domcontentloaded``, ``networkidle0``, or ``networkidle2`` (see :ref:`advanced_topics`) + As this job uses `Pyppeteer <https://github.com/pyppeteer/pyppeteer>`__ to render the page in a headless Chromium instance, it requires massively @@ -98,7 +100,7 @@ ----- This job type allows you to watch the output of arbitrary shell commands, -which is useful for e.g. monitoring a FTP uploader folder, output of +which is useful for e.g. monitoring an FTP uploader folder, output of scripts that query external devices (RPi GPIO), etc... .. code-block:: yaml @@ -125,6 +127,7 @@ - ``max_tries``: Number of times to retry fetching the resource - ``diff_tool``: Command to a custom tool for generating diff text - ``diff_filter``: :ref:`filters` (if any) to apply to the diff result (can be tested with ``--test-diff-filter``) +- ``treat_new_as_changed``: Will treat jobs that don't have any historic data as ``CHANGED`` instead of ``NEW`` (and create a diff for new jobs) - ``compared_versions``: Number of versions to compare for similarity - ``kind`` (redundant): Either ``url``, ``shell`` or ``browser``. Automatically derived from the unique key (``url``, ``command`` or ``navigate``) of the job type diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/migration.rst new/urlwatch-2.22/docs/source/migration.rst --- old/urlwatch-2.21/docs/source/migration.rst 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/migration.rst 2020-12-19 12:27:43.000000000 +0100 @@ -8,7 +8,7 @@ specifying names for jobs, different job kinds, directly applying filters, selecting the HTTP request method, specifying POST data as dictionary and much more -- The cache directory has been replaced with a SQLite 3 database file +- The cache directory has been replaced with an SQLite 3 database file “cache.db” in `minidb`_ format, storing all change history (use ``--gc-cache`` to remove old changes if you don’t need them anymore) for further analysis diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/docs/source/reporters.rst new/urlwatch-2.22/docs/source/reporters.rst --- old/urlwatch-2.21/docs/source/reporters.rst 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/docs/source/reporters.rst 2020-12-19 12:27:43.000000000 +0100 @@ -45,13 +45,16 @@ - **stdout**: Print summary on stdout (the console) - **email**: Send summary via e-mail / SMTP -- **mailgun**: Custom email reporter that uses Mailgun -- **matrix**: Custom Matrix reporter +- **mailgun**: Send e-mail via the Mailgun service +- **matrix**: Send a message to a room using the Matrix protocol +- **mattermost**: Send a message to a Mattermost channel - **pushbullet**: Send summary via pushbullet.com - **pushover**: Send summary via pushover.net -- **slack**: Custom Slack reporter -- **telegram**: Custom Telegram reporter +- **slack**: Send a message to a Slack channel +- **discord**: Send a message to a Discord channel +- **telegram**: Send a message using Telegram - **ifttt**: Send summary via IFTTT +- **xmpp**: Send a message using the XMPP Protocol .. To convert the "urlwatch --features" output, use: sed -e 's/^ \* \(.*\) - \(.*\)$/- **\1**: \2/' @@ -141,6 +144,39 @@ “Incoming Webhooks”
on a channel, you’ll get a webhook URL; copy it into the configuration as seen above. +Mattermost +---------- + +Mattermost notifications are set up the same way as Slack notifications; +only the webhook URL is different: + +.. code:: yaml + + mattermost: + webhook_url: 'http://{your-mattermost-site}/hooks/XXXXXXXXXXXXXXXXXXXXXX' + enabled: true + +See `Incoming Webhooks <https://developers.mattermost.com/integrate/incoming-webhooks/>`__ +in the Mattermost documentation for details. + +Discord +------- + +Discord notifications are configured using “Discord Incoming Webhooks”. Here +is a sample configuration: + +.. code:: yaml + + discord: + webhook_url: 'https://discordapp.com/api/webhooks/11111XXXXXXXXXXX/BBBBYYYYYYYYYYYYYYYYYYYYYYYyyyYYYYYYYYYYYYYY' + enabled: true + embed: true + subject: '{count} changes: {jobs}' + +To set up Discord, open your Discord server settings, select Integrations, create a "New Webhook", give the webhook a name to post under, select a channel, press "Copy Webhook URL", and paste the URL into the configuration as seen above. + +Embedded content might be easier to read and makes it easier to identify individual reports. ``subject`` precedes the embedded report and is only used when ``embed`` is ``true``. + IFTTT ----- diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/__init__.py new/urlwatch-2.22/lib/urlwatch/__init__.py --- old/urlwatch-2.21/lib/urlwatch/__init__.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/__init__.py 2020-12-19 12:27:43.000000000 +0100 @@ -12,5 +12,5 @@ __author__ = 'Thomas Perl <[email protected]>' __license__ = 'BSD' __url__ = 'https://thp.io/2008/urlwatch/' -__version__ = '2.21' +__version__ = '2.22' __user_agent__ = '%s/%s (+https://thp.io/2008/urlwatch/info.html)' % (pkgname, __version__) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/browser.py new/urlwatch-2.22/lib/urlwatch/browser.py --- old/urlwatch-2.21/lib/urlwatch/browser.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/browser.py 2020-12-19 12:27:43.000000000 +0100 @@ -54,16 +54,20 @@ return browser @asyncio.coroutine - def _get_content(self, url): + def _get_content(self, url, wait_until=None): context = yield from self._browser.createIncognitoBrowserContext() page = yield from context.newPage() - yield from page.goto(url) + opts = {} + if wait_until is not None: + opts['waitUntil'] = wait_until + yield from page.goto(url, opts) content = yield from page.content() yield from context.close() return content - def process(self, url): - return asyncio.run_coroutine_threadsafe(self._get_content(url), self._event_loop).result() + def process(self, url, wait_until=None): + coroutine = self._get_content(url, wait_until=wait_until) + return asyncio.run_coroutine_threadsafe(coroutine, self._event_loop).result() def destroy(self): self._event_loop.call_soon_threadsafe(self._event_loop.stop) @@ -86,8 +90,8 @@ BrowserContext._BROWSER_LOOP = BrowserLoop() BrowserContext._BROWSER_REFCNT += 1 - def process(self, url): - return BrowserContext._BROWSER_LOOP.process(url) + def process(self, url, wait_until=None): + return BrowserContext._BROWSER_LOOP.process(url, wait_until=wait_until) def close(self): with BrowserContext._BROWSER_LOCK: @@ -104,13 +108,18 @@ parser = argparse.ArgumentParser(description='Browser handler') parser.add_argument('url', help='URL to retrieve') parser.add_argument('-v', '--verbose', action='store_true',
help='show debug output') + parser.add_argument('-w', + '--wait-until', + dest='wait_until', + choices=['load', 'domcontentloaded', 'networkidle0', 'networkidle2'], + help='When to consider a pageload finished') args = parser.parse_args() setup_logger(args.verbose) try: ctx = BrowserContext() - print(ctx.process(args.url)) + print(ctx.process(args.url, wait_until=args.wait_until)) finally: ctx.close() diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/command.py new/urlwatch-2.22/lib/urlwatch/command.py --- old/urlwatch-2.21/lib/urlwatch/command.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/command.py 2020-12-19 12:27:43.000000000 +0100 @@ -60,6 +60,7 @@ shutil.copy(self.urlwatch_config.hooks, hooks_edit) elif self.urlwatch_config.hooks_py_example is not None and os.path.exists( self.urlwatch_config.hooks_py_example): + os.makedirs(os.path.dirname(hooks_edit) or '.', exist_ok=True) shutil.copy(self.urlwatch_config.hooks_py_example, hooks_edit) edit_file(hooks_edit) import_module_from_source('hooks', hooks_edit) @@ -144,7 +145,12 @@ job = self._get_job(id) history_data = self.urlwatcher.cache_storage.get_history_data(job.get_guid(), 10) - history_data = [key for key, value in sorted(history_data.items(), key=lambda kv: kv[1])] + history_data = sorted(history_data.items(), key=lambda kv: kv[1]) + + if len(history_data) and getattr(job, 'treat_new_as_changed', False): + # Insert empty history entry, so first snapshot is diffed against the empty string + _, first_timestamp = history_data[0] + history_data.insert(0, ('', first_timestamp)) if len(history_data) < 2: print('Not enough historic data available (need at least 2 different snapshots)') @@ -152,8 +158,8 @@ for i in range(len(history_data) - 1): with JobState(self.urlwatcher.cache_storage, job) as job_state: - job_state.old_data = history_data[i] - job_state.new_data = history_data[i + 1] + job_state.old_data, job_state.timestamp = history_data[i] + job_state.new_data, job_state.current_timestamp = history_data[i + 1] print('=== Filtered diff between state {} and state {} ==='.format(i, i + 1)) print(job_state.get_diff()) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/filters.py new/urlwatch-2.22/lib/urlwatch/filters.py --- old/urlwatch-2.21/lib/urlwatch/filters.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/filters.py 2020-12-19 12:27:43.000000000 +0100 @@ -46,6 +46,39 @@ from .util import TrackSubClasses, import_module_from_source +from .html2txt import html2text +from .ical2txt import ical2text + +try: + from bs4 import BeautifulSoup +except ImportError: + BeautifulSoup = None + +try: + import jsbeautifier +except ImportError: + jsbeautifier = None + +try: + import cssbeautifier +except ImportError: + cssbeautifier = None + +try: + import pdftotext +except ImportError: + pdftotext = None + +try: + import pytesseract +except ImportError: + pytesseract = None + +try: + from PIL import Image +except ImportError: + Image = None + logger = logging.getLogger(__name__) @@ -82,7 +115,8 @@ filter_instance = filtercls(state.job, state) if filter_instance.match(): logger.info('Auto-applying filter %r to %s', filter_instance, state.job.get_location()) - data = filter_instance.filter(data) + # filters require a subfilter argument + data = filter_instance.filter(data, None) return data @@ -216,7 +250,10 @@ def match(self): return self.hooks is not None - 
def filter(self, data): + def filter(self, data, subfilter): + if subfilter is not None: + logger.warning('Legacy hooks filter does not have any subfilter -- ignored') + try: result = self.hooks.filter(self.job.get_location(), data) if result is None: @@ -235,14 +272,10 @@ __no_subfilter__ = True def filter(self, data, subfilter): - from bs4 import BeautifulSoup as bs - soup = bs(data, features="lxml") + if BeautifulSoup is None: + raise ImportError('Please install BeautifulSoup') - try: - import jsbeautifier - except ImportError: - logger.info('"jsbeautifier" is not installed, will not beautify <script> tags') - jsbeautifier = None + soup = BeautifulSoup(data, features="lxml") if jsbeautifier is not None: scripts = soup.find_all('script') @@ -250,12 +283,8 @@ if script.string is not None: beautified_js = jsbeautifier.beautify(script.string) script.string = beautified_js - - try: - import cssbeautifier - except ImportError: - logger.info('"cssbeautifier" is not installed, will not beautify <style> tags') - cssbeautifier = None + else: + logger.info('"jsbeautifier" is not installed, will not beautify <script> tags') if cssbeautifier is not None: styles = soup.find_all('style') @@ -263,6 +292,8 @@ if style.string is not None: beautified_css = cssbeautifier.beautify(style.string) style.string = beautified_css + else: + logger.info('"cssbeautifier" is not installed, will not beautify <style> tags') return soup.prettify() @@ -288,7 +319,6 @@ method = 're' options = {} - from .html2txt import html2text return html2text(data, baseurl=getattr(self.job, 'url', getattr(self.job, 'navigate', '')), method=method, options=options) @@ -312,7 +342,9 @@ if not isinstance(data, bytes): raise ValueError('The pdf2text filter needs bytes input (is it the first filter?)') - import pdftotext + if pdftotext is None: + raise ImportError('Please install pdftotext') + return '\n\n'.join(pdftotext.PDF(io.BytesIO(data), password=subfilter.get('password', ''))) @@ -324,7 +356,6 @@ __no_subfilter__ = True def filter(self, data, subfilter): - from .ical2txt import ical2text return ical2text(data) @@ -817,6 +848,10 @@ language = subfilter.get('language', None) timeout = int(subfilter.get('timeout', 10)) - import pytesseract - from PIL import Image + if pytesseract is None: + raise ImportError('Please install pytesseract') + + if Image is None: + raise ImportError('Please install Pillow/PIL') + return pytesseract.image_to_string(Image.open(io.BytesIO(data)), lang=language, timeout=timeout) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/handler.py new/urlwatch-2.22/lib/urlwatch/handler.py --- old/urlwatch-2.21/lib/urlwatch/handler.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/handler.py 2020-12-19 12:27:43.000000000 +0100 @@ -55,6 +55,7 @@ self.new_data = None self.history_data = {} self.timestamp = None + self.current_timestamp = None self.exception = None self.traceback = None self.tries = 0 @@ -103,6 +104,12 @@ try: try: self.load() + + if self.old_data is None and getattr(self.job, 'treat_new_as_changed', False): + # Force creation of a diff for "NEW"ly found items by pretending we had an empty page before + self.old_data = '' + self.timestamp = None + data = self.job.retrieve(self) # Apply automatic filters first @@ -160,10 +167,10 @@ raise subprocess.CalledProcessError(proc.returncode, cmdline) timestamp_old = email.utils.formatdate(self.timestamp, localtime=True) - timestamp_new = email.utils.formatdate(time.time(), 
localtime=True) - return ''.join(difflib.unified_diff(self.old_data.splitlines(keepends=True), - self.new_data.splitlines(keepends=True), - '@', '@', timestamp_old, timestamp_new)) + timestamp_new = email.utils.formatdate(self.current_timestamp or time.time(), localtime=True) + return '\n'.join(difflib.unified_diff(self.old_data.splitlines(), + self.new_data.splitlines(), + '@', '@', timestamp_old, timestamp_new, lineterm='')) class Report(object): diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/html2txt.py new/urlwatch-2.22/lib/urlwatch/html2txt.py --- old/urlwatch-2.21/lib/urlwatch/html2txt.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/html2txt.py 2020-12-19 12:27:43.000000000 +0100 @@ -35,6 +35,16 @@ logger = logging.getLogger(__name__) +try: + import html2text as pyhtml2text +except ImportError: + pyhtml2text = None + +try: + from bs4 import BeautifulSoup +except ImportError: + BeautifulSoup = None + def html2text(data, baseurl, method, options): """ @@ -59,8 +69,10 @@ return d if method == 'pyhtml2text': - import html2text - parser = html2text.HTML2Text() + if pyhtml2text is None: + raise ImportError('Please install pyhtml2text') + + parser = pyhtml2text.HTML2Text() parser.baseurl = baseurl for k, v in options.items(): setattr(parser, k.lower(), v) @@ -68,7 +80,8 @@ return d if method == 'bs4': - from bs4 import BeautifulSoup + if BeautifulSoup is None: + raise ImportError('Please install BeautifulSoup') parser = options.pop('parser', 'lxml') soup = BeautifulSoup(data, parser) d = soup.get_text(strip=True) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/ical2txt.py new/urlwatch-2.22/lib/urlwatch/ical2txt.py --- old/urlwatch-2.21/lib/urlwatch/ical2txt.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/ical2txt.py 2020-12-19 12:27:43.000000000 +0100 @@ -28,8 +28,16 @@ # THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-def ical2text(ical_string): +try: import vobject +except ImportError: + vobject = None + + +def ical2text(ical_string): + if vobject is None: + raise ImportError('Please install vobject') + result = [] if isinstance(ical_string, str): parsedCal = vobject.readOne(ical_string) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/jobs.py new/urlwatch-2.22/lib/urlwatch/jobs.py --- old/urlwatch-2.21/lib/urlwatch/jobs.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/jobs.py 2020-12-19 12:27:43.000000000 +0100 @@ -180,7 +180,7 @@ class Job(JobBase): __required__ = () - __optional__ = ('name', 'filter', 'max_tries', 'diff_tool', 'compared_versions', 'diff_filter') + __optional__ = ('name', 'filter', 'max_tries', 'diff_tool', 'compared_versions', 'diff_filter', 'treat_new_as_changed') # determine if hyperlink "a" tag is used in HtmlReporter LOCATION_IS_URL = False @@ -221,13 +221,13 @@ __required__ = ('url',) __optional__ = ('cookies', 'data', 'method', 'ssl_no_verify', 'ignore_cached', 'http_proxy', 'https_proxy', 'headers', 'ignore_connection_errors', 'ignore_http_error_codes', 'encoding', 'timeout', - 'ignore_timeout_errors', 'ignore_too_many_redirects') + 'ignore_timeout_errors', 'ignore_too_many_redirects', 'user_visible_url') LOCATION_IS_URL = True CHARSET_RE = re.compile('text/(html|plain); charset=([^;]*)') def get_location(self): - return self.url + return self.user_visible_url or self.url def retrieve(self, job_state): headers = { @@ -364,6 +364,8 @@ __required__ = ('navigate',) + __optional__ = ('wait_until',) + LOCATION_IS_URL = True def get_location(self): @@ -377,4 +379,4 @@ self.ctx.close() def retrieve(self, job_state): - return self.ctx.process(self.navigate) + return self.ctx.process(self.navigate, wait_until=self.wait_until) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/main.py new/urlwatch-2.22/lib/urlwatch/main.py --- old/urlwatch-2.21/lib/urlwatch/main.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/main.py 2020-12-19 12:27:43.000000000 +0100 @@ -69,8 +69,6 @@ self.urlwatch_config.migrate_cache(self) def check_directories(self): - if not os.path.isdir(self.urlwatch_config.urlwatch_dir): - os.makedirs(self.urlwatch_config.urlwatch_dir) if not os.path.exists(self.urlwatch_config.config): self.config_storage.write_default_config(self.urlwatch_config.config) print(""" diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/reporters.py new/urlwatch-2.22/lib/urlwatch/reporters.py --- old/urlwatch-2.21/lib/urlwatch/reporters.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/reporters.py 2020-12-19 12:27:43.000000000 +0100 @@ -67,6 +67,11 @@ except ImportError: Markdown = None +try: + from colorama import AnsiToWin32 +except ImportError: + AnsiToWin32 = None + logger = logging.getLogger(__name__) # Regular expressions that match the added/removed markers of GNU wdiff output @@ -203,7 +208,11 @@ elif line.startswith('-'): yield SafeHtml('<span class="unified_sub">{line}</span>').format(line=line) else: - yield SafeHtml('<span class="unified_nor">{line}</span>').format(line=line) + # Basic colorization for wdiff-style differences + line = SafeHtml('<span class="unified_nor">{line}</span>').format(line=line) + line = re.sub(WDIFF_ADDED_RE, lambda x: '<span class="diff_add">' + x.group(0) + '</span>', line) + 
line = re.sub(WDIFF_REMOVED_RE, lambda x: '<span class="diff_sub">' + x.group(0) + '</span>', line) + yield line def _format_content(self, job_state, difftype): if job_state.verb == 'error': @@ -336,8 +345,7 @@ return self._incolor(4, s) def _get_print(self): - if sys.platform == 'win32' and self._has_color: - from colorama import AnsiToWin32 + if sys.platform == 'win32' and self._has_color and AnsiToWin32 is not None: return functools.partial(print, file=AnsiToWin32(sys.stdout).stream) return print @@ -610,44 +618,113 @@ class SlackReporter(TextReporter): """Send a message to a Slack channel""" - MAX_LENGTH = 40000 __kind__ = 'slack' + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + self.max_length = self.config.get('max_message_length', 40000) + def submit(self): webhook_url = self.config['webhook_url'] text = '\n'.join(super().submit()) if not text: - logger.debug('Not calling slack API (no changes)') + logger.debug('Not calling {} API (no changes)'.format(self.__kind__)) return result = None - for chunk in chunkstring(text, self.MAX_LENGTH, numbering=True): - res = self.submit_to_slack(webhook_url, chunk) + for chunk in chunkstring(text, self.max_length, numbering=True): + res = self.submit_chunk(webhook_url, chunk) if res.status_code != requests.codes.ok or res is None: result = res return result - def submit_to_slack(self, webhook_url, text): - logger.debug("Sending slack request with text:{0}".format(text)) + def submit_chunk(self, webhook_url, text): + logger.debug("Sending {} request with text: {}".format(self.__kind__, text)) post_data = {"text": text} result = requests.post(webhook_url, json=post_data) try: if result.status_code == requests.codes.ok: - logger.info("Slack response: ok") + logger.info("{} response: ok".format(self.__kind__)) else: - logger.error("Slack error: {0}".format(result.text)) + logger.error("{} error: {}".format(self.__kind__, result.text)) except ValueError: logger.error( - "Failed to parse slack response. HTTP status code: {0}, content: {1}".format(result.status_code, - result.content)) + "Failed to parse {} response. 
HTTP status code: {}, content: {}".format(self.__kind__, + result.status_code, + result.content)) return result -class MarkdownReporter(ReporterBase): +class MattermostReporter(SlackReporter): + """Send a message to a Mattermost channel""" + + __kind__ = 'mattermost' + + +class DiscordReporter(TextReporter): + """Send a message to a Discord channel""" + + __kind__ = 'discord' + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + self.max_length = self.config.get('max_message_length', 2000) + def submit(self): + webhook_url = self.config['webhook_url'] + text = '\n'.join(super().submit()) + + if not text: + logger.debug('Not calling Discord API (no changes)') + return + + result = None + for chunk in chunkstring(text, self.max_length, numbering=True): + res = self.submit_to_discord(webhook_url, chunk) + if res.status_code != requests.codes.ok or res is None: + result = res + + return result + + def submit_to_discord(self, webhook_url, text): + if self.config.get('embed', False): + filtered_job_states = list(self.report.get_filtered_job_states(self.job_states)) + + subject_args = { + 'count': len(filtered_job_states), + 'jobs': ', '.join(job_state.job.pretty_name() for job_state in filtered_job_states), + } + + subject = self.config['subject'].format(**subject_args) + + post_data = { + 'content': subject, + 'embeds': [{ + 'type': 'rich', + 'description': text, + }] + } + else: + post_data = {"content": text} + + logger.debug("Sending Discord request with post_data: {0}".format(post_data)) + + result = requests.post(webhook_url, json=post_data) + try: + if result.status_code in (requests.codes.ok, requests.codes.no_content): + logger.info("Discord response: ok") + else: + logger.error("Discord error: {0}".format(result.text)) + except ValueError: + logger.error("Failed to parse Discord response. HTTP status code: {0}, content: {1}".format(result.status_code, result.content)) + return result + + +class MarkdownReporter(ReporterBase): + def submit(self, max_length=None): cfg = self.report.config['report']['markdown'] show_details = cfg['details'] show_footer = cfg['footer'] @@ -668,18 +745,144 @@ summary.extend(summary_part) details.extend(details_part) + if summary and show_footer: + footer = ('--- ', + '%s %s, %s ' % (urlwatch.pkgname, urlwatch.__version__, urlwatch.__copyright__), + 'Website: %s ' % (urlwatch.__url__,), + 'watched %d URLs in %d seconds' % (len(self.job_states), self.duration.seconds)) + else: + footer = None + + if not show_details: + details = None + + trimmed_msg = "*Parts of the report were omitted due to message length.*\n" + max_length -= len(trimmed_msg) + + trimmed, summary, details, footer = MarkdownReporter._render( + max_length, summary, details, footer + ) + if summary: - yield from ('%d. %s' % (idx + 1, line) for idx, line in enumerate(summary)) + yield from summary yield '' if show_details: - yield from details + for header, body in details: + yield header + yield body + yield '' + + if trimmed: + yield trimmed_msg if summary and show_footer: - yield from ('--- ', - '%s %s, %s ' % (urlwatch.pkgname, urlwatch.__version__, urlwatch.__copyright__), - 'Website: %s ' % (urlwatch.__url__,), - 'watched %d URLs in %d seconds' % (len(self.job_states), self.duration.seconds)) + yield from footer + + @classmethod + def _render(cls, max_length, summary=None, details=None, footer=None): + """Render the report components, trimming them if the available length is insufficient. + + Returns a tuple (trimmed, summary, details, footer). 
+ + The first element of the tuple indicates whether any part of the report + was omitted due to message length. The other elements are the + potentially trimmed report components. + """ + + # The footer/summary lengths are the sum of the length of their parts + # plus the space taken up by newlines. + if summary: + summary = ['%d. %s' % (idx + 1, line) for idx, line in enumerate(summary)] + summary_len = sum(len(part) for part in summary) + len(summary) - 1 + else: + summary_len = 0 + + if footer: + footer_len = sum(len(part) for part in footer) + len(footer) - 1 + else: + footer_len = 0 + + if max_length is None: + return (False, summary, details, footer) + else: + if summary_len > max_length: + return (True, [], [], "") + elif footer_len > max_length - summary_len: + return (True, summary, [], footer[:max_length - summary_len]) + elif not details: + return (False, summary, [], footer) + else: + # Determine the space remaining after taking into account + # summary and footer. + remaining_len = max_length - summary_len - footer_len + headers_len = sum(len(header) for header, _ in details) + + details_trimmed = False + + # First ensure we can show all the headers. + if headers_len > remaining_len: + return (True, summary, [], footer) + else: + remaining_len -= headers_len + + # Calculate approximate available length per item, shared + # equally between all details components. + body_len_per_details = remaining_len // len(details) + + trimmed_details = [] + unprocessed = len(details) + + for header, body in details: + # Calculate the available length for the body and render it + avail_length = body_len_per_details - 1 + + body_trimmed, body = cls._format_details_body(body, avail_length) + + if body_trimmed: + details_trimmed = True + + if len(body) <= body_len_per_details: + trimmed_details.append((header, body)) + else: + trimmed_details.append((header, "")) + + # If the current item's body did not use all of its + # allocated space, distribute the unused space into + # subsequent items, unless we're at the last item + # already. + unused = body_len_per_details - len(body) + remaining_len -= body_len_per_details + remaining_len += unused + unprocessed -= 1 + + if unprocessed > 0: + body_len_per_details = remaining_len // unprocessed + + return (details_trimmed, summary, trimmed_details, footer) + + @staticmethod + def _format_details_body(s, max_length): + wrapper_length = len("```diff\n\n```") + + # Message to print when the diff is too long. + trim_message = "*diff trimmed*" + trim_message_length = len(trim_message) + + if max_length is None or len(s) + wrapper_length <= max_length: + return False, "```diff\n{}\n```".format(s) + else: + target_max_length = max_length - trim_message_length - wrapper_length + pos = s.rfind("\n", 0, target_max_length) + + if pos == -1: + # Just a single long line, so cut it short. + s = s[0:target_max_length] + else: + # Multiple lines, cut off extra lines. 
+ s = s[0:pos] + + return True, "{}\n```diff\n{}\n```".format(trim_message, s) def _format_content(self, job_state): if job_state.verb == 'error': @@ -708,17 +911,15 @@ summary_part.append(pretty_summary) - details_part.append('### ' + summary) if content is not None: - details_part.extend(('', '```', content, '```', '')) - details_part.extend(('', '')) + details_part.append(('### ' + summary, content)) return summary_part, details_part class MatrixReporter(MarkdownReporter): """Send a message to a room using the Matrix protocol""" - MAX_LENGTH = 4096 + MAX_LENGTH = 16384 __kind__ = 'matrix' @@ -730,19 +931,16 @@ access_token = self.config['access_token'] room_id = self.config['room_id'] - body_markdown = '\n'.join(super().submit()) + body_markdown = '\n'.join(super().submit(MatrixReporter.MAX_LENGTH)) if not body_markdown: logger.debug('Not calling Matrix API (no changes)') return - if len(body_markdown) > self.MAX_LENGTH: - body_markdown = body_markdown[:self.MAX_LENGTH] - client_api = matrix_client.api.MatrixHttpApi(homeserver_url, access_token) if Markdown is not None: - body_html = Markdown().convert(body_markdown) + body_html = Markdown(extras=["fenced-code-blocks", "highlightjs-lang"]).convert(body_markdown) client_api.send_message_event( room_id, diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/storage.py new/urlwatch-2.22/lib/urlwatch/storage.py --- old/urlwatch-2.21/lib/urlwatch/storage.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/storage.py 2020-12-19 12:27:43.000000000 +0100 @@ -32,6 +32,7 @@ import stat import copy import platform +import collections from abc import ABCMeta, abstractmethod import shutil @@ -49,6 +50,11 @@ except ImportError: redis = None +try: + import pwd +except ImportError: + pwd = None + from .util import atomic_rename, edit_file from .jobs import JobBase, UrlJob, ShellJob from .filters import FilterBase @@ -124,6 +130,19 @@ 'slack': { 'enabled': False, 'webhook_url': '', + 'max_message_length': 40000, + }, + 'mattermost': { + 'enabled': False, + 'webhook_url': '', + 'max_message_length': 40000, + }, + 'discord': { + 'enabled': False, + 'embed': False, + 'subject': '{count} changes: {jobs}', + 'webhook_url': '', + 'max_message_length': 2000, }, 'matrix': { 'enabled': False, @@ -182,7 +201,6 @@ # If there is no controlling terminal, because urlwatch is launched by # cron, or by a systemd.service for example, os.getlogin() fails with: # OSError: [Errno 25] Inappropriate ioctl for device - import pwd return pwd.getpwuid(os.getuid()).pw_name @@ -219,6 +237,7 @@ if os.path.exists(self.filename): shutil.copy(self.filename, file_edit) elif example_file is not None and os.path.exists(example_file): + os.makedirs(os.path.dirname(file_edit) or '.', exist_ok=True) shutil.copy(example_file, file_edit) while True: @@ -249,6 +268,7 @@ @classmethod def write_default_config(cls, filename): + os.makedirs(os.path.dirname(filename) or '.', exist_ok=True) config_storage = cls(None) config_storage.filename = filename config_storage.save() @@ -349,14 +369,31 @@ class UrlsYaml(BaseYamlFileStorage, UrlsBaseFileStorage): + @classmethod + def _parse(cls, fp): + jobs = [JobBase.unserialize(job) for job in yaml.load_all(fp, Loader=yaml.SafeLoader) + if job is not None] + jobs_by_guid = collections.defaultdict(list) + for job in jobs: + jobs_by_guid[job.get_guid()].append(job) + + conflicting_jobs = [] + for guid, guid_jobs in jobs_by_guid.items(): + if len(guid_jobs) != 1: + 
conflicting_jobs.append(guid_jobs[0].get_location()) + + if conflicting_jobs: + raise ValueError('\n '.join(['Each job must have a unique URL, append #1, #2, ... to make them unique:'] + + conflicting_jobs)) + + return jobs @classmethod def parse(cls, *args): filename = args[0] if filename is not None and os.path.exists(filename): with open(filename) as fp: - return [JobBase.unserialize(job) for job in yaml.load_all(fp, Loader=yaml.SafeLoader) - if job is not None] + return cls._parse(fp) def save(self, *args): jobs = args[0] @@ -367,7 +404,7 @@ def load(self, *args): with open(self.filename) as fp: - return [JobBase.unserialize(job) for job in yaml.load_all(fp, Loader=yaml.SafeLoader) if job is not None] + return self._parse(fp) class UrlsTxt(BaseTxtFileStorage, UrlsBaseFileStorage): diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/lib/urlwatch/tests/test_handler.py new/urlwatch-2.22/lib/urlwatch/tests/test_handler.py --- old/urlwatch-2.21/lib/urlwatch/tests/test_handler.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/lib/urlwatch/tests/test_handler.py 2020-12-19 12:27:43.000000000 +0100 @@ -1,7 +1,6 @@ import sys from glob import glob -import pycodestyle as pycodestyle from urlwatch.jobs import UrlJob, JobBase, ShellJob from urlwatch.storage import UrlsYaml, UrlsTxt @@ -80,6 +79,7 @@ def test_pep8_conformance(): """Test that we conform to PEP-8.""" + import pycodestyle style = pycodestyle.StyleGuide(ignore=['E501', 'E402', 'W503']) py_files = [y for x in os.walk(os.path.abspath('.')) for y in glob(os.path.join(x[0], '*.py'))] diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/setup.py new/urlwatch-2.22/setup.py --- old/urlwatch-2.21/setup.py 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/setup.py 2020-12-19 12:27:43.000000000 +0100 @@ -10,8 +10,8 @@ m = dict(re.findall("\n__([a-z]+)__ = '([^']+)'", main_py)) docs = re.findall('"""(.*?)"""', main_py, re.DOTALL) -if sys.version_info < (3, 3): - sys.exit('urlwatch requires Python 3.3 or newer') +if sys.version_info < (3, 6): + sys.exit('urlwatch requires Python 3.6 or newer') m['name'] = 'urlwatch' m['author'], m['author_email'] = re.match(r'(.*) <(.*)>', m['author']).groups() @@ -22,7 +22,7 @@ m['entry_points'] = {"console_scripts": ["urlwatch=urlwatch.cli:main"]} m['package_dir'] = {'': 'lib'} m['packages'] = ['urlwatch'] -m['python_requires'] = '>=3.5' +m['python_requires'] = '>=3.6' m['data_files'] = [ ('share/man/man1', ['share/man/man1/urlwatch.1']), ('share/urlwatch/examples', [ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/share/urlwatch/examples/hooks.py.example new/urlwatch-2.22/share/urlwatch/examples/hooks.py.example --- old/urlwatch-2.21/share/urlwatch/examples/hooks.py.example 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/share/urlwatch/examples/hooks.py.example 2020-12-19 12:27:43.000000000 +0100 @@ -49,7 +49,7 @@ # # __kind__ = 'case' # -# def filter(self, data, subfilter=None): +# def filter(self, data, subfilter): # # The subfilter is specified using a colon, for example the "case" # # filter here can be specified as "case:upper" and "case:lower" # @@ -69,7 +69,7 @@ # # __kind__ = 'indent' # -# def filter(self, data, subfilter=None): +# def filter(self, data, subfilter): # # The subfilter here is a number of characters to indent # # if subfilter is None: @@ -87,7 +87,7 @@ MATCH = {'url': 'http://example.org/'} # An 
auto-match filter does not have any subfilters - def filter(self, data): + def filter(self, data, subfilter): return data.replace('foo', 'bar') class CustomRegexMatchUrlFilter(filters.RegexMatchFilter): @@ -95,7 +95,7 @@ MATCH = {'url': re.compile('http://example.org/.*')} # An auto-match filter does not have any subfilters - def filter(self, data): + def filter(self, data, subfilter): return data.replace('foo', 'bar') diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.21/share/urlwatch/examples/urls.yaml.example new/urlwatch-2.22/share/urlwatch/examples/urls.yaml.example --- old/urlwatch-2.21/share/urlwatch/examples/urls.yaml.example 2020-07-31 07:37:07.000000000 +0200 +++ new/urlwatch-2.22/share/urlwatch/examples/urls.yaml.example 2020-12-19 12:27:43.000000000 +0100 @@ -38,7 +38,9 @@ --- # You can do POST requests by providing data parameter. # POST data can be a URL-encoded string (see last example) or a dict. -url: "http://example.com/search.cgi" +# If you are using the URL multiple times, you need to append "#something" for +# each different job, so that the URL string still uniquely identifies the job +url: "http://example.com/search.cgi#alternative" data: button: Search q: something
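Following on from the POST example above, a second job submitting to the same endpoint would need its own fragment so that both jobs keep unique GUIDs; a minimal sketch, with the "#other" fragment and the form fields being illustrative only:

.. code-block:: yaml

    ---
    # Same endpoint as the job above; the fragment keeps this job's GUID unique
    url: "http://example.com/search.cgi#other"
    data:
      button: Search
      q: something-else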
