Hello community,

here is the log from the commit of package urlscan for openSUSE:Factory checked in at 2021-05-17 18:45:05
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/urlscan (Old)
 and      /work/SRC/openSUSE:Factory/.urlscan.new.2988 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "urlscan"

Mon May 17 18:45:05 2021 rev:8 rq:893552 version:0.9.6

Changes:
--------
--- /work/SRC/openSUSE:Factory/urlscan/urlscan.changes 2020-08-07 14:22:37.590321213 +0200
+++ /work/SRC/openSUSE:Factory/.urlscan.new.2988/urlscan.changes 2021-05-17 18:45:22.416608611 +0200
@@ -1,0 +2,14 @@
+Wed May 12 22:03:24 UTC 2021 - Dirk Müller <[email protected]>
+
+- update to 0.9.6:
+  * Python 3.6+ required
+  * Convert to newer email.message.EmailMessage format for processing. Closes #98
+  * Hopefully fix #105. Escapes every "&" in the URL
+  * Attempt --run-safe implementation
+  * Fixes #106
+  * Scan a selection of email headers for URLs. Closes #97.
+  * Add option for custom regex. Closes #79.
+  * Allow $ as an acceptable trailing character
+  * Fix urwid reverse error. Thanks to @pavoljuhas. Closes #99
+
+-------------------------------------------------------------------

Old:
----
  urlscan-0.9.5.tar.gz

New:
----
  urlscan-0.9.6.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ urlscan.spec ++++++
--- /var/tmp/diff_new_pack.fhOftc/_old 2021-05-17 18:45:23.068605845 +0200
+++ /var/tmp/diff_new_pack.fhOftc/_new 2021-05-17 18:45:23.068605845 +0200
@@ -1,7 +1,7 @@
 #
 # spec file for package urlscan
 #
-# Copyright (c) 2020 SUSE LLC
+# Copyright (c) 2021 SUSE LLC
 #
 # All modifications and additions to the file contributed by third parties
 # remain the property of their copyright owners, unless otherwise agreed
@@ -16,8 +16,9 @@
 #
 
 
+%define python_flavor python3
 Name: urlscan
-Version: 0.9.5
+Version: 0.9.6
 Release: 0
 Summary: An other URL extractor/viewer
 License: GPL-2.0-or-later
@@ -25,16 +26,14 @@
 URL: https://github.com/firecat53/urlscan
 Source0: https://github.com/firecat53/urlscan/archive/%{version}.tar.gz#/%{name}-%{version}.tar.gz
 Source1: muttrc
-Requires: python3
-Requires: python3-base
-Requires: python3-urwid
 BuildRequires: python3-base
 BuildRequires: python3-devel
 BuildRequires: python3-rpm-macros
 BuildRequires: python3-setuptools
-BuildRoot: %{_tmppath}/%{name}-%{version}-build
+Requires: python3
+Requires: python3-base
+Requires: python3-urwid
 BuildArch: noarch
-%define python_flavor python3
 
 %description
 The urlscan utility displays URLs found in an email message with
@@ -50,18 +49,17 @@
 
 %install
 python3 setup.py install --prefix=%{_prefix} --root=%{buildroot}
-rm -rf %{buildroot}/usr/share/doc/%{name}*
+rm -rf %{buildroot}%{_datadir}/doc/%{name}*
 mkdir -p %{buildroot}%{_defaultdocdir}/%{name}
-install -m 0644 %{S:1} %{buildroot}%{_defaultdocdir}/%{name}
+install -m 0644 %{SOURCE1} %{buildroot}%{_defaultdocdir}/%{name}
 rm -rvf %{buildroot}%{python_sitelib}/%{name}-%{version}-*-info
 
 %files
-%defattr(-,root,root)
 %license COPYING
 %doc README.rst
 %{_bindir}/%{name}
 %{python_sitelib}/%{name}
-%{_mandir}/man1/%{name}.1.gz
+%{_mandir}/man1/%{name}.1%{?ext_man}
 %doc %{_defaultdocdir}/%{name}/muttrc
 
 %changelog

++++++ urlscan-0.9.5.tar.gz -> urlscan-0.9.6.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlscan-0.9.5/README.rst new/urlscan-0.9.6/README.rst
--- old/urlscan-0.9.5/README.rst 2020-07-09 19:25:47.000000000 +0200
+++ new/urlscan-0.9.6/README.rst 2021-03-23 06:00:17.000000000 +0100
@@ -19,7 +19,7 @@
 
 *NOTE* The last version that is Python 2 compatible is 0.9.3.
 
-Requires: Python 3.3+ and the python-urwid library
+Requires: Python 3.6+ and the python-urwid library
 
 Features
 --------
@@ -50,7 +50,7 @@
 
 - Use `l` to cycle through whether URLs are opened using the Python webbrowser
   module (default), xdg-open (if installed) or opened by a function passed on
-  the command line with `--run`.
+  the command line with `--run` or `--run-safe`.
 
 - Configure colors and keybindings via ~/.config/urlscan/config.json. Generate
   default config file for editing by running `urlscan -g`. Cycle through
@@ -64,6 +64,13 @@
 
 - Show complete help menu with `F1`. Hide header on startup with `--nohelp`.
 
+- Use a custom regular expression with `-E` for matching urls or any
+  other pattern. In junction with `-r`, this effectively turns urlscan
+  into a general purpose CLI selector-type utility.
+
+- Scan certain email headers for URLs. Currently `Link`, `Archived-At` and
+  `List-*` are scanned when `--headers` is passed.
+
 Installation and setup
 ----------------------
 
@@ -102,7 +109,7 @@
 
 ::
 
-    urlscan [-g, --genconf] [-n, --no-browser] [-c, --compact] [-d, --dedupe] [-r, --run <expression>] [-R, --reverse] [-s, --single] [-p, --pipe] [-w, --width] [-H, --nohelp] <file>
+    urlscan [-g, --genconf] [-n, --no-browser] [-c, --compact] [-d, --dedupe] [--headers] [-r, --run <expression>] [-f, --run-safe <expression>] [-R, --reverse] [-s, --single] [-p, --pipe] [-w, --width] [-H, --nohelp] [-E, --regex <expression>] <file>
 
 Urlscan can extract URLs and email addresses from emails or any text file.
 Calling with no flags will start the curses browser. Calling with '-n' will just
@@ -113,11 +120,11 @@
 urlscan` or `urlscan < <something>`
 
 Instead of opening a web browser, the selected URL can be passed as the argument
-to a command using `--run "<command> {}"`. Note the use of `{}` in the command
-string to denote the selected URL. Alternatively, the URL can be piped to the
-command using `--run <command> --pipe`. Using --run with --pipe is preferred if
-the command supports it, as it is marginally more secure and tolerant of special
-characters in the URL.
+to a command using `--run-safe "<command> {}"` or `--run "<command> {}"`. Note
+the use of `{}` in the command string to denote the selected URL. Alternatively,
+the URL can be piped to the command using `--run-safe <command> --pipe` (or
+`--run`). Using --run-safe with --pipe is preferred if the command supports it,
+as it is marginally more secure and tolerant of special characters in the URL.
Theming ------- @@ -148,7 +155,7 @@ - `context` -- show/hide context (default: `c`) - `down` -- cursor down (default: `j`) - `help_menu` -- show/hide help menu (default: `F1`) -- `link_handler` -- cycle link handling (webbrowser, xdg-open or --run) (default: `l`) +- `link_handler` -- cycle link handling (webbrowser, xdg-open, --run-safe or --run) (default: `l`) - `open_url` -- open selected URL (default: `space` or `enter`) - `palette` -- cycle through palettes (default: `p`) - `quit` -- quit (default: `q` or `Q`) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlscan-0.9.5/bin/urlscan new/urlscan-0.9.6/bin/urlscan --- old/urlscan-0.9.5/bin/urlscan 2020-07-09 19:25:47.000000000 +0200 +++ new/urlscan-0.9.6/bin/urlscan 2021-03-23 06:00:17.000000000 +0100 @@ -1,11 +1,11 @@ #!/usr/bin/env python3 """ A simple urlview replacement that handles things like quoted-printable -properly. aka "urlview minus teh suck" +properly. """ # # Copyright (C) 2006-2007 Daniel Burrows -# Copyright (C) 2019 Scott Hansen +# Copyright (C) 2021 Scott Hansen # # This program is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License @@ -21,17 +21,13 @@ # along with this program; if not, write to the Free Software # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA -from __future__ import unicode_literals import argparse import io -import locale import os import sys +from email import policy +from email.parser import BytesParser from urlscan import urlchoose, urlscan -try: - from email.Parser import Parser as parser -except ImportError: - from email.parser import Parser as parser def parse_arguments(): @@ -56,14 +52,25 @@ arg_parse.add_argument('--dedupe', '-d', dest="dedupe", action='store_true', default=False, help="Remove duplicate URLs from list") + arg_parse.add_argument('--regex', '-E', + help="Alternate custom regex to be used for all " + "kinds of matching. 
" + "For example: --regex 'https?://.+\.\w+'") arg_parse.add_argument('--run', '-r', help="Alternate command to run on selected URL " "instead of opening URL in browser. Use {} to " "represent the URL value in the expression. " "For example: --run 'echo {} | xclip -i'") + arg_parse.add_argument('--run-safe', '-f', dest="runsafe", + help="Alternate command to run on selected URL " + "instead of opening URL in browser. Use {} to " + "represent the URL value in the expression. Safest " + "run option but uses `shell=False` which does not " + "allow use of shell features like | or >. Can use " + "with --pipe.") arg_parse.add_argument('--pipe', '-p', dest='pipe', action='store_true', default=False, - help='Pipe URL into the command specified by --run') + help="Pipe URL into the command specified by --run or --run-safe") arg_parse.add_argument('--nohelp', '-H', dest='nohelp', action='store_true', default=False, help='Hide help menu by default') @@ -73,6 +80,9 @@ arg_parse.add_argument('--width', '-w', dest='width', type=int, default=0, help='Set width to display') + arg_parse.add_argument('--headers', dest='headers', + action='store_true', default=False, + help='Scan certain message headers for URLs.') arg_parse.add_argument('message', nargs='?', default=sys.stdin, help="Filename of the message to parse") return arg_parse.parse_args() @@ -98,16 +108,9 @@ file encoding differences. 
Args: fname - filename or sys.stdin - Returns: mesg - parsed (email parser) text of the message with the - correct encoding set + Returns: mesg - EmailMessage object """ - enc_list = ['UTF-8', 'LATIN-1', 'iso8859-1', 'iso8859-2', - 'UTF-16', 'CP720', 'CP437'] - locale.setlocale(locale.LC_ALL, '') - code = locale.getpreferredencoding() - if code not in enc_list: - enc_list.insert(0, code) if fname is sys.stdin: try: stdin_file = fname.buffer.read() @@ -115,34 +118,23 @@ stdin_file = fname.read() else: stdin_file = None - for enc in enc_list: - try: - if stdin_file is not None: - fobj = io.StringIO(stdin_file.decode(enc)) - else: - fobj = io.open(fname, mode='r', encoding=(enc)) - f_keep = fobj - mesg = parser().parse(fobj) - if 'From' not in mesg.keys() and 'Date' not in mesg.keys(): - # If it's not an email message, don't let the email parser - # delete the first line. If it is, let the parser do its job so - # we don't get mailto: links for all the To and From addresses - fobj = _fix_first_line(f_keep) - mesg = parser().parse(fobj) - - except (UnicodeDecodeError, UnicodeError): - continue - else: - break - finally: - try: - fobj.close() - except NameError: - pass - raise Exception("Encoding not detected. Please pass encoding value manually") + if stdin_file is not None: + fobj = io.BytesIO(stdin_file) + else: + fobj = io.open(fname, mode='rb') + f_keep = fobj + mesg = BytesParser(policy=policy.default.clone(utf8=True)).parse(fobj) + if 'From' not in mesg.keys() and 'Date' not in mesg.keys(): + # If it's not an email message, don't let the email parser + # delete the first line. 
If it is, let the parser do its job so + # we don't get mailto: links for all the To and From addresses + fobj = _fix_first_line(f_keep) + mesg = BytesParser(policy=policy.default.clone(utf8=True)).parse(fobj) + try: + fobj.close() + except NameError: + pass close_stdin() - # Handle multiple nested message parts - _msg_set_charset(mesg, enc) return mesg @@ -151,37 +143,15 @@ the URLs on that line will not be parsed by email.Parser. Add a blank line at the top of the file to ensure everything is read in a non-email file. - 1. Take the file object 'f'. - 2. Create a new StringIO object that starts with a blank line and read the - file into that. Return as open StringIO object 'f' - 3. Return 'f' - """ fline.seek(0) - new = io.StringIO() - new.write("\n{}".format(fline.read())) + new = io.BytesIO() + new.write(b"\n" + fline.read()) fline.close() new.seek(0) return new -def _msg_set_charset(mesg, encoding): - """Recursive function to set the charset of nested message parts. - - """ - encoding = mesg.get_content_charset() or encoding - try: - mesg.set_charset(encoding) - except (AttributeError, TypeError): - for part in mesg.get_payload(): - try: - # Try once to set correct encoding on the message part, then - # continue without crashing if it fails - _msg_set_charset(part, encoding) - except UnicodeEncodeError: - continue - - def main(): """Entrypoint function for urlscan @@ -192,18 +162,19 @@ return msg = process_input(args.message) if args.nobrowser is False: - tui = urlchoose.URLChooser(urlscan.msgurls(msg), + tui = urlchoose.URLChooser(urlscan.msgurls(msg, regex=args.regex, headers=args.headers), compact=args.compact, reverse=args.reverse, nohelp=args.nohelp, dedupe=args.dedupe, run=args.run, + runsafe=args.runsafe, single=args.single, width=args.width, pipe=args.pipe) tui.main() else: - out = urlchoose.URLChooser(urlscan.msgurls(msg), + out = urlchoose.URLChooser(urlscan.msgurls(msg, regex=args.regex, headers=args.headers), dedupe=args.dedupe, 
reverse=args.reverse, shorten=False) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlscan-0.9.5/setup.py new/urlscan-0.9.6/setup.py --- old/urlscan-0.9.5/setup.py 2020-07-09 19:25:47.000000000 +0200 +++ new/urlscan-0.9.6/setup.py 2021-03-23 06:00:17.000000000 +0100 @@ -3,17 +3,30 @@ from setuptools import setup setup(name="urlscan", - version="0.9.5", + version="0.9.6", description="View/select the URLs in an email message or file", author="Scott Hansen", author_email="[email protected]", url="https://github.com/firecat53/urlscan", - download_url="https://github.com/firecat53/urlscan/archive/0.9.5.zip", + download_url="https://github.com/firecat53/urlscan/archive/0.9.6.zip", packages=['urlscan'], scripts=['bin/urlscan'], package_data={'urlscan': ['assets/*']}, data_files=[('share/doc/urlscan', ['README.rst', 'COPYING']), ('share/man/man1', ['urlscan.1'])], license="GPLv2", - install_requires=["urwid>=1.2.1"] + install_requires=["urwid>=1.2.1"], + classifiers=[ + 'Development Status :: 4 - Beta', + 'Environment :: Console', + 'Environment :: Console :: Curses', + 'License :: OSI Approved :: GNU General Public License v2 (GPLv2)', + 'Operating System :: OS Independent', + 'Programming Language :: Python', + 'Programming Language :: Python :: 3.6', + 'Programming Language :: Python :: 3.7', + 'Programming Language :: Python :: 3.8', + 'Programming Language :: Python :: 3.9', + 'Topic :: Utilities'], + keywords=("urlscan urlview email mutt tmux"), ) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlscan-0.9.5/urlscan/urlchoose.py new/urlscan-0.9.6/urlscan/urlchoose.py --- old/urlscan-0.9.5/urlscan/urlchoose.py 2020-07-09 19:25:47.000000000 +0200 +++ new/urlscan-0.9.6/urlscan/urlchoose.py 2021-03-23 06:00:17.000000000 +0100 @@ -1,5 +1,5 @@ # Copyright (C) 2006-2007 Daniel Burrows -# Copyright (C) 2020 Scott Hansen +# Copyright (C) 2021 Scott Hansen # # This program is free 
software; you can redistribute it and/or # modify it under the terms of the GNU General Public License @@ -92,7 +92,8 @@ class URLChooser: def __init__(self, extractedurls, compact=False, reverse=False, nohelp=False, dedupe=False, - shorten=True, run="", single=False, pipe=False, genconf=False, width=0): + shorten=True, run="", runsafe="", single=False, pipe=False, + genconf=False, width=0): self.conf = expanduser("~/.config/urlscan/config.json") self.keys = {'/': self._search_key, '0': self._digits, @@ -178,6 +179,7 @@ self.shorten = shorten self.compact = compact self.run = run + self.runsafe = runsafe self.single = single self.pipe = pipe self.search = False @@ -208,7 +210,9 @@ "/ - search :: " "URL opening mode - {}") self.link_open_modes = ["Web Browser", "Xdg-Open"] if self.xdg is True else ["Web Browser"] - if self.run: + if self.runsafe: + self.link_open_modes.insert(0, self.runsafe) + elif self.run: self.link_open_modes.insert(0, self.run) self.nohelp = nohelp if nohelp is False: @@ -323,8 +327,8 @@ def _open_url(self): """<Enter> or <space>""" - load_text = "Loading URL..." if self.link_open_modes[0] != self.run \ - else "Executing: {}".format(self.run) + load_text = "Loading URL..." 
if self.link_open_modes[0] != (self.run or self.runsafe) \ + else "Executing: {}".format(self.run or self.runsafe) if os.environ.get('BROWSER') not in ['elinks', 'links', 'w3m', 'lynx']: self._footer_display(load_text, 5) @@ -462,7 +466,7 @@ def _reverse(self): """ R """ # Reverse items - fpo = self.top.body.focus_position + fpo = self.top.base_widget.body.focus_position if self.compact is True: self.items.reverse() else: @@ -475,8 +479,8 @@ else: rev.insert(2, item) self.items = rev - self.top.body = urwid.ListBox(self.items) - self.top.body.focus_position = self._cur_focus(fpo) + self.top.base_widget.body = urwid.ListBox(self.items) + self.top.base_widget.body.focus_position = self._cur_focus(fpo) def _context(self): """ c """ @@ -505,7 +509,7 @@ cmds = COPY_COMMANDS_PRIMARY if pri else COPY_COMMANDS for cmd in cmds: try: - proc = Popen(shlex.split(cmd), stdin=PIPE) + proc = Popen(shlex.split(cmd), stdin=PIPE, stdout=DEVNULL, stderr=DEVNULL) proc.communicate(input=url.encode(sys.getdefaultencoding())) self._footer_display("Copied url to {} selection".format( "primary" if pri is True else "clipboard"), 5) @@ -635,7 +639,7 @@ def _link_handler(self): """Function to cycle through opening links via webbrowser module, - xdg-open or custom expression passed with --run. + xdg-open or custom expression passed with --run-safe or --run. 
""" mode = self.link_open_modes.pop() @@ -659,10 +663,17 @@ self.search = False self.enter = False elif self.link_open_modes[0] == "Web Browser": - webbrowser.open(url) + webbrowser.open(url.replace('&', '\&')) elif self.link_open_modes[0] == "Xdg-Open": run = 'xdg-open "{}"'.format(url) process = Popen(shlex.split(run), stdout=PIPE, stdin=PIPE) + elif self.link_open_modes[0] == self.runsafe: + if self.pipe: + process = Popen(shlex.split(self.runsafe), stdout=PIPE, stdin=PIPE) + process.communicate(input=url.encode(sys.getdefaultencoding())) + else: + cmd = [i.format(url) for i in shlex.split(self.runsafe)] + Popen(cmd).communicate() elif self.link_open_modes[0] == self.run and self.pipe: process = Popen(shlex.split(self.run), stdout=PIPE, stdin=PIPE) process.communicate(input=url.encode(sys.getdefaultencoding())) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlscan-0.9.5/urlscan/urlscan.py new/urlscan-0.9.6/urlscan/urlscan.py --- old/urlscan-0.9.5/urlscan/urlscan.py 2020-07-09 19:25:47.000000000 +0200 +++ new/urlscan-0.9.6/urlscan/urlscan.py 2021-03-23 06:00:17.000000000 +0100 @@ -1,6 +1,6 @@ # -*- coding: utf-8 -*- # Copyright (C) 2006-2007 Daniel Burrows -# Copyright (C) 2020 Scott Hansen +# Copyright (C) 2021 Scott Hansen # # This program is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public License @@ -18,18 +18,10 @@ """Contains the backend logic that scans messages for URLs and context.""" +from html.parser import HTMLParser +import locale import os import re -from html.parser import HTMLParser - - -def get_charset(message, default="utf-8"): - """Get the message charset""" - if message.get_content_charset(): - return message.get_content_charset() - if message.get_charset(): - return message.get_charset() - return default class Chunk: @@ -255,7 +247,7 @@ URLINTERNALPATTERN = r'[{}()@\w/\\\-%?!&.=:;+,#~]' -URLTRAILINGPATTERN = r'[{}(@\w/\-%&=+#]' 
+URLTRAILINGPATTERN = r'[{}(@\w/\-%&=+#$]' HTTPURLPATTERN = (r'(?:(https?|file|ftps?)://' + URLINTERNALPATTERN + r'*' + URLTRAILINGPATTERN + r')') # Used to guess that blah.blah.blah.TLD is a URL. @@ -302,7 +294,7 @@ assert not URLRE.match('blah.baz.obviouslynotarealdomain') -def parse_text_urls(mesg): +def parse_text_urls(mesg, regex=None): """Parse a block of text, splitting it into its url and non-url components.""" @@ -310,16 +302,24 @@ loc = 0 + global URLRE + + if regex: + URLRE = re.compile(regex) + for match in URLRE.finditer(mesg): if loc < match.start(): rval.append(Chunk(mesg[loc:match.start()], None)) # Turn email addresses into mailto: links - email = match.group("email") - if email and "mailto" not in email: - mailto = "mailto:{}".format(email) + if regex: + rval.append(Chunk(None, match.group(0))) else: - mailto = match.group(1) - rval.append(Chunk(None, mailto)) + email = match.group("email") + if email and "mailto" not in email: + mailto = "mailto:{}".format(email) + else: + mailto = match.group(1) + rval.append(Chunk(None, mailto)) loc = match.end() if loc < len(mesg): @@ -393,7 +393,7 @@ NLRE = re.compile('\r\n|\n|\r') -def extracturls(mesg): +def extracturls(mesg, regex=None): """Given a text message, extract all the URLs found in the message, along with their surrounding context. The output is a list of sequences of Chunk objects, corresponding to the contextual regions extracted from the string. @@ -412,7 +412,7 @@ # lines with more than one entry or one entry that's # a URL are the only lines containing URLs. - linechunks = [parse_text_urls(l) for l in lines] + linechunks = [parse_text_urls(l, regex=regex) for l in lines] return extract_with_context(linechunks, lambda chunk: len(chunk) > 1 or @@ -439,41 +439,68 @@ return extract_with_context(chunk.rval, somechunkisurl, 1, 1) -def decode_bytes(byt, enc='utf-8'): - """Given a string or bytes input, return a string. 
+def msgheaders(msg): + """ Process email message headers for URLs - Args: bytes - bytes or string - enc - encoding to use for decoding the byte string. + Args: msg - email message object + Returns: list """ - try: - strg = byt.decode(enc) - except UnicodeDecodeError as err: - strg = "Unable to decode message:\n{}\n{}".format(str(byt), err) - except (AttributeError, UnicodeEncodeError): - # If byt is already a string, just return it - return byt - return strg + headers = ('Archived-At', + 'Link', + 'List-Archive', + 'List-ID', + 'List-Help', + 'List-Owner', + 'List-Post', + 'List-Subscribe', + 'List-Unsubscribe', + 'List-Unsubscribe-Post') + res = [] + for hdr in headers: + hdri = msg.get(hdr) + if hdri: + res.append(hdri) + return res + + +def set_charset(message): + """Get and/or set the message or message part charset. Try the + content-charset or charset if it exists, or attempt to decode the message + with a variety of charsets to find the correct one. + Args: message - EmailMessage object + Returns: message - EmailMessage object -def decode_msg(msg, enc='utf-8'): """ - Decodes a message fragment. - - Args: msg - A Message object representing the fragment - enc - The encoding to use for decoding the message - """ - # We avoid the get_payload decoding machinery for raw - # content-transfer-encodings potentially containing non-ascii characters, - # such as 8bit or binary, as these are encoded using raw-unicode-escape which - # seems to prevent subsequent utf-8 decoding. 
- cte = str(msg.get('content-transfer-encoding', '')).lower() - decode = cte not in ("8bit", "7bit", "binary") - res = msg.get_payload(decode=decode) - return decode_bytes(res, enc) + if message.get_content_charset(): + return message + if message.get_charset(): + return message + enc_list = ['UTF-8', 'LATIN-1', 'iso8859-1', 'iso8859-2', + 'UTF-16', 'CP1252', 'CP720', 'CP437'] + locale.setlocale(locale.LC_ALL, '') + code = locale.getpreferredencoding() + if code not in enc_list: + enc_list.insert(0, code) + for enc in enc_list: + try: + message.as_bytes().decode(enc) + except (UnicodeDecodeError, UnicodeError): + continue + else: + try: + message.set_param('charset', enc) + except (KeyError, UnicodeEncodeError): + # Try once to set correct encoding on the message part, then + # continue without crashing if it fails + continue + break + raise Exception("Encoding not detected.") + return message -def msgurls(msg, urlidx=1): +def msgurls(msg, urlidx=1, regex=None, headers=False): """Main entry function for urlscan.py """ @@ -481,19 +508,22 @@ # one subpart in the future (e.g., for # multipart/alternative). Actually, I might even add # a browser for the message structure? 
- enc = get_charset(msg) + if headers is True: + for part in msgheaders(set_charset(msg)): + for chunk in extracturls(part): + urlidx += 1 + yield chunk + msg = set_charset(msg) if msg.is_multipart(): - for part in msg.get_payload(): - for chunk in msgurls(part, urlidx): + for part in msg.iter_parts(): + for chunk in msgurls(set_charset(part), urlidx, regex=regex): urlidx += 1 yield chunk elif msg.get_content_type() == "text/plain": - decoded = decode_msg(msg, enc) - for chunk in extracturls(decoded): + for chunk in extracturls(msg.get_content(), regex=regex): urlidx += 1 yield chunk elif msg.get_content_type() == "text/html": - decoded = decode_msg(msg, enc) - for chunk in extracthtmlurls(decoded): + for chunk in extracthtmlurls(msg.get_content()): urlidx += 1 yield chunk diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlscan-0.9.5/urlscan.1 new/urlscan-0.9.6/urlscan.1 --- old/urlscan-0.9.5/urlscan.1 2020-07-09 19:25:47.000000000 +0200 +++ new/urlscan-0.9.6/urlscan.1 2021-03-23 06:00:17.000000000 +0100 @@ -1,6 +1,6 @@ .\" Hey, EMACS: -*- nroff -*- -.TH URLSCAN 1 "15 May 2020" +.TH URLSCAN 1 "6 March 2021" .SH NAME urlscan \- browse the URLs in an email message from a terminal @@ -14,8 +14,8 @@ .SH DESCRIPTION \fBurlscan\fR accepts a single email message on standard input, then displays a terminal-based list of the URLs in the given -message. Selecting a URL uses the Python webbrowser module to -determine which browser to open. The \fBBROWSER\fR environment +message. Selecting a URL uses the Python webbrowser module to +determine which browser to open. The \fBBROWSER\fR environment variable will be used if it is set. \fBurlscan\fR is primarily intended to be used with the @@ -53,15 +53,19 @@ \fB7.\fR \fBu\fR will unescape the highlighted URL if necessary. \fB8.\fR Run a command with the selected URL as the argument or pipe the -selected URL to a command using the \fB--run\fR and \fB--pipe\fR arguments. 
+selected URL to a command using the \fB--run-safe\fR, \fB--run\fR and
+\fB--pipe\fR arguments.
 
 \fB9.\fR Use \fBl\fR to cycle through whether URLs are opened using the Python
 webbrowser module (default), xdg-open (if installed) or a function passed on the
-command line with \fB--run\fR. The \fB--run\fR function will respect the value
-of \fB--pipe\fR.
+command line with \fB--run-safe\fR or \fB--run\fR. The \fB--run\fR and
+\fB--run-safe\fR functions will respect the value of \fB--pipe\fR.
 
 \fB10.\fR \fBF1\fR shows the help menu.
 
+\fB11.\fR Scan certain email headers for URLs. Currently \fBLink\fR,
+\fBArchived-At\fR and \fBList-*\fR are scanned when \fB--headers\fR is passed.
+
 .SH OPTIONS
 .TP
 .B \-g, \-\-genconf
@@ -81,19 +85,25 @@
 Disables the selection interface and print the links to standard output. Useful
 for scripting (implies \fB\-\-compact\fR).
 .TP
-.B \-r, \-\-run \<expression\>
+.B \-f, \-\-run\-safe \<expression\>
 Execute \<expression\> in place of opening URL with a browser. Use {} in
 \<expression\> to substitute in the URL. Examples:
 
+    $ urlscan --run-safe 'tmux set buffer {}'
+.TP
+.B \-r, \-\-run \<expression\>
+Execute \<expression\> in place of opening URL with a browser. Use {} in
+\<expression\> to substitute in the URL. Shell features such as \| and \> can be
+used, but it is less secure. Examples:
+
     $ urlscan --run 'echo {} | xclip -i' file.txt
-    $ urlscan --run 'tmux set buffer {}'
 .TP
 .B \-p, \-\-pipe
-Pipe the selected URL to the command specified by `--run`. This is preferred
-when the command supports it, as it is more secure and tolerant of special
-characters in the URL. Example:
+Pipe the selected URL to the command specified by `--run-safe` or `--run`. This
+is preferred when the command supports it, as it is more secure and tolerant of
+special characters in the URL. Example:
 
-    $ urlscan --run 'xclip -i' --pipe file.txt
+    $ urlscan --run-safe 'xclip -i' --pipe file.txt
 .TP
 .B \-R, \-\-reverse
 Reverse displayed order of URLs.
@@ -105,6 +115,18 @@
 .TP
 .B \-w, \-\-width
 Set display width.
+.TP
+.B \-E, \-\-regex \<expression\>
+Use \<expression\> in place of the default set of regular expressions,
+to be used for any kind of matching. This is useful for example when
+selectively avoiding 'mailto:' links or any other pattern that urlscan
+could interpret as urls (such as '<filename>.<extension>'). Usage
+example:
+    $ urlscan --regex 'https?://.+\.\w+' file.txt
+.TP
+.B \-\-headers
+Scan email headers for URLs.
 
 .SH MUTT INTEGRATION
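
For readers skimming the diff, the difference between the new `--run-safe` handling and plain string substitution can be illustrated with a small sketch. It mirrors the list comprehension in the urlchoose.py hunk (`shlex.split` first, then `format` on each token). The helper names `build_safe_argv` and `build_shell_command`, and the sample URL, are hypothetical and are not part of urlscan; the second helper only shows what a shell would be handed if the URL were substituted into the raw string, which is an assumption based on the help text rather than on code shown in this diff.

```python
import shlex


def build_safe_argv(template, url):
    """Tokenize the command template, then substitute the URL per token.

    Hypothetical helper: the template is split before the URL is inserted,
    so shell metacharacters in the URL stay inside a single argument.
    """
    return [token.format(url) for token in shlex.split(template)]


def build_shell_command(template, url):
    """Substitute the URL into the raw string (illustration only).

    A shell interpreting this string would treat ';' and '&' as syntax.
    """
    return template.format(url)


# Sample URL containing shell metacharacters (made up for this sketch).
url = "https://example.com/?a=1&b=2; touch /tmp/pwned"

safe_argv = build_safe_argv("tmux set buffer {}", url)
shell_cmd = build_shell_command("tmux set buffer {}", url)

print(safe_argv)   # four items, the URL is the last one, untouched
print(shell_cmd)   # one string that a shell would split at ';' and '&'
```

With an argument list and no shell, the whole URL reaches the command as one argument, which is presumably why the help text calls `--run-safe` the safest option while noting that `|` and `>` are unavailable.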
