Ihor Radchenko <[email protected]> writes: >> There are a couple of known limitations: >> - Some github gists automatically redirect appropriately (f.e., >> https://gist.github.com/4343164 redirects to >> https://gist.github.com/tonyday567/4343164). gothub only supports the >> "complete" url (/user/gistid). Those gist urls aren't supported yet. >> - github supports two way of forcing the "raw" file, via >> raw.github.com/... and via github.com/...?raw=true, I currently only >> convert the first one. >> - github issues pages are not rewritten, see above. > > I am wondering whether such unconvertable links might be captured when > publishing with --debug flag.
I am not sure what you mean with captured, do you mean "following" the redirection? >> publish.sh | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 89 insertions(+) > > This is complex enough to warrant a separate file I think. Agreed, fixed. > defvar would be more canonical. Fixed, thanks. What I've done here: - I've fixed GET error handling (we simply do not redirect any url if we can't get libredirect data). - When the 'libredirect data' (what should we redirect to what) changes between publishing, we now export all org files rather than only updated ones, described by Christian and Ihor. - We redirect reddit.com to old.reddit.com What I have *not* yet done: - I have ignored the confusing link descriptions for now - when a link refers to github.com, that link still points to a free alternative (currently gothub.lunar.icu). A simple regex for the domain would suffice I think. - I still haven't touched org-web-social.org (the aggregate page), or looked at URLs that are broken regardless of this patch. As before, you can see the results on https://files.rensoliemans.nl/worg/, though only the reddit links differ from before. Also, is it customary to change the subject line to one containing PATCH halfway through the discussion? Best, Rens
>From f6aad7cef8b6dc5f5b3d6930c549e33e229923bf Mon Sep 17 00:00:00 2001 From: Rens Oliemans <[email protected]> Date: Tue, 25 Nov 2025 11:10:46 +0100 Subject: [PATCH] Find alternative links for YouTube, GitHub and Reddit This commit finds alternative links for the hosts mentioned above since they rely on non-free JS. See https://list.orgmode.org/orgmode/87pl9szmy6.fsf@localhost/ for context. --- .gitignore | 1 + publish.sh | 60 +++++++++-------- rewrite-urls.el | 171 ++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 205 insertions(+), 27 deletions(-) create mode 100644 rewrite-urls.el diff --git a/.gitignore b/.gitignore index c9dc5b62..399e35e9 100644 --- a/.gitignore +++ b/.gitignore @@ -4,3 +4,4 @@ patches/ /.project *patch +*.sexp diff --git a/publish.sh b/publish.sh index b499de3d..6b163187 100755 --- a/publish.sh +++ b/publish.sh @@ -14,6 +14,9 @@ This variable can be set when running publish.sh script: (require 'htmlize) (require 'org-inlinetask) +(push "./" load-path) +(require 'rewrite-urls) + (setq make-backup-files nil debug-on-error t) @@ -58,30 +61,33 @@ This variable can be set when running publish.sh script: (R . t) (gnuplot . t))) -(dolist (org-file (cl-remove-if - (lambda (n) (string-match-p "worg/archive/" n)) - (directory-files-recursively default-directory "\\.org$"))) - (let ((html-file (concat (file-name-directory org-file) - (file-name-base org-file) ".html"))) - (if (and (file-exists-p html-file) - (file-newer-than-file-p html-file org-file) - ;; If there are include files or code, we need to - ;; re-generate the HTML just in case if the included - ;; files are changed. - (with-temp-buffer - (insert-file-contents org-file) - (and - (save-excursion - (goto-char (point-min)) - (not (re-search-forward "#\\+include:" nil t))) - (save-excursion - (goto-char (point-min)) - (not (re-search-forward "#\\+begin_src" nil t)))))) - (message " [skipping] unchanged %s" org-file) - (message "[exporting] %s" (file-relative-name org-file default-directory)) - (with-current-buffer (find-file org-file) - (if worg-publish-stop-on-error - (org-html-export-to-html) - (condition-case err - (org-html-export-to-html) - (error (message (error-message-string err))))))))) +(add-hook 'org-export-before-parsing-functions #'add-alternative-links) +(let ((export-all (libredirect-data-changed-p libredirect-data))) + (dolist (org-file (cl-remove-if + (lambda (n) (string-match-p "worg/archive/" n)) + (directory-files-recursively default-directory "\\.org$"))) + (let ((html-file (concat (file-name-directory org-file) + (file-name-base org-file) ".html"))) + (if (and (not export-all) + (file-exists-p html-file) + (file-newer-than-file-p html-file org-file) + ;; If there are include files or code, we need to + ;; re-generate the HTML just in case if the included + ;; files are changed. + (with-temp-buffer + (insert-file-contents org-file) + (and + (save-excursion + (goto-char (point-min)) + (not (re-search-forward "#\\+include:" nil t))) + (save-excursion + (goto-char (point-min)) + (not (re-search-forward "#\\+begin_src" nil t)))))) + (message " [skipping] unchanged %s" org-file) + (message "[exporting] %s" (file-relative-name org-file default-directory)) + (with-current-buffer (find-file org-file) + (if worg-publish-stop-on-error + (org-html-export-to-html) + (condition-case err + (org-html-export-to-html) + (error (message (error-message-string err)))))))))) diff --git a/rewrite-urls.el b/rewrite-urls.el new file mode 100644 index 00000000..6a833324 --- /dev/null +++ b/rewrite-urls.el @@ -0,0 +1,171 @@ +(defvar libredirect-url "https://raw.githubusercontent.com/libredirect/instances/main/data.json") +(defvar libredirect-data nil) + +(defcustom free-alternatives '((youtube . invidious) + (github . gothub)) + "Alist of free frontends to use. The CDR corresponds to a symbol that is +known by libredirect, see https://codeberg.org/LibRedirect/instances.") +(defcustom redirect-save-filename "redirect-local-data.sexp" + "Filename used to save the relevant output of libredirect's json") + + +(defun libredirect-data-changed-p (&optional data) + "Whether or not libredirect data changed since our last export." + (setq libredirect-data (or data (-get-libredirect-data))) + (if (not (file-exists-p redirect-save-filename)) + ;; We hadn't saved stuff from our previous export, do so now. + (progn + (-save-redirect-data libredirect-data) + t) + ;; Compare results from stored file with data, overwrite stored data if newer + (let* ((stored-data (with-temp-buffer + (insert-file-contents redirect-save-filename) + (read (buffer-string)))) + (data-equal (equal (-relevant-redirect-data libredirect-data) + stored-data))) + (unless data-equal + (-save-redirect-data libredirect-data)) + (not data-equal)))) + + +(defun -get-libredirect-data () + "Returns libredirect data from github" + (condition-case _ + (let ((response (with-current-buffer + (url-retrieve-synchronously libredirect-url nil nil 5) + (prog2 + (re-search-forward "\n\n" nil t) ; skip HTTP headers + (buffer-substring-no-properties (point) (point-max)) + (kill-buffer))))) + (json-parse-string response :object-type 'alist :array-type 'list)) + (t + (message "Worg couldn't access libredirect data, using original URLs.") + nil))) + + +(defun -save-redirect-data (&optional data) + "Saves redirect data in a file with filename `redirect-save-filename'." + (let ((tosave (or data libredirect-data))) + (with-temp-file redirect-save-filename + (prin1 + (-relevant-redirect-data data) + (current-buffer))))) + + +(defun -relevant-redirect-data (data) + "Extracts relevant redirection data, defined in `free-alternatives'." + (seq-map (lambda (association) + (let ((from (car association))) + `(,from . ,(-first-link-of-json data from)))) + free-alternatives)) + + +(defun add-alternative-links (&optional _) + "Add alternative links for websites containing non-free JS. +For each link that has an alternative (currently YouTube and GitHub), we +insert a link to the free alternative, and change the link text of the +original link to to =(original URL)=. We also redirect reddit links to +old.reddit.com. + +See https://list.orgmode.org/orgmode/87pl9szmy6.fsf@localhost/ " + (unless libredirect-data + (setq libredirect-data (-get-libredirect-data))) + + (let ((links (org-element-map (org-element-parse-buffer) 'link #'identity))) + (dolist (link (nreverse links)) + (when (string-prefix-p "http" (org-element-property :type link)) + (when-let ((new-url (-find-replacement (org-element-property :path link)))) + (let* ((begin (org-element-property :begin link)) + (contents-begin (org-element-property :contents-begin link)) + (contents-end (org-element-property :contents-end link))) + (save-excursion + (if (and contents-begin contents-end) + ;; Link has description + (let ((description (buffer-substring contents-begin contents-end))) + ;; Change text of original link to (original URL) + (goto-char contents-begin) + (delete-region contents-begin contents-end) + (insert "(original URL)") + ;; Insert new link with free alternative + (goto-char begin) + (insert (format "[[%s][%s]] " new-url description))) + ;; else, "bare" link without description. We insert + ;; the original link as bare, and add the old one + ;; with (original URL) as description. + (let ((end (org-element-property :end link))) + (goto-char end) + (insert (format "][%s]]" "(original URL)")) + (goto-char begin) + (insert (format "[[%s]] [[" new-url))))))))))) + +(defun -find-replacement (path) + "Takes PATH and returns an alternative link if known and nil otherwise." + (and libredirect-data + (pcase path + ;; See https://docs.invidious.io/redirector/ + ;; This matches both + ;; - youtube.com/watch?v=<video-id> + ;; - youtube.com/playlist?list=<playlist-id> + ;; with optionally ? and = escaped + ((rx "//" (? "www\.") + "youtube.com" + (group + (or (and "/watch" + (or "?" "%3F") + "v" + (or "=" "%3D")) + (and "/playlist" + (or "?" "%3F") + "list" + (or "=" "%3D")))) + (group (+ not-newline))) + (let* ((route (match-string 1 path)) + (id (match-string 2 path)) + (host (-first-link-of-json libredirect-data 'youtube))) + (concat host route id))) + ;; youtu.be/<video-id> + ((rx "//" (? "www\.") + "youtu.be" + "/" + (group (+ not-newline))) + (let* ((video-id (match-string 1 path)) + (host (-first-link-of-json libredirect-data 'youtube))) + (concat host "/watch?v=" video-id))) + ;; gothub supports all links currently used in Worg, except for + ;; /<user>/<repo>/<issues>, so we exclude those. + ((and + (rx "//" (? "www\.") + "github.com" + "/" + (group (+ not-newline))) + ;; Exclude issues path. This approach currently also would + ;; exclude a supported link if author or repo starts with + ;; "issues" + (guard (not (string-match-p "/issues" path)))) + (let* ((url (match-string 1 path)) + (host (-first-link-of-json libredirect-data 'github))) + (concat host "/" url))) + ((rx "//" "gist.github.com/" + (group (+ not-newline))) + (let* ((url (match-string 1 path)) + (host (-first-link-of-json libredirect-data 'github))) + (concat host "/gist/" url))) + ;; redirect reddit.com to old.reddit.com + ((rx "//" (? "www\.") + "reddit.com" + "/" + (group (+ not-newline))) + (let* ((url (match-string 1 path)) + (host "https://old.reddit.com")) + (concat host "/" url)))))) + +(defun -first-link-of-json (data host) + "Gets the first available link of the alternative links for HOST. +This is based on the json from +https://codeberg.org/LibRedirect/instances/src/branch/main/data.json." + (let* ((alternative (cdr (assoc host free-alternatives))) + (object (cdr (assoc alternative libredirect-data))) + (links (cdr (assoc 'clearnet object)))) + (car links))) + +(provide 'rewrite-urls) -- 2.50.1
