On 30/04/15 14:04, User Goblin wrote:
> My initial idea was to parse wget's -o output and figure out which files
> still need to be downloaded, and then feed them via -i when continuing the
> download. This led me to the conclusion that I'd need two pieces of
> functionality, (1) machine-parseable output of -o, and (2) a way to convert
> a partially downloaded directory structure to links that still need
> downloading.
>
> I could work around (1), the output of -o is just hard to parse.
>
> For (2), I could use lynx or w3m or something like that, but then I never
> am sure that the links produced are the same that wget produced. Therefore
> I'd love an option like `wget --extract-links ./index.html` that'd just
> read an html file and produce a list of links on output. Or perhaps an
> assertion that some other tool like urlscan will do it exactly the same way
> as wget.

I wrote such a program some time ago, but it was never merged into wget. See
“Exposing wget functionality for extracting links from a web page”
https://lists.gnu.org/archive/html/bug-wget/2013-09/msg00079.html
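
To make the "read an html file and produce a list of links" idea concrete, here is a deliberately naive sketch in C. It is not wget's HTML parser and it is not the program from the thread linked above: it only looks for the literal text href=" and copies everything up to the next double quote. The names next_href and print_links are made up for this illustration. A scanner this simple misses single-quoted or unquoted attributes, other link-carrying attributes such as src, and relative-URL resolution, which is exactly why its output can differ from what wget itself would follow.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of wget: find the next href="..."
   at or after *POS, return its value as a freshly allocated string
   and advance *POS past the closing quote.  Returns NULL when no
   further double-quoted href is found (or on allocation failure).  */
char *
next_href (const char **pos)
{
  static const char key[] = "href=\"";
  const char *start, *end;
  char *out;
  size_t len;

  assert (pos != NULL && *pos != NULL);

  start = strstr (*pos, key);
  if (!start)
    return NULL;
  start += sizeof key - 1;      /* skip over href=" itself */

  end = strchr (start, '"');
  if (!end)
    return NULL;                /* unterminated attribute value */

  len = (size_t) (end - start);
  out = malloc (len + 1);
  if (!out)
    return NULL;
  memcpy (out, start, len);
  out[len] = '\0';

  *pos = end + 1;
  return out;
}

/* Write every double-quoted href value found in HTML to FP, one per
   line, and return how many were written.  */
int
print_links (const char *html, FILE *fp)
{
  int count = 0;
  char *link;

  while ((link = next_href (&html)) != NULL)
    {
      fprintf (fp, "%s\n", link);
      free (link);
      count++;
    }
  return count;
}
```

Calling print_links (buffer, stdout) on the contents of index.html would give one candidate link per line, in a form that could be fed to -i after filtering out what is already on disk.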

0001-Moved-free_urlpos.patch no longer applies cleanly, so I'm attaching a
rebased one (it's a trivial change, though).
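
For anyone reading this without the wget tree at hand: the function being moved frees a singly linked list of extracted links. The standalone sketch below shows the same traversal pattern on a simplified stand-in type. struct link_node, link_push and link_free_all are names invented for this illustration; the real struct urlpos has more fields and stores a parsed URL rather than a plain string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for wget's struct urlpos (assumption: only the
   fields needed to show the traversal are kept).  */
struct link_node
{
  char *url;                    /* heap-allocated link text */
  char *local_name;             /* heap-allocated, may be NULL */
  struct link_node *next;
};

/* Return a malloc'ed copy of S, or NULL if S is NULL or on failure.  */
static char *
dup_string (const char *s)
{
  size_t n;
  char *copy;

  if (!s)
    return NULL;
  n = strlen (s) + 1;
  copy = malloc (n);
  if (copy)
    memcpy (copy, s, n);
  return copy;
}

/* Prepend a node to HEAD and return the new head.  Returns NULL on
   allocation failure and leaves the old list untouched, so callers
   should not overwrite HEAD before checking the result.  */
struct link_node *
link_push (struct link_node *head, const char *url, const char *local_name)
{
  struct link_node *n;

  assert (url != NULL);

  n = calloc (1, sizeof *n);
  if (!n)
    return NULL;
  n->url = dup_string (url);
  n->local_name = dup_string (local_name);
  if (!n->url || (local_name && !n->local_name))
    {
      free (n->url);
      free (n->local_name);
      free (n);
      return NULL;
    }
  n->next = head;
  return n;
}

/* Free the whole list, using the same pattern as free_urlpos: read
   the next pointer before releasing the current node.  Returns the
   number of nodes freed.  */
size_t
link_free_all (struct link_node *l)
{
  size_t freed = 0;

  while (l)
    {
      struct link_node *next = l->next;
      free (l->url);
      free (l->local_name);     /* free (NULL) is a no-op */
      free (l);
      l = next;
      freed++;
    }
  return freed;
}
```

The only subtle point is the order inside the loop: the next pointer has to be saved before the node is freed, otherwise the walk would read freed memory.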

From 1335548e721486cd77717c6cb938f9927e63f0fc Mon Sep 17 00:00:00 2001
From: Ángel González <[email protected]>
Date: Mon, 4 May 2015 22:37:12 +0200
Subject: [PATCH] Move free_urlpos()

---
 src/html-url.c | 15 +++++++++++++++
 src/retr.c     | 15 ---------------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/src/html-url.c b/src/html-url.c
index 0743587..143fecc 100644
--- a/src/html-url.c
+++ b/src/html-url.c
@@ -870,3 +870,18 @@ cleanup_html_url (void)
   if (interesting_attributes)
     hash_table_destroy (interesting_attributes);
 }
+
+/* Free the linked list of urlpos.  */
+void
+free_urlpos (struct urlpos *l)
+{
+  while (l)
+    {
+      struct urlpos *next = l->next;
+      if (l->url)
+        url_free (l->url);
+      xfree (l->local_name);
+      xfree (l);
+      l = next;
+    }
+}
diff --git a/src/retr.c b/src/retr.c
index f60da6e..0bca092 100644
--- a/src/retr.c
+++ b/src/retr.c
@@ -1180,21 +1180,6 @@ sleep_between_retrievals (int count)
     }
 }
 
-/* Free the linked list of urlpos.  */
-void
-free_urlpos (struct urlpos *l)
-{
-  while (l)
-    {
-      struct urlpos *next = l->next;
-      if (l->url)
-        url_free (l->url);
-      xfree (l->local_name);
-      xfree (l);
-      l = next;
-    }
-}
-
 /* Rotate FNAME opt.backups times */
 void
 rotate_backups(const char *fname)
-- 
2.3.7
