Re: Character encoding

2005-04-06 Thread Alain Bench
Hello Georg,

 On Friday, April 1, 2005 at 12:01:15 PM +0200, Georg Bauhaus wrote:

> The apostrophe might have been typed as an accent (acute) really

Most probably the RIGHT SINGLE QUOTATION MARK U+2019, ’, encoded
in UTF-8, then wrongly seen as being CP-1252. It would look like â€™
(a circumflex, euro symbol, trademark sign), and once transliterated to
Latin-1 like âEUR(tm).
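
The misreading described above can be reproduced byte by byte. The following is a minimal sketch, not taken from any mail client: utf8_encode and cp1252_to_codepoint are hypothetical helper names, and the table covers only the 0x80..0x9F range where CP-1252 differs from Latin-1.

```c
#include <stddef.h>

/* Code points CP-1252 assigns to bytes 0x80..0x9F (0 = unassigned). */
static const unsigned cp1252_high[32] = {
  0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
  0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

/* Encode a code point below U+10000 as UTF-8; returns the byte count,
   or 0 for code points this sketch does not handle. */
size_t utf8_encode(unsigned cp, unsigned char out[3])
{
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  return 0;
}

/* Interpret a single byte as CP-1252 and return its Unicode code point. */
unsigned cp1252_to_codepoint(unsigned char b)
{
  if (b >= 0x80 && b <= 0x9F)
    return cp1252_high[b - 0x80];
  return b;   /* ASCII and 0xA0..0xFF coincide with Latin-1 */
}
```

Encoding U+2019 gives the bytes E2 80 99. Read one byte at a time as CP-1252, they become U+00E2 (a circumflex), U+20AC (euro sign) and U+2122 (trademark sign), which is the three-character sequence mentioned above.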


Bye!    Alain.
-- 
When you want to reply to a mailing list, please avoid doing so from a
digest. This often builds incorrect references and breaks threads.


Re: File rejection is not working

2005-04-06 Thread Jens Rösner
Hi Jerry!

AFAIK, RegExp for (HTML?) file rejection was requested a few times, but is
not implemented at the moment.

CU
Jens (just another user)

> The -R option is not working in wget 1.9.1 for anything but
> specifically-hardcoded filenames..
>
> file[Nn]ames such as [Tt]hese are simply ignored...
>
> Please respond... Do not delete my email address as I am not a
> subscriber... Yet
>
> Thanks
>
> Jerry



converting links

2005-04-06 Thread andi kete
My question, using DOS wget, is:
If wget is stopped before it has finished, links are not converted to
relative links (ones that point to the hard disk, file:/).

I couldn't find anything about this in relation to wget on the web
(including on your mailing list), and none of the wget options I tried
achieved it.

I would like to know whether you provide any utility program for that.
(In my situation, using free public computers with limited access, it is
almost impossible to try to recompile that part of your source program
without consequences, if that is possible without major difficulties at
all. It is also not possible to move the already downloaded files to a
local web server and download them from there, and repeating the download
with one level less would lose some files already on the disk.)

As I am not subscribed, I'd like to be cc'd on replies to my post.
Andi Kete



RE: File rejection is not working

2005-04-06 Thread Tony Lewis
Jens Rösner wrote: 

> AFAIK, RegExp for (HTML?) file rejection was requested a few times, but is
> not implemented at the moment.

It seems all the examples people are sending are just attempting to get a
match that is not case sensitive. A switch to ignore case in the file name
match would be a lot easier to implement than regular expressions and would
meet the most pressing need.
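
One way such a switch could work is sketched below: lower-case both the pattern and the file name, then hand them to POSIX fnmatch(). This is only an illustration. glob_match_nocase and lower_copy are hypothetical names, not wget functions, and nothing here is taken from the wget source.

```c
#include <ctype.h>
#include <fnmatch.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated lower-case copy of s, or NULL if malloc fails. */
char *lower_copy(const char *s)
{
  size_t n = strlen(s);
  char *out = malloc(n + 1);
  if (!out)
    return NULL;
  for (size_t i = 0; i < n; i++)
    out[i] = (char) tolower((unsigned char) s[i]);
  out[n] = '\0';
  return out;
}

/* Shell-style match that ignores case.
   Returns 1 on match, 0 on no match, -1 if memory could not be allocated. */
int glob_match_nocase(const char *pattern, const char *name)
{
  char *p = lower_copy(pattern);
  char *n = lower_copy(name);
  int result = -1;
  if (p && n)
    result = (fnmatch(p, n, 0) == 0) ? 1 : 0;
  free(p);
  free(n);
  return result;
}
```

With this, glob_match_nocase("*.JPG", "photo.jpg") reports a match. Lower-casing the pattern is a blunt instrument: a pattern that relies on upper-case-only constructs such as [[:upper:]] would stop matching, so a real option would need more care.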

Just a thought.

Tony




Re: File rejection is not working

2005-04-06 Thread Hrvoje Niksic
Jens Rösner writes:

> AFAIK, RegExp for (HTML?) file rejection was requested a few times,
> but is not implemented at the moment.

But the shell-style globbing (which includes [Nn]ame) should still
work, even without regexps.
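
For reference, this is how a bracket expression such as [Nn] behaves with POSIX fnmatch(). The sketch only shows shell-style globbing in general. matches_any is a hypothetical helper, and whether wget 1.9.1 takes this code path for -R is exactly what is in question in this thread.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Return 1 if name matches at least one shell-style pattern, else 0. */
int matches_any(const char *const *patterns, size_t count, const char *name)
{
  for (size_t i = 0; i < count; i++)
    if (fnmatch(patterns[i], name, 0) == 0)
      return 1;
  return 0;
}
```

Given the patterns "file[Nn]ames*" and "[Tt]hese*", both "fileNames.html" and "these.txt" match, while "FILENAMES.html" does not, because only the bracketed letter is allowed to vary in case.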


Re: converting links

2005-04-06 Thread Doug Kaufman
On Wed, 6 Apr 2005, andi kete wrote:

> My question, using DOS wget, is:
>
> If wget is stopped before it has finished, links are not converted to
> relative links (ones that point to the hard disk, file:/);
> ...
> I would like to know whether you provide any utility program for that
>
> (in my situation, using free public computers with limited access, it is
> almost impossible to try to recompile that part of your source program
> without consequences, if that is possible without major difficulties at
> all; it is also not possible to move the already downloaded files to a
> local web server and download them from there, and repeating the download
> with one level less would lose some files already on the disk) ?

I am a bit confused by your post. Are you really using the DOS port
obtained from my web site? It seems unlikely that public computers would
be running DOS. Perhaps you are really using a Windows port. What does
wget --version say?

When you say that wget is stopped before finished, are you stopping it
with CTRL-C or is it stopping because of a network problem? What is your
command line? I don't know of a utility that converts all the links to
local, but you should be able to write a script using sed, awk, or perl
to do what you want. To see what wget does, look in the file convert.c
in the wget source.
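
As a rough idea of what such a script has to do, here is a minimal sketch of the core step, written in C instead of sed, awk, or perl: replace every occurrence of a site prefix with a local one. replace_all is a hypothetical helper and http://example.com/ is a placeholder. Stripping a fixed prefix is only right for pages saved at the top of the mirror directory, so this is far simpler than the per-file relative paths a complete converter would need.

```c
#include <stddef.h>
#include <string.h>

/* Copy `in` to `out`, replacing each occurrence of `from` with `to`.
   Returns 0 on success, -1 if `from` is empty or `out` is too small. */
int replace_all(const char *in, const char *from, const char *to,
                char *out, size_t out_size)
{
  size_t from_len = strlen(from);
  size_t to_len = strlen(to);
  size_t used = 0;

  if (from_len == 0 || out_size == 0)
    return -1;

  while (*in)
    {
      if (strncmp(in, from, from_len) == 0)
        {
          /* Keep one byte free for the terminating NUL. */
          if (used + to_len >= out_size)
            return -1;
          memcpy(out + used, to, to_len);
          used += to_len;
          in += from_len;
        }
      else
        {
          if (used + 1 >= out_size)
            return -1;
          out[used++] = *in++;
        }
    }
  out[used] = '\0';
  return 0;
}
```

Applied to a line such as <a href="http://example.com/a/b.html"> with the prefix http://example.com/ and an empty replacement, the result is <a href="a/b.html">.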

 Doug

-- 
Doug Kaufman
Internet: [EMAIL PROTECTED]



wget regex patch

2005-04-06 Thread Tobias Tiederle
Hello,

after reading so much about regex support for wget (especially the lack
of it) and experiencing myself how annoying it can be if you have
downloaded a hundred /thumbs/ directories, I tried to implement regex
support myself.
I used the pcre library from http://www.pcre.org, which was pretty easy to
use, given the fact that I had never touched a single line of C (or
C++) code before.
Unfortunately I don't know jack about autoconf, makefiles etc.
The patch in its current form is only useful with MSVC, as I didn't alter
any other makefiles.
I hope someone can do that for me and include the pcre license from
http://www.pcre.org/license.txt

As you can see, pcre.h and pcre.lib need to be somewhere the compiler can
find them, and HAVE_REGEX needs to be defined.
Files and directories are ignored if the regex given on the command line
matches. For syntax, see wget --help.
The patch was made against current cvs code.
Hope this helps somehow.
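
To illustrate the matching rule described above (a path is ignored when the pattern matches), here is a minimal self-contained sketch. It deliberately swaps in POSIX <regex.h> for PCRE so that it builds without an extra library, so the pattern syntax is POSIX extended regex, not Perl-compatible. rejected_by_regex is a hypothetical name and is not part of the patch.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if `path` matches the extended regex `pattern` (so it would be
   rejected), 0 if it does not, -1 if the pattern fails to compile. */
int rejected_by_regex(const char *pattern, const char *path)
{
  regex_t re;
  int rc;

  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  rc = regexec(&re, path, 0, NULL, 0);
  regfree(&re);
  return rc == 0 ? 1 : 0;
}
```

With the pattern /thumbs/, the path /gallery/thumbs/001.jpg is rejected and /gallery/full/001.jpg is kept. The sketch compiles the pattern on every call for brevity. The patch compiles each pattern once at startup and keeps the result, which is the better choice when many paths are tested.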

Tobias
diff -ruwb wget-regex2/src/ftp.c wget-regex3/src/ftp.c
--- wget-regex2/src/ftp.c   Sat Apr 02 02:41:04 2005
+++ wget-regex3/src/ftp.c   Wed Apr 06 18:55:24 2005
@@ -1749,7 +1749,11 @@
 return res;
   /* First: weed out that do not conform the global rules given in
  opt.accepts and opt.rejects.  */
+#ifdef HAVE_REGEX 
+  if (opt.accepts || opt.rejects || opt.exclregfile)
+#else
   if (opt.accepts || opt.rejects)
+#endif /* HAVE_REGEX */
 {
   f = start;
   while (f)
diff -ruwb wget-regex2/src/init.c wget-regex3/src/init.c
--- wget-regex2/src/init.c  Sun Mar 20 17:07:38 2005
+++ wget-regex3/src/init.c  Wed Apr 06 19:37:13 2005
@@ -137,6 +137,10 @@
 #endif
   { "excludedirectories", &opt.excludes,        cmd_directory_vector },
   { "excludedomains",     &opt.exclude_domains, cmd_vector },
+#ifdef HAVE_REGEX
+  { "excluderegexdir",    &opt.exclregdir,      cmd_string },
+  { "excluderegexfile",   &opt.exclregfile,     cmd_string },
+#endif /* HAVE_REGEX */
   { "followftp",          &opt.follow_ftp,      cmd_boolean },
   { "followtags",         &opt.follow_tags,     cmd_vector },
   { "forcehtml",          &opt.force_html,      cmd_boolean },
@@ -1367,6 +1371,12 @@
   xfree_null (opt.sslcertkey);
   xfree_null (opt.sslcertfile);
 #endif /* HAVE_SSL */
+#ifdef HAVE_REGEX
+  xfree_null (opt.exclregdir_c);
+  xfree_null (opt.exclregfile_c);
+  xfree_null (opt.exclregdir);
+  xfree_null (opt.exclregfile);
+#endif /* HAVE_REGEX */
   xfree_null (opt.bind_address);
   xfree_null (opt.cookies_input);
   xfree_null (opt.cookies_output);
diff -ruwb wget-regex2/src/main.c wget-regex3/src/main.c
--- wget-regex2/src/main.c  Tue Mar 22 15:20:02 2005
+++ wget-regex3/src/main.c  Wed Apr 06 19:03:56 2005
@@ -68,6 +68,10 @@
 /* On GNU system this will include system-wide getopt.h. */
 #include "getopt.h"
 
+#ifdef HAVE_REGEX
+#include <pcre.h>
+#endif /* HAVE_REGEX */
+
 #ifndef PATH_SEPARATOR
 # define PATH_SEPARATOR '/'
 #endif
@@ -176,6 +180,10 @@
 { "egd-file", 0, OPT_VALUE, "egdfile", -1 },
 { "exclude-directories", 'X', OPT_VALUE, "excludedirectories", -1 },
 { "exclude-domains", 0, OPT_VALUE, "excludedomains", -1 },
+#ifdef HAVE_REGEX
+{ "exclude-regex-dirs", 0, OPT_VALUE, "excluderegexdir", -1 },
+{ "exclude-regex-files", 0, OPT_VALUE, "excluderegexfile", -1 },
+#endif
 { "execute", 'e', OPT__EXECUTE, NULL, required_argument },
 { "follow-ftp", 0, OPT_BOOLEAN, "followftp", -1 },
 { "follow-tags", 0, OPT_VALUE, "followtags", -1 },
@@ -591,6 +599,12 @@
   -D,  --domains=LIST              comma-separated list of accepted domains.\n"),
 N_("\
        --exclude-domains=LIST      comma-separated list of rejected domains.\n"),
+#ifdef HAVE_REGEX
+N_("\
+       --exclude-regex-dirs=PATTERN   pattern of directories to reject.\n"),
+N_("\
+       --exclude-regex-files=PATTERN  pattern of files to reject.\n"),
+#endif /* HAVE_REGEX */
 N_("\
        --follow-ftp                follow FTP links from HTML documents.\n"),
 N_("\
@@ -647,6 +661,7 @@
   int i, ret, longindex;
   int nurl, status;
   int append_to_log = 0;
+  const char *error;  
 
   i18n_initialize ();
 
@@ -819,6 +834,40 @@
   exit (1);
 }
 #endif
+
+#ifdef HAVE_REGEX
+  if (opt.exclregdir)
+    {
+      opt.exclregdir_c = pcre_compile(
+        opt.exclregdir,   /* the pattern */
+        0,                /* default options */
+        &error,           /* for error message */
+        &i,               /* for error offset */
+        NULL);            /* use default character tables */
+
+      if (opt.exclregdir_c == NULL)
+        {
+          printf (_("Directory RegEx compilation failed at offset %d: %s\n"), i, error);
+          exit (1);
+        }
+    }
+
+  if (opt.exclregfile)
+    {
+      opt.exclregfile_c = pcre_compile(
+        opt.exclregfile,  /* the pattern */
+        0,                /* default options */
+        &error,           /* for error message */
+        &i,