Re: Post size limit?

2008-09-21 Thread mm w
Hi
What does the server log say? I guess it's a boundary problem: your headers
are wrong, that's all. I'm pretty sure that if you look at the server error
logs you will get your answer. Posted files are not really POST data... you
have to set up your HTTP body correctly.
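A hedged note on the headers point: wget's manual describes --post-file as sending the file's contents as an application/x-www-form-urlencoded body, so the request is only well-formed if each value is percent-encoded and Content-Length equals the body's byte count. The C sketch below illustrates those two pieces under that assumption; form_urlencode and format_post_head are hypothetical helpers written for this example, not wget or curl functions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Percent-encode `in` as an application/x-www-form-urlencoded value:
   ASCII letters, digits and "-._*" pass through, space becomes '+',
   every other byte becomes %XX. Returns the encoded length, or -1 if
   `out` (outsize bytes, including the terminating NUL) is too small. */
int
form_urlencode (const char *in, char *out, size_t outsize)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t n = 0;

  assert (in != NULL && out != NULL);
  for (const unsigned char *p = (const unsigned char *) in; *p; p++)
    {
      if (isalnum (*p) || strchr ("-._*", *p))
        {
          if (n + 1 >= outsize)
            return -1;
          out[n++] = (char) *p;
        }
      else if (*p == ' ')
        {
          if (n + 1 >= outsize)
            return -1;
          out[n++] = '+';
        }
      else
        {
          if (n + 3 >= outsize)
            return -1;
          out[n++] = '%';
          out[n++] = hex[*p >> 4];
          out[n++] = hex[*p & 0x0F];
        }
    }
  if (n >= outsize)
    return -1;
  out[n] = '\0';
  return (int) n;
}

/* Format the head of a POST request whose body is `body`. Content-Length
   is the body's length in bytes; the server uses it to know where the
   body ends. Returns the head length, or -1 if `out` is too small. */
int
format_post_head (const char *path, const char *host, const char *body,
                  char *out, size_t outsize)
{
  int len = snprintf (out, outsize,
                      "POST %s HTTP/1.1\r\n"
                      "Host: %s\r\n"
                      "Content-Type: application/x-www-form-urlencoded\r\n"
                      "Content-Length: %zu\r\n"
                      "\r\n",
                      path, host, strlen (body));
  return (len < 0 || (size_t) len >= outsize) ? -1 : len;
}
```

Encode each name and value separately and join them as name=value&name2=value2; that same string is then used both for Content-Length and as the request body.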

Cheers!

On Sun, Sep 21, 2008 at 1:10 PM, DeVill [EMAIL PROTECTED] wrote:
 Hi!

 I've been trying to send POST variables with the --post-file option of
 wget. (I have two variables in the file, both urlencoded; one of them
 is quite large.) It worked fine until it came across a file that was
 4.7M in size: the POST variables just won't get through to the server... I
 tried to do the same POST with Mozilla Firefox, and it worked fine,
 but I had the same results with curl :-(

 Any ideas what could be the problem?

 Please cc me, I'm not subscribed!

 Thanks!

 Bye
 DeVill




-- 
-mmw


Re: No downloading

2008-06-29 Thread mm w
Either the default index is not named index, or there is an HTTP test on
the server side regarding HTTP_USER_AGENT.

On Sun, Jun 29, 2008 at 1:42 PM, Mishari Almishari [EMAIL PROTECTED] wrote:
 Hi,
 I want to download the website www.2006election.net

 For that, I used the command
 wget -d -nd -p -E -H -k -K -S -R png,gif,jpg,bmp,ico  --ignore-length
 --user-agent=Mozilla -e robots=off -P www.2006election.net -o
 www.2006election.net.out  http://www.2006election.net;

 But the downloaded page index.html has no content (except body/head tags),
 even though I can see the content when I use Internet Explorer.

 Any clue?

 Thanks in advance!

 -mish



-- 
-mmw


Re: Wget 1.11.3 - case sensitivity and URLs

2008-06-19 Thread mm w
A simple URL-rewriting configuration should fix the problem without touching
the file system; everything can be done server side.
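This only applies if you control the server, which is the assumption being made here. Under that assumption, the rewrite amounts to mapping every mixed-case path onto one lowercase spelling. A minimal C sketch of that decision as a hypothetical request-handler helper (lowercase_redirect is an illustrative name, not part of any real server); it answers with a 301 redirect, whereas an internal rewrite would serve the same lowercased path without telling the client. It assumes ASCII paths and files stored under their lowercase names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* If the path part of a request target contains uppercase ASCII letters,
   write a 301 redirect to the lowercased path into `out` and return its
   length. Return 0 if the path is already lowercase (serve it normally),
   or -1 if a buffer is too small. The query string is copied unchanged. */
int
lowercase_redirect (const char *target, char *out, size_t outsize)
{
  char path[1024];
  size_t plen;
  int changed = 0;
  int len;

  assert (target != NULL && out != NULL);
  plen = strcspn (target, "?");
  if (plen >= sizeof path)
    return -1;
  for (size_t i = 0; i < plen; i++)
    {
      unsigned char c = (unsigned char) target[i];
      path[i] = (char) tolower (c);
      if (path[i] != (char) c)
        changed = 1;
    }
  path[plen] = '\0';
  if (!changed)
    return 0;

  len = snprintf (out, outsize,
                  "HTTP/1.1 301 Moved Permanently\r\n"
                  "Location: %s%s\r\n"
                  "Content-Length: 0\r\n"
                  "\r\n",
                  path, target + plen);
  return (len < 0 || (size_t) len >= outsize) ? -1 : len;
}
```

With a redirect, every spelling a client follows ends up at a single lowercase URL, so a mirror sees one name per file.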

Best Regards

On Thu, Jun 19, 2008 at 6:29 AM, Coombe, Allan David (DPS)
[EMAIL PROTECTED] wrote:
 Thanks, everyone, for the contributions.

 Ultimately, our purpose is to process documents from the site into our
 search database, so probably the most important thing is to limit the
 number of files being processed.  The case of  the URLs in the html
 probably wouldn't cause us much concern, but I could see that it might
 be useful to convert a site for mirroring from a case-insensitive
 (Windows) environment to a case-sensitive (li|u)nix one - this would
 need to include translation of urls in content as well as filenames on
 disk.

 In the meantime - does anyone know of a proxy server that could
 translate urls from mixed case to lower case.  I thought that if we
 downloaded using wget via such a proxy server we might get the
 appropriate result.

 The other alternative we were thinking of was to post process the files
 with symlinks for all mixed case versions of files and directories (I
 think someone already suggested this - great minds and all that...). I
 assume that wget would correctly use the symlink to determine the
 time/date stamp of the file for determining if it requires updating (or
 would it use the time/date stamp of the symlink?). I also assume that if
 wget downloaded the file it would overwrite the symlink and we would
 have to run our convert files to symlinks process again.

 Just to put it in perspective, the actual site is approximately 45gb
 (that's what the administrator said) and wget downloaded over 100gb
 (463,000 files) when I did the first process.

 Cheers
 Allan

 -Original Message-
 From: Micah Cowan [mailto:[EMAIL PROTECTED]
 Sent: Saturday, 14 June 2008 7:30 AM
 To: Tony Lewis
 Cc: Coombe, Allan David (DPS); 'Wget'
 Subject: Re: Wget 1.11.3 - case sensitivity and URLs



 Tony Lewis wrote:
 Micah Cowan wrote:

 Unfortunately, nothing really comes to mind. If you'd like, you could
 file a feature request at
 https://savannah.gnu.org/bugs/?func=additem&group=wget, for an option
 asking Wget to treat URLs case-insensitively.

 To have the effect that Allan seeks, I think the option would have to
 convert all URIs to lower case at an appropriate point in the process.

 I think you probably want to send the original case to the server
 (just in case it really does matter to the server). If you're going to
 treat different case URIs as matching then the lower-case version will
 have to be stored in the hash. The most important part (from the
 perspective that Allan voices) is that the versions written to disk
 use lower case characters.

 Well, that really depends. If it's doing a straight recursive download,
 without preexisting local files, then all that's really necessary is to
 do lookups/stores in the blacklist in a case-normalized manner.
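A case-normalized blacklist can be sketched in a few lines of C: the URL is still requested with its original case, but the key used to decide whether it was already seen is a lowercased copy. The seen_set type and its functions are hypothetical; a fixed-size array with a linear scan stands in for the hash mentioned in the thread, purely to keep the example short.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SEEN 128

/* Keys are lowercased copies of URLs; the originals are not stored. */
struct seen_set
{
  char *keys[MAX_SEEN];
  size_t count;
};

/* Return a malloc'd lowercase (ASCII) copy of s, or NULL on failure. */
static char *
lowercase_dup (const char *s)
{
  size_t n = strlen (s);
  char *copy = malloc (n + 1);

  if (copy == NULL)
    return NULL;
  for (size_t i = 0; i <= n; i++)   /* i == n copies the NUL */
    copy[i] = (char) tolower ((unsigned char) s[i]);
  return copy;
}

/* Return 1 if url was already recorded (ignoring case); otherwise record
   it and return 0. Return -1 on allocation failure or a full set. */
int
seen_check_and_add (struct seen_set *set, const char *url)
{
  char *key;

  assert (set != NULL && url != NULL);
  key = lowercase_dup (url);
  if (key == NULL)
    return -1;
  for (size_t i = 0; i < set->count; i++)
    if (strcmp (set->keys[i], key) == 0)
      {
        free (key);
        return 1;
      }
  if (set->count == MAX_SEEN)
    {
      free (key);
      return -1;
    }
  set->keys[set->count++] = key;
  return 0;
}

/* Release every stored key. */
void
seen_free (struct seen_set *set)
{
  for (size_t i = 0; i < set->count; i++)
    free (set->keys[i]);
  set->count = 0;
}
```

This collapses /Dir/File and /dir/file into one download, which is only correct for a server that really ignores case, such as the one Allan is mirroring.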

 If preexisting files matter, then yes, your solution would fix it.
 Another solution would be to scan directory contents for the first name
 that matches case insensitively. That's obviously much less efficient,
 but has the advantage that the file will match at least one of the
 real cases from the server.
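The directory-scan alternative could look roughly like this on a POSIX system, using opendir/readdir and strcasecmp. find_name_ci is a hypothetical name; it returns the on-disk spelling of the first matching entry, so an existing file can be reused instead of creating a differently cased copy. Each lookup reads the whole directory, which is the inefficiency mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Return a malloc'd copy of the first entry in directory `dir` whose name
   equals `wanted` ignoring ASCII case, or NULL if there is none or the
   directory cannot be opened. The caller frees the result. readdir()
   order is unspecified, so "first" means first returned. */
char *
find_name_ci (const char *dir, const char *wanted)
{
  DIR *d;
  struct dirent *ent;
  char *found = NULL;

  assert (dir != NULL && wanted != NULL);
  d = opendir (dir);
  if (d == NULL)
    return NULL;
  while ((ent = readdir (d)) != NULL)
    {
      if (strcasecmp (ent->d_name, wanted) == 0)
        {
          found = strdup (ent->d_name);
          break;
        }
    }
  closedir (d);
  return found;
}
```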

 As Matthias points out, your lower-case normalization solution could be
 achieved in a more general manner with a hook. Which is something I was
 planning on introducing perhaps in 1.13 anyway (so you could, say, run
 sed on the filenames before Wget uses them), so that's probably the
 approach I'd take. But probably not before 1.13, even if someone
 provides a patch for it in time for 1.12 (too many other things to focus
 on, and I'd like to introduce the external command hooks as a suite,
 if possible).

 OTOH, case normalization in the blacklists would still be useful, in
 addition to that mechanism. Could make another good addition for 1.13
 (because it'll be more useful in combination with the rename hooks).

 --
 Micah J. Cowan
 Programmer, musician, typesetting enthusiast, gamer,
 and GNU Wget Project Maintainer.
 http://micah.cowan.name/




-- 
-mmw


Re: Wget 1.11.3 - case sensitivity and URLs

2008-06-19 Thread mm w
without touching the file system





-- 
-mmw


Re: Wget 1.11.3 - case sensitivity and URLs

2008-06-19 Thread mm w
Not all, but in this particular case I'm pretty sure they do.

On Thu, Jun 19, 2008 at 10:42 AM, Tony Lewis [EMAIL PROTECTED] wrote:
 mm w wrote:

 A simple URL-rewriting configuration should fix the problem without
 touching the file system; everything can be done server side.

 Why do you assume the user of wget has any control over the server from which 
 content is being downloaded?





-- 
-mmw


Re: Wget 1.11.3 - case sensitivity and URLs

2008-06-16 Thread mm w
On Sat, Jun 14, 2008 at 4:30 PM, Tony Lewis [EMAIL PROTECTED] wrote:
 mm w wrote:

 Hi, after all, it's only my point of view :D
 anyway,

 /dir/file,
 dir/File, non-standard
 Dir/file, non-standard
 and /Dir/File non-standard

 According to RFC 2396: The path component contains data, specific to the 
 authority (or the scheme if there is no authority component), identifying the 
 resource within the scope of that scheme and authority.

 In other words, those names are well within the standard when the server 
 understands them. As far as I know, there is nothing in Internet standards 
 restricting mixed case paths.
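For comparison, generic URI normalization only folds case where the standards say case does not matter: RFC 3986 (the successor to RFC 2396) treats the scheme and host as case-insensitive and leaves the path case-sensitive unless a scheme defines otherwise. A small C sketch of that rule, with normalize_scheme_host as an illustrative name and ASCII-only input assumed:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Copy an absolute URL of the form scheme://[userinfo@]host[:port]/...
   into `out`, lowercasing only the scheme and the host. Userinfo, path,
   query and fragment keep their original case. Returns 0 on success,
   or -1 if there is no "://" or `out` is too small. */
int
normalize_scheme_host (const char *url, char *out, size_t outsize)
{
  const char *sep;
  size_t len;
  char *auth, *auth_end, *host, *p;

  assert (url != NULL && out != NULL);
  sep = strstr (url, "://");
  len = strlen (url);
  if (sep == NULL || len >= outsize)
    return -1;
  memcpy (out, url, len + 1);

  /* Scheme: everything before "://". */
  for (p = out; p < out + (sep - url); p++)
    *p = (char) tolower ((unsigned char) *p);

  /* Authority: from just after "://" up to the first '/', '?' or '#'. */
  auth = out + (sep - url) + 3;
  auth_end = auth + strcspn (auth, "/?#");

  /* Skip userinfo (case-sensitive) if an '@' is present. */
  host = auth;
  for (p = auth; p < auth_end; p++)
    if (*p == '@')
      host = p + 1;

  for (p = host; p < auth_end; p++)
    *p = (char) tolower ((unsigned char) *p);
  return 0;
}
```

Applied to URLs with the paths listed above, this leaves /dir/file and /Dir/File distinct, which is why collapsing them needs an explicit option or hook rather than standard normalization.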

:) Read it again: nobody does that except some punk-head folks.

 That's it: if the server handles non-standard URLs, it's not my
 concern; for me they don't exist.

 Oh. I see. You're writing to say that wget should only implement features 
 that are meaningful to you. Thanks for your narcissistic input.

No, I'm not such a jerk. A simple grep/sed on the website source to
remove the malicious URLs should be fine, or an HTTP redirection when the
malicious non-standard URL is requested.

On the other hand, if wget changed every link to lowercase, some people
would have the opposite problem.
A golden rule: never distribute mixed-case URLs (to your users); out of
simple respect for them, keep everything in lowercase.


 Tony





-- 
-mmw


Re: Wget 1.11.3 - case sensitivity and URLs

2008-06-13 Thread mm w
Standard: URLs are case-insensitive.

Why adapt your software just because some people don't respect the
standard? We are not in the '90s anymore; let people doing crappy things
deal with their crappy world.

Cheers!

On Fri, Jun 13, 2008 at 2:08 PM, Tony Lewis [EMAIL PROTECTED] wrote:
 Micah Cowan wrote:

 Unfortunately, nothing really comes to mind. If you'd like, you could
 file a feature request at
 https://savannah.gnu.org/bugs/?func=additem&group=wget, for an option
 asking Wget to treat URLs case-insensitively.

 To have the effect that Allan seeks, I think the option would have to convert 
 all URIs to lower case at an appropriate point in the process. I think you 
 probably want to send the original case to the server (just in case it really 
 does matter to the server). If you're going to treat different case URIs as 
 matching then the lower-case version will have to be stored in the hash. The 
 most important part (from the perspective that Allan voices) is that the 
 versions written to disk use lower case characters.

 Tony





-- 
-mmw


Re: Wget 1.11.3 - case sensitivity and URLs

2008-06-13 Thread mm w
Hi, after all, it's only my point of view :D
anyway,

/dir/file,
dir/File, non-standard
Dir/file, non-standard
and /Dir/File non-standard

That's it: if the server handles non-standard URLs, it's not my
concern; for me they don't exist.


On Fri, Jun 13, 2008 at 3:12 PM, Tony Lewis [EMAIL PROTECTED] wrote:
 mm w wrote:

 Standard: URLs are case-insensitive.

 Why adapt your software just because some people don't respect the
 standard? We are not in the '90s anymore; let people doing crappy things
 deal with their crappy world.

 You obviously missed the point of the original posting: how can one 
 conveniently mirror a site whose server uses case insensitive names onto a 
 server that uses case sensitive names.

 If the original site has the URI strings /dir/file, dir/File, Dir/file, 
 and /Dir/File, the same local file will be returned. However, wget will 
 treat those as unique directories and files and you wind up with four copies.

 Allan asked if there is a way to have wget just create one copy and proposed 
 one way that might accomplish that goal.

 Tony





-- 
-mmw


Re: building on 32 extend 64 arch nix*

2008-03-18 Thread mm w
Hi there, is there an IRC room for wget development (somewhere)? :)

Hrvoje, is it a Croatian name?


Re: building on 32 extend 64 arch nix*

2008-03-18 Thread mm w
On Tue, Mar 18, 2008 at 7:32 PM, Micah Cowan [EMAIL PROTECTED] wrote:


  mm w wrote:
   Hi there, is there an IRC room for wget development (somewhere)? :)

  We have #wget on freenode.net, where dev discussion is welcome; however,
  it is very low-participation atm (essentially, just myself, with a few
  lurkers); Hrvoje hasn't been seen there yet (*nudge* ;) ), and I'm not
  particularly versed on the particulars of the hashing algorithm just yet.


:D

  I try to be available on #wget when I'm awake. Of course, I'm not always
  actively monitoring it...


Thank you, Micah. I asked because it is (sometimes) easier than
sending a message about "line 45 of toto.c"
:D

  --
  Micah J. Cowan
  Programmer, musician, typesetting enthusiast, gamer...
  http://micah.cowan.name/




-- 
-mmw


building on 32 extend 64 arch nix*

2008-03-17 Thread mm w
Hello there, I have two gcc warnings regarding data size operations.

src/hash.c

unsigned long
hash_pointer (const void *ptr)
{
  ...
#if SIZEOF_VOID_P > 4
  key += (key << 44);
  key ^= (key >> 54);
  key += (key << 36);
  key ^= (key >> 41);
  key += (key << 42);
  key ^= (key >> 34);
  key += (key << 39);
  key ^= (key >> 44);
#endif
  return (unsigned long) key;
}

This one is minor: the shift count is greater than or equal to the
width of uintptr_t (/* quad needed */).

The second one is in src/utils.c:1490, and I think it is more
problematic: "integer overflow in expression".

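  else if (n < 10*(W)1000000000)         DIGITS_10 (1000000000);
  else if (n < 100*(W)1000000000)        DIGITS_11 (10*(W)1000000000);
  else if (n < 1000*(W)1000000000)       DIGITS_12 (100*(W)1000000000);
  else if (n < 10000*(W)1000000000)      DIGITS_13 (1000*(W)1000000000);
  else if (n < 100000*(W)1000000000)     DIGITS_14 (10000*(W)1000000000);
  else if (n < 1000000*(W)1000000000)    DIGITS_15 (100000*(W)1000000000);
  else if (n < 10000000*(W)1000000000)   DIGITS_16 (1000000*(W)1000000000);
  else if (n < 100000000*(W)1000000000)  DIGITS_17 (10000000*(W)1000000000);
  else if (n < 1000000000*(W)1000000000) DIGITS_18 (100000000*(W)1000000000);
  else                                   DIGITS_19 (1000000000*(W)1000000000);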

I can patch it, but I would like to understand exactly what you are doing here.

Cheers!

-- 
-mmw


Re: building on 32 extend 64 arch nix*

2008-03-17 Thread mm w
On Mon, Mar 17, 2008 at 1:57 PM, Hrvoje Niksic [EMAIL PROTECTED] wrote:
 mm w [EMAIL PROTECTED] writes:

   #if SIZEOF_VOID_P > 4
 key += (key << 44);
 key ^= (key >> 54);
 key += (key << 36);
 key ^= (key >> 41);
 key += (key << 42);
 key ^= (key >> 34);
 key += (key << 39);
 key ^= (key >> 44);
   #endif
  

  This one is minor: the shift count is greater than or equal to the
   width of uintptr_t (/* quad needed */).

  What is the size of uintptr_t on your platform?  If it is 4, the code
  should not be compiled on that platform.  If it is 8, the shift count
  should be correct.  If it is anything else, you have some work ahead
  of you.  :-)
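Following the point that the wide shifts should only be compiled where uintptr_t is 8 bytes, one possible hardening (a sketch, not wget's code) is to test the width of uintptr_t directly with UINTPTR_MAX from <stdint.h> instead of trusting a configure-time SIZEOF_VOID_P. The 32-bit mixing rounds are elided in the thread and are left out here too, and the shift directions in the 64-bit rounds are a reconstruction of the garbled quote; hash_pointer_sketch is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Pointer hash whose 64-bit rounds are guarded by the real width of
   uintptr_t. Even if a 32-bit build picks up a config.h generated for a
   64-bit target, the wide shifts are excluded, so the "shift count >=
   width of type" warning cannot occur. */
unsigned long
hash_pointer_sketch (const void *ptr)
{
  uintptr_t key = (uintptr_t) ptr;

  /* The 32-bit mixing rounds would go here; omitted in this sketch. */

#if UINTPTR_MAX > 0xFFFFFFFFu
  /* 64-bit rounds with the shift amounts quoted in the thread. */
  key += (key << 44);
  key ^= (key >> 54);
  key += (key << 36);
  key ^= (key >> 41);
  key += (key << 42);
  key ^= (key >> 34);
  key += (key << 39);
  key ^= (key >> 44);
#endif
  return (unsigned long) key;
}
```

The preprocessor can evaluate UINTPTR_MAX because <stdint.h> defines its limit macros as constant expressions usable in #if, so the check costs nothing at run time.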


OK, I isolated both methods and I'm going to test them.


   The second one is in src/utils.c:1490, and I think it is more
   problematic: "integer overflow in expression".

  There should be no integer overflow; I suspect SIZEOF_WGINT is
  incorrectly defined for you.
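As far as the quoted snippet shows, the ladder counts the decimal digits of n by comparing it against powers of ten, and the (W) casts make those products be computed in the wide type rather than in int. If wgint were really 32 bits while SIZEOF_WGINT claimed 8, products such as 10*(W)1000000000 would overflow, which fits both the warning and the diagnosis above. A hedged C sketch, assuming wgint is int64_t and W is shorthand for it (both assumptions made for this example), that turns such a mismatch into a build error and counts digits with a loop instead of the unrolled ladder:

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: wgint is a signed 64-bit type, and
   SIZEOF_WGINT is the value configure would record for it. */
typedef int64_t wgint;
#define SIZEOF_WGINT 8
#define W wgint

/* Fail the build, instead of warning about overflow later, if the
   configured size disagrees with the real one (C11). */
_Static_assert (sizeof (wgint) == SIZEOF_WGINT,
                "SIZEOF_WGINT does not match sizeof (wgint)");

/* Count the decimal digits of a non-negative wgint. `limit` is kept in
   the wide type, for the same reason the quoted code writes
   10*(W)1000000000: with a 32-bit int, 10 * 1000000000 would overflow. */
int
count_digits (wgint n)
{
  int digits = 1;
  wgint limit = 10;   /* always 10^digits while looping */

  assert (n >= 0);
  while (n >= limit)
    {
      digits++;
      if (digits == 19)   /* 10^19 does not fit in a signed 64-bit type */
        break;
      limit *= 10;
    }
  return digits;
}
```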


Thank you

-- 
-mmw