Re: [wget-notify] add a new option

2008-09-02 Micah Cowan

houda hocine wrote:
> Hi,

Hi houda.

This message was sent to the wget-notify list, which is not the proper
forum. Wget-notify is reserved for bug-change and (previously) commit
notifications, and is not intended for discussion. I obviously haven't
blocked discussion there, since the original intent was to allow
discussing commits, but I'm not sure discussion needs to be allowed any
more, so it may be disallowed soon.

The appropriate list would be wget@sunsite.dk, to which this discussion
has been redirected.

> We have created a new format for archiving (.warc), and we want to
> make sure that wget can generate this format directly from the input URL.
> Could you give me some ideas on how to implement this new option?
> The format is (warc -wget url).
> I am in the process of trying to understand the source code in order to
> add this new option. Which .c file would let me do this?

Doing this is not likely to be a trivial undertaking: the current
file-output interface isn't really abstracted enough to allow this, so
basically you'll need to modify most of the existing .c files. We are
hoping at some future point to allow for a more generic output format,
for direct output to (for instance) tarballs and .mhtml archives. At
that point, it'd probably be fairly easy to write extensions to do what
you want.
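
As a rough illustration of what "abstracted enough" could mean, here is a
minimal C sketch of a pluggable output interface. Everything in it
(struct output_sink, the file_sink_* backend, store_document) is
hypothetical and does not exist in Wget's source. It only shows how
retrieval code could hand data to a set of function pointers, so that a
plain-file, tarball, .mhtml or WARC writer could be swapped in without
touching every .c file.

```c
#include <stdio.h>

/* Hypothetical output interface: each backend (plain files, tarball,
   .mhtml, WARC, ...) would supply these three callbacks. */
struct output_sink {
  void *state;
  int (*begin)(void *state, const char *url);                   /* start a document */
  int (*write_chunk)(void *state, const char *buf, size_t len); /* append body data */
  int (*end)(void *state);                                      /* finish the document */
};

/* Example backend: append every document to a single FILE*,
   preceded by a "# URL" line, and keep simple counters. */
struct file_sink_state {
  FILE *fp;
  size_t documents;
  size_t bytes;
};

int file_sink_begin(void *st, const char *url) {
  struct file_sink_state *s = st;
  s->documents++;
  return fprintf(s->fp, "# %s\n", url) < 0 ? -1 : 0;
}

int file_sink_write(void *st, const char *buf, size_t len) {
  struct file_sink_state *s = st;
  if (fwrite(buf, 1, len, s->fp) != len)
    return -1;
  s->bytes += len;
  return 0;
}

int file_sink_end(void *st) {
  struct file_sink_state *s = st;
  return fflush(s->fp) == 0 ? 0 : -1;
}

/* Retrieval code would call only this, never a specific backend. */
int store_document(const struct output_sink *sink, const char *url,
                   const char *body, size_t len) {
  if (sink->begin(sink->state, url) != 0)
    return -1;
  if (sink->write_chunk(sink->state, body, len) != 0)
    return -1;
  return sink->end(sink->state);
}
```

Under these assumptions, a WARC writer would just be one more set of
three callbacks, which is roughly the kind of hook an extension mechanism
would need to expose.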

In the meantime, though, it'll be a pain in the butt. I can't really
offer much help; the best way to understand the source is to read and
explore it. However, on the general topic of adding new options to Wget,
Tony Lewis has written the excellent guide at
http://wget.addictivecode.org/OptionsHowto. Hope that helps!

Please note that I likely won't be entertaining patches to Wget that make
it output to non-mainstream archive formats. Even once generic output
mechanisms are supported, the mainstream archive formats will most likely
be supported as extension plugins or similar, not as built-in support
within Wget.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/


Re: Checking out Wget

2008-09-02 Micah Cowan

vinothkumar raman wrote:
> Hi all,
>
> I need to check out the complete source onto my local hard disk. I am
> using WinCVS, but when I searched for the module it said there is no
> module information. Could anyone help me out? I am a complete novice in
> this regard.

WinCVS won't work, because there _is_ in fact no CVS module for Wget.
Wget uses Mercurial as the source repository (and was using Subversion
prior to that). For more information about the Wget source repository
and its use, see http://wget.addictivecode.org/RepositoryAccess

That page focuses on using the hg command-line tool; you may prefer
TortoiseHg (http://tortoisehg.sourceforge.net/) instead. Even so, the
page offers additional information about the repository and what is
required to build from those sources.



Re: [BUG:#20329] If-Modified-Since support

2008-09-02 Micah Cowan

vinothkumar raman wrote:
> We need to send the local file's timestamp in the request header. For
> that, we need to pass the local file's timestamp from http_loop() to
> get_http(). The only way to pass it without altering the function's
> signature is to add a field to struct url in url.h.
>
> Could we go for it?

That is acceptable.
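
A minimal sketch of what that could look like, assuming a new time_t
member is added. The struct below is a cut-down stand-in, not Wget's
real struct url from url.h, and the member name local_mtime and the
helper format_ims_header() are hypothetical. The header value uses the
fixed-format GMT date that HTTP requires.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Cut-down stand-in for Wget's struct url (the real one in url.h has many
   more fields). local_mtime is the hypothetical new member. */
struct url {
  const char *host;
  const char *path;
  time_t local_mtime;   /* timestamp of the local copy; 0 if none */
};

/* Write "If-Modified-Since: <date>" into BUF, using the GMT date format
   HTTP expects (e.g. "Sun, 06 Nov 1994 08:49:37 GMT").
   Assumes LC_TIME is still the default "C" locale, so %a and %b give
   English names. Returns 0 on success, -1 on failure. */
int format_ims_header(const struct url *u, char *buf, size_t bufsize) {
  char date[64];
  const struct tm *gmt;
  int n;

  if (u->local_mtime == 0)
    return -1;                      /* nothing to compare against */
  gmt = gmtime(&u->local_mtime);
  if (gmt == NULL)
    return -1;
  if (strftime(date, sizeof date, "%a, %d %b %Y %H:%M:%S GMT", gmt) == 0)
    return -1;
  n = snprintf(buf, bufsize, "If-Modified-Since: %s", date);
  if (n < 0 || (size_t) n >= bufsize)
    return -1;                      /* output error or truncated */
  return 0;
}
```

In this hypothetical flow, http_loop() would fill in local_mtime from the
st_mtime of the local file (via stat()) before calling get_http(), which
would add the header to the request only when format_ims_header()
succeeds.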



Re: [bug #20329] Make HTTP timestamping use If-Modified-Since

2008-09-02 Micah Cowan

Yes, that's what it means.

I'm not yet committed to doing this. I'd first like to see how many
mainstream servers respect If-Modified-Since when it is given as part of
an HTTP/1.0 request, compared with how they respond when it's part of an
HTTP/1.1 request. If common servers ignore it in HTTP/1.0 but not in
HTTP/1.1, that'd be an excellent case for holding off until we're doing
HTTP/1.1 requests.

Also, I don't think "removing the previous HEAD request code" is
entirely accurate: we would probably want to detect when a server is
feeding us non-new content in response to If-Modified-Since, and fall
back to the current HEAD method in that case.
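
A sketch of that fallback decision, under stated assumptions: the names
classify_ims_response() and enum ims_outcome are hypothetical, and a
"sane" Last-Modified is taken here to mean one that is present and not in
the future (the thread doesn't define it further). A 304 means the local
copy is current; a 200 whose Last-Modified is no newer than the local
file suggests the server ignored If-Modified-Since, so the client would
keep its copy and switch to the HEAD-based check for that server.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical outcomes of a GET sent with If-Modified-Since. */
enum ims_outcome {
  IMS_NOT_MODIFIED,  /* 304: the local copy is current */
  IMS_NEWER,         /* 200 with newer (or unusable) Last-Modified: save it */
  IMS_IGNORED        /* 200 but not newer: the server apparently ignored the
                        header, so keep the local copy and fall back to the
                        HEAD-based check for this server */
};

/* "Sane" is assumed here to mean: present and not later than NOW. */
bool last_modified_is_sane(bool present, time_t last_modified, time_t now) {
  return present && last_modified <= now;
}

/* STATUS is assumed to be 200 or 304; other statuses are handled elsewhere. */
enum ims_outcome classify_ims_response(int status, bool has_last_modified,
                                       time_t last_modified,
                                       time_t local_mtime, time_t now) {
  if (status == 304)
    return IMS_NOT_MODIFIED;
  /* Without a trustworthy date we cannot compare, so treat it as new. */
  if (!last_modified_is_sane(has_last_modified, last_modified, now))
    return IMS_NEWER;
  if (last_modified > local_mtime)
    return IMS_NEWER;
  return IMS_IGNORED;
}
```

Treating an unusable Last-Modified as "newer" is the cautious choice of
re-downloading when in doubt; it is one possible policy, not something
this thread settled.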

-Micah

vinothkumar raman wrote:
> So this means we should remove the previous HEAD request code, use
> If-Modified-Since by default to handle all requests, and store pages
> whenever the response is not a 304.
>
> Is that so?
>
> On Fri, Aug 29, 2008 at 11:06 PM, Micah Cowan [EMAIL PROTECTED] wrote:
>> Follow-up Comment #4, bug #20329 (project wget):
>>
>> verbatim-mode's not all that readable.
>>
>> The gist is, we should go ahead and use If-Modified-Since, perhaps even now
>> before there's true HTTP/1.1 support (provided it works in a reasonable
>> percentage of cases); and just ensure that any Last-Modified header is sane.


Re: Support for file://

2008-09-02 Micah Cowan

Petri Koistinen wrote:
> Hi,
>
> It would be nice if wget also supported file://.

Feel free to file an issue for this (I'll mark it Needs Discussion and
set it to low priority). I'd thought there was already an issue for
this, but I can't find one, open or closed. I know this has come up
before, at least.

I think I'd need some convincing on this, as well as a clear definition
of what the scope of such a feature ought to be. Unlike curl, which
groks URLs, Wget W(eb)-gets, and file:// can't really be argued to be
part of the Web.

That in and of itself isn't really a reason not to support it, but my
real misgivings have to do with the existence of various excellent tools
that already do local-file transfers, and likely do it _much_ better
than Wget could hope to. Rsync springs readily to mind.

Even the system cp command is likely to handle things much better than
Wget. In particular, OS-specific extended file attributes, extended
permissions, and the like are among the things that existing system
tools probably handle quite well and that Wget is unlikely to. I don't
really want Wget to be in the business of duplicating the system cp
command, but I might conceivably not mind file:// support if it means
simple _content_ transfer, and not actual file duplication.
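
A minimal C sketch of what "simple content transfer" might mean: only
the bytes are copied, and nothing else about the file is. The helper
names file_url_path() and copy_content() are hypothetical;
file_url_path() accepts only the empty-host file:///path form and does
not decode percent-escapes. None of this reflects existing Wget code.

```c
#include <stdio.h>
#include <string.h>

/* Return the local path for a "file:///..." URL, or NULL for anything else.
   Only the empty-host form is accepted; percent-escapes are not decoded. */
const char *file_url_path(const char *url) {
  static const char prefix[] = "file://";
  size_t n = sizeof prefix - 1;
  if (strncmp(url, prefix, n) != 0 || url[n] != '/')
    return NULL;
  return url + n;
}

/* Copy only the bytes of SRC to DST. Ownership, permissions, timestamps and
   extended attributes are deliberately not carried over (that is cp's job).
   Returns the number of bytes copied, or -1 on error. */
long copy_content(const char *src, const char *dst) {
  char buf[8192];
  size_t n;
  long total = 0;
  FILE *in = fopen(src, "rb");
  FILE *out;

  if (in == NULL)
    return -1;
  out = fopen(dst, "wb");
  if (out == NULL) {
    fclose(in);
    return -1;
  }
  while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
    if (fwrite(buf, 1, n, out) != n) {
      total = -1;
      break;
    }
    total += (long) n;
  }
  if (ferror(in))
    total = -1;
  fclose(in);
  if (fclose(out) != 0)
    total = -1;
  return total;
}
```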

Also in need of addressing is what recursion should mean for file://.
Recursion currently means different things for ftp:// and http://: for
FTP it means traversing the file hierarchy recursively, whereas for HTTP
it means following links recursively. I'm guessing file:// should work
like FTP (i.e., recurse when the path is a directory, ignore
HTML-ness), but anyway this is something that'd need answering.
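
Under the FTP-like interpretation guessed at above, recursion would be a
plain directory walk that never looks inside files. The sketch below
assumes a POSIX system (lstat, opendir, readdir); the names
walk_file_tree() and count_files() are hypothetical, and skipping
symbolic links is an assumption made here to avoid loops, not a decided
behavior.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Called once for every regular file found. */
typedef void (*file_visitor)(const char *path, void *ctx);

/* FTP-style recursion: if PATH is a directory, descend into it; if it is a
   regular file, hand it to VISIT. File contents (e.g. HTML links) are never
   inspected. Returns 0 on success, -1 if PATH could not be examined.
   Errors inside subdirectories are ignored in this sketch. */
int walk_file_tree(const char *path, file_visitor visit, void *ctx) {
  struct stat st;
  DIR *dir;
  struct dirent *ent;

  if (lstat(path, &st) != 0)
    return -1;
  if (S_ISREG(st.st_mode)) {
    visit(path, ctx);
    return 0;
  }
  if (!S_ISDIR(st.st_mode))
    return 0;               /* skip symlinks, devices, sockets, ... */

  dir = opendir(path);
  if (dir == NULL)
    return -1;
  while ((ent = readdir(dir)) != NULL) {
    char child[PATH_MAX];
    int n;
    if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
      continue;
    n = snprintf(child, sizeof child, "%s/%s", path, ent->d_name);
    if (n < 0 || (size_t) n >= sizeof child)
      continue;             /* path too long; skip it */
    walk_file_tree(child, visit, ctx);
  }
  closedir(dir);
  return 0;
}

/* Example visitor: count the regular files found (CTX points to an int). */
void count_files(const char *path, void *ctx) {
  (void) path;
  ++*(int *) ctx;
}
```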



Re: How to debug wget ?

2008-09-02 Micah Cowan

Jinhui Li wrote:
> I am browsing the source code and want to debug it to figure out how
> it works.
>
> Could somebody please tell me how to debug it (with GDB), or where I
> can find the information I need?

IMO, GDB is a great tool for diagnosing a particular problem one
encounters with a program; it's not terribly useful for actually
understanding the code itself, though. I find it much quicker to read
through the code using a powerful viewer or editor, making use of tools
such as cscope and ctags. The best editors, such as Vim and Emacs, are
integrated with these tools, so a simple control-click or key
combination can bring up the definition of the function being called or
the variable being referenced, or (in the case of cscope) the list of
places where a particular function is called, etc.
