Re: [Bulk] Updates on Wget Future Directions

2008-04-01 Thread Todd Pattist

Micah Cowan wrote:

Unccl svefg bs Ncevy, sbyxf.

For greater clarity try:
http://www.rot13.com/index.php


Re: Release: GNU Wget 1.11.1

2008-03-25 Thread Todd Pattist

Micah Cowan wrote:

Announcing the release of version 1.11.1 of GNU Wget.
** Documentation of accept/reject lists in the manual's Types of
Files section now explains various aspects of their behavior that may
be surprising, and notes that they may change in the future.
I'm glad to see that this made it into the docs - even if this behavior 
is drastically altered in the next rev. 

I'm interested in your thoughts on the future of the accept/reject 
filter options.  Currently, accept/reject provides mixed control over 
file retention and link traversal.  Those filters do not apply to html 
files during the first pass (for traversal), but do apply during the 
second pass for file retention.
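To make that concrete, here's a minimal sketch of what I understand the 
current behavior to be (site name hypothetical):

wget -r -A zip http://site.example.com/index.html

On the first pass, index.html fails the -A zip filter, but because its 
name ends in .html it is traversed and parsed for links anyway.  On the 
second pass, after download, the same filter is applied to the local 
name, so index.html is deleted while any retrieved .zip files are kept.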

I can see splitting the accept/reject filters into two independent 
filter sets: one set to follow/no-follow links, and the other to 
keep/delete files after retrieval.  Obviously query string matching 
would be nice in the first set.  OTOH, I can imagine keeping 
accept/reject solely to control file retention and using more advanced 
logic than simple htm/html extension matching to get deeper traversal of 
script sites when permitted by the recursion depth or other controls.  
What do you see as the best approach?


As long as I'm posting, I'll give some very minor feedback on the docs.  
It would be nice to have a cross-reference of the three formats - short 
option, long option, and control-file command - or to just list all 
three in the first discussion of each option.  Section 4 uses that 
method, but Section 2 does not.  I often found myself searching for the 
correct wgetrc startup file format after reading up on an option.  As an 
example, Section 2 tells you that `-l depth' or `--level=depth' can be 
used as recursion depth options, but you have to do a bit of searching 
to find out that reclevel=depth, and not level=depth, is the matching 
wgetrc command.
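For instance, if I've got the formats right, these should all be 
equivalent ways of setting a recursion depth of 5:

On the command line:
wget -r -l 5 http://site.example.com/
wget -r --level=5 http://site.example.com/

In wgetrc:
recursive = on
reclevel = 5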


Related to the same issue, and for other Windows users who may search the 
archive: as a new user, it's nice to use the long form option, since it 
makes it easier to remember what you're trying to do.  However, a 
command line of 200 chars is hard to read.  I found myself organizing 
all my options into a customized wgetrc file for each site.  In Windows, 
each instance of wget started via a batch file spawns its own 
local environment, so I could run multiple copies of wget 
simultaneously, each initiated from a separate batch file and each with 
its own customized `set WGETRC=Site1-wgetrc.txt' followed by a basic 
`wget Site1.com' command.
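For anyone who wants to copy this setup, a sketch of one such batch 
file (all names hypothetical):

rem Site1.bat - launch wget with a per-site startup file
set WGETRC=Site1-wgetrc.txt
wget http://Site1.com/

Since each batch file runs in its own local environment, several of 
these can run at once without their WGETRC settings interfering.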

** Documentation of --no-parent now explains how a trailing slash, or
lack thereof, in the specified URL will affect behavior.

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer,
and GNU Wget Project Maintainer.
http://micah.cowan.name/



Re: Accept and Reject - particularly for PHP and CGI sites

2008-03-20 Thread Todd Pattist

   When deciding whether it should delete a file afterwards, however, it
 uses the _local_ filename (relevant code also in recur.c, near "Either
 --delete-after was specified,"). I'm not positive, but this probably
 means query strings _do_ matter in that case. :p
 
 Confused? Coz I sure am!

I had thought there was already an issue filed against this, but upon
searching discovered I was thinking of a couple of related bugs that had
been closed. I've filed a new issue for this:

https://savannah.gnu.org/bugs/?22670


I'm not sure whether this post should go into the buglist discussion or
here, but I'll put it here.

I have to say, I'm not sure this is properly classed as a bug. If
accept/reject applies to the original URL filename, why should the code
bother to apply it again to the local name? If the URL filename doesn't
pass the filters, wget doesn't retrieve the file, so it can't save it. I
assume the answer was to handle script and content_disposition cases
where you don't know what you're going to get back. If you match only
on URL, you'd have no way to control traversal separately from file
retention, and that's something you definitely want. (It's the default
for conventional html-based sites.) To put it another way, I usually
want to download all the php files, and traverse all that turn out to
be html, but I may only want to keep the zips or jpgs. With two
checks, one before download on the URL filename and another after
download on the local filename, I've got some control on cgi and php
script-based sites that is similar to the control on a conventional
html page site. 

If this behavior is changed, then you'd probably need to have two sets
of accept/reject filters that could be defined separately, one set to
control traversing, and one to control file retention. I'd actually
prefer that, particularly with matching extended to the query string
portion of the URL. Right now, it may be impossible to prevent
traversing some links. If you don't want to traverse
"index.php?mode=logout", but do want to get "index.php?mode=getfile"
there's no way to do it since the URL filename is the same.

In the short term, it would help to add something to the documentation
in the accept/reject area, such as the following:

The accept/reject filters are applied to the
filename twice - once to the filename in the URL before downloading to
determine if the file should be retrieved (and parsed for more links if
it is determined after download to be an html file) and again to the
local filename after it is retrieved to determine if it should be
kept. The local filename after retrieval may be significantly
different from the URL filename before retrieval for many reasons.
These include:
1) The URL filename does not include any query string portion of the
URL, such as the string "?topic=16" in the URL
"http://site.com/index.php?topic=16". After download the file may be
stored as the local filename "index.php@topic=16". Accept/reject
matching does not apply to the URL query string portion before
download, but will apply after download when the query string is
incorporated into the local filename.
2) When content disposition is on, the local filename may be completely
different from the URL filename. The URL "index.php?getfile=21" may
return a content disposition header producing a local file of
"some_interesting_file.zip".
3) The -E (html extension) switch will alter the filename by adding an
.html suffix, and the -nd (no directories) switch may add a numeric
suffix such as .1 for duplicate files.

If the URL filename of a link found when the starting page is parsed
does not pass the accept/reject filters, the link will not be followed
and will not be parsed for more links unless the filename ends in .html
or .htm. If accept/reject filters are used on cgi, php, asp and similar
script-based sites, the URL filename must pass the filters (without
considering any query string portion) if the links are to be
traversed/parsed, and the local filename must pass the filters if the
retrieved files are to be retained.
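A worked example of the above, assuming -E is on and a simple accept 
list (URL hypothetical):

wget -r -E -A php,zip http://site.com/index.php?topic=16

Before download, the filter sees the URL filename index.php (the query 
string is ignored) and passes it.  After download, the local file may be 
index.php@topic=16.html, which no longer ends in php or zip, so it fails 
the same filter and is deleted even though it was traversed.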




Re: Accept and Reject - particularly for PHP and CGI sites

2008-03-20 Thread Todd Pattist

  If we were going to leave this behavior in for some time, then I think
it'd be appropriate to at least mention it (maybe I'll just mention it
anyway, without a comprehensive explanation).

It would probably be sufficient to just add to the docs of 1.11 a very
brief mention of the two things that confused the heck out of me:
1) The accept/reject filters are applied twice, once to the URL
filename before retrieval and once to the local filename after
retrieval, and
2) A query string is not considered to be part of the URL filename.

You can probably imagine my confusion when I saw files like
view.php@id=16 being saved. I then tried to prevent that link from
being traversed with a match on part of the query string, and I'd see
that file disappear, only to later realize it had been traversed. I had
no idea that the query string was not being considered during the
acc/rej match, nor that the process was performed twice.

I look forward to 1.12.





Re: Accept and Reject - particularly for PHP and CGI sites

2008-03-19 Thread Todd Pattist

Micah Cowan wrote:

Well, -E is special, true. But in general the second quote is (by
definition) correct.

-E, obviously, _shouldn't_ be special...


I hope it's clear I'm not complaining.  Wget is great and your efforts 
are very much appreciated.  I just wanted to document the behavior I was 
seeing in a way that would help others.  I actually like the current 
behavior - now that I (more or less) understand it.  I can add php to the 
accept list, which controls traversing, and also optionally add html if 
I want to keep the html files.  If file retention were determined based 
solely on the URL, then traversal and local file retention would be 
inextricably linked.



I haven't yet quite figured out file extension matching versus string
matching in filenames, but extensions seem to match regardless of
leading characters or following ?id=1 parameters.


That's right; the query portion of the URL is not used to determine
matching. There are, of course, times when you specifically wish to tell
wget not to follow certain specific query strings (such as edit or print
or... in wikis); wget doesn't currently support this (I plan to fix this).


Now I'm confused again.  I suppose I can go through more trial and error 
or dig through the source to figure out what it's really doing, but in 
hopes you can throw more light on this, I'll explicate what is confusing 
me.  (Comments relate to wget 1.11 running on Windows XP.)


Confusion 1:  Right now, I'm only using file extensions in the accept= 
parameters, such as accept=zip,jpg,gif,php etc.  Even if the query 
portion (the ?id=1 part of site.com/index.php?id=1) is not considered 
during matching, it's not clear to me why accept=php matches 
site.com/index.php.  Why don't I need *.php (Windows) or *php 
(assuming the * glob matches the period)?  Would accept=index match 
index.php?id=1?  How about accept=*index*?  I assumed I could do an 
accept match on the query portion, the filename portion, or even the 
domain, but I suspect now that's wrong.  The domain gets stripped off 
when the local name is constructed, so I realize now I can't match on 
that (the local filename is used for matching), but the query portion is 
usually left as part of the filename, with an at sign replacing the 
question mark.  Is filename matching allowed, or only extension matching?
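Rereading the manual, I suspect the answer is that an accept/reject 
element containing no wildcard characters (*, ?, [ or ]) is treated as 
a filename suffix, while an element containing a wildcard is treated as 
a pattern matched against the whole filename.  If that's right, then:

accept=php        matches index.php (suffix match)
accept=index      does not match index.php (wrong suffix)
accept=*index*    matches index.php@topic=16 (pattern match)

but I'd welcome a correction.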


Confusion 2: I'm rejecting based on the query string, usually after an 
accept string allowing defined extensions.  I think I understand this, 
and I think it's working fine.  I'm usually doing something like 
reject=*logout*,*subscribe=*,*watch=* to prevent traversal of logout 
links or thread subscription links in a phpbb setting.  This works.  I 
think it's doing exactly what you say it's not yet capable of doing, but 
maybe I'm missing something.  Does the accept matching work differently 
from the reject matching?  Does reject work on the URL before retrieval, 
but accept work on the local filename after retrieval?  If the 
site.com/index.php?mode=logout link were being traversed with 
accept=php and reject=*logout*, I would be getting logged out, but I'm not.

Hm, light perhaps begins to dawn.  It looks like both accept and 
reject are applied twice - once before retrieval and once after.  To be 
retrieved/traversed, a file has to pass both filters, and then, after 
local renaming, it has to pass both again.  That would fit what I'm 
seeing: my reject filter prevents traversing logout links during the 
first pass, and my accept filter deletes php files during the second 
check after html renaming.
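For the record, the wgetrc I'm describing amounts to something like 
this (patterns abbreviated):

recursive = on
html_extension = on
accept = zip,jpg,gif,php
reject = *logout*,*subscribe=*,*watch=*

On the first pass, the reject patterns block the logout and subscription 
links; on the second pass, the downloaded .php files that html_extension 
has renamed to .html no longer match the accept list and are deleted.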


Thanks for any comments or clarifications.



Accept and Reject - particularly for PHP and CGI sites

2008-03-10 Thread Todd Pattist

I'm having trouble understanding how accept and reject work,
particularly in the context of sites that rely on CGI and PHP to
dynamically generate html pages.  My questions relate to the following:

1) I don't fully understand the -A and -R effects and the difference, if
any, between what links are traversed and parsed for deeper links,
versus what files are kept and stored locally.  The docs seem to say
that -A and -R have no effect on link traversal for html files, but
this doesn't seem true for dynamically generated CGI and PHP files.  Does
html_extension=on affect link traversal?  I'd like to be able to
independently control link traversal vs. file retrieval with local file
storage.  Do the directory include/exclude commands allow this - do they
work differently from -A and -R?

2) The logs seem to show PHP files being retrieved and then not saved.
When mirroring a forum, you often want to exclude links that do a
logout, or subscribe you to a topic.  Does -R prevent a dynamically
generated html page from a PHP link from being traversed?

3) Which has priority if both reject and accept filters match?

4) Sometimes the OS restricts filename characters.  Do the -A and -R
filters match on the final name used to store the file, or on the name
at the server?

Thanks for any pointers or links that might help me understand this better.




Re: Accept and Reject - particularly for PHP and CGI sites

2008-03-10 Thread Todd Pattist

Thank you for the quick response.  Background: I'm on Windows XP, GNU
wget 1.11.

  This "doesn't affect traversal of HTML files" functionality is currently
implemented via a heuristic based on the filename extension. That is, if
it ends in ".htm" or ".html", I believe, then it will be traversed
regardless of -A or -R settings, whereas .cgi or .php will not affect
traversal.
  

I'm not sure I understand the "cgi or .php will not affect traversal."
If I use wget to start at http://site.com/view.php?f=16 and recursively
mirror without -A or -R, it looks like it traverses deeper as though
that page and other .php links are html files. This makes sense. (I say
looks like, because it takes a long time and produces lots of files).
If I select the same page and add accept=site.com/view.php?id=16 to
wgetrc, no pages are saved and it does not traverse any deeper and it
takes only a second or two. I see this in the log:

Saving to: `site.com/view.php@id=16.html'
Removing site.com/view.php@id=16.html since it should be rejected.

I recognize that the question mark was substituted for my OS, but that
does not matter on the accept filter. What does matter is whether I
have the .html or not in the accept filter. That surprises me. Both
accept=site.com/view.php?id=16.html and accept=site.com/view.php?id=16*
will match and keep the 
site.com/view.php@id=16.html file, while both
accept=site.com/view.php?id=16 and accept=site.com/view.php@id=16 cause
it not to match and generate the "Removing ... since it should be
rejected" line. Regardless of the matching/saving this seems to
control traversal, as I get far deeper traversal with no accept= at all.

I'm pretty sure I can control traversal of php links with accept and
reject, but I often want to traverse looking for certain file types,
but don't want to save all the php files traversed.


  
I'd have to look at the relevant code, but it's possible that
"directory"-looking names may also be automatically traversed in that way.
  

I don't want you to do work I can do myself. I was just hoping for a
link or some pointers that might help.


  
Does
html_extension=on affect link traversal? 

  
  
No; this only affects whether filenames are changed upon download to
explicitly include an ".html" extension (useful for local browsing).
  


It seems that the html extension is used in the filter matching of
accept/reject, and that seems to affect traversal as described above
unless I'm missing something (which is entirely possible).

  
I'd like to be able to
independently control link traversal vs. file retrieval with local file
storage.  Do the directory include/exclude commands allow this - do they
work differently from -A -R?

  
  
I'm afraid I'm unsure what you are asking here.
  

Is my question clearer from the above? I'm seeing very quick exits
(seconds) when the accept filter does not match the start page. To get
deeper traversing, I have to match, but then it saves the matched files
and the traverse takes hours, with perhaps thousands of html files
(converted from .php files), none of which I need. 


  
2) The logs seem to show PHP files being retrieved and then not saved.
When mirroring a forum, you often want to exclude links that do a
logout, or subscribe you to a topic.  Does -R prevent a dynamically
generated html page from a PHP link from being traversed?

  
  
I think I'd need to see an example log of files "being retrieved and
then not saved", to understand what you mean.
  

I put a log of this type above. By adjusting accept and reject, I can
exclude traversing a logout .php link (which I want to do), but I can't
seem to traverse links I want to traverse without also saving them
locally. It's not critical to resolve this for me, as I can always
delete what I don't want, but it is confusing. I wanted to make sure I
wasn't missing something.

  
3) Which has priority if both reject and accept filters match?

  
  Not sure; it's easy enough to test this yourself, though.
  

I have done lots of testing, so you'd think this simple one would be
obvious. The answer seems to be that reject is higher priority, since
identical accept= and reject= seem to produce no output. This matches
what the manual says. It might help to add to the manual that adding
an accept= filter causes a rejection of everything that does not match
the accept filter, even if there is no reject filter specified. The
fact that specifically accepting some files turns on a default
rejection of everything else surprised me, since the normal default is
to accept everything.
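For example, if I'm reading the behavior right, a wgetrc containing 
nothing but

accept = zip

acts as though a reject-everything-else filter had also been specified: 
links that don't match are not followed (html/htm aside), and retrieved 
files that don't match are deleted, even though no reject= line appears 
anywhere.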

As a matter of interest, httrack uses the opposite logic. Adding a
specific accept in httrack has no effect if there is no reject. Thus,
the most common format is to reject everything followed by a list of
filetypes to accept. The wget procedure is more efficient since you
don't need the starting "reject everything," and why would you accept
if you didn't want to reject something else, but it 

Re: Accept and Reject - particularly for PHP and CGI sites

2008-03-10 Thread Todd Pattist
This cleared up a lot.  I really appreciate your reply.  I've been using 
the log and the server_response = on parameter, but not --debug.  I'll 
add that now and take a look, but your 1..2..3.. answer below and the 
comment that accept/reject matching is on the local filename explain 
what I'm seeing.  From your comments, I'm confident I can get it to do 
what I want, with the only problem being that I'll have to delete excess 
files.  That's not really a problem for me, as long as I understand what 
it is doing and why.


Micah Cowan wrote:


Todd Pattist wrote:
  

Thank you for the quick response.  Background: I'm on Windows XP, GNU
wget 1.11.


This "doesn't affect traversal of HTML files" functionality is currently
implemented via a heuristic based on the filename extension. That is, if
it ends in ".htm" or ".html", I believe, then it will be traversed
regardless of -A or -R settings, whereas .cgi or .php will not affect
traversal.
  
  
I'm not sure I understand the "cgi or .php will not affect traversal." 



I mean, it will not detect these as HTML files, so the accept/reject
rules will be applied to them without exception.

  

If I use wget to start at http://site.com/view.php?f=16 and recursively
mirror without -A or -R, it looks like it  traverses deeper as though
that page and other .php links are html files. This makes sense. (I say
looks like, because it takes a long time and produces lots of files). 
If I select the same page and add  accept=site.com/view.php?id=16 to

wgetrc, no pages are saved and it does not traverse any deeper and it
takes only a second or two.  I see this in the log:

Saving to: `site.com/view.php@id=16.html'
Removing site.com/view.php@id=16.html since it should be rejected.

I recognize that the question mark was substituted for my OS, but that
does not matter on the accept filter.  What does matter is whether I
have the .html or not in the accept filter.  That surprises me.  Both
accept=site.com/view.php?id=16.html and accept=site.com/view.php?id=16*
will match and keep the
site.com/view.php@id=16.html file, while both
accept=site.com/view.php?id=16 and accept=site.com/view.php@id=16 cause
it not to match and generate the "Removing ... since it should be
rejected" line.  Regardless of the matching/saving this seems to control
traversal, as I get far deeper traversal with no accept= at all.



After another look at the relevant portions of the source code, it looks
like accept/reject rules are _always_ applied against the local
filename, contrary to what I'd been thinking. This needs to be changed.
(But it probably won't be, any time soon.)

Note that the view.php?id=16 doesn't mean what you may perhaps think it
does: Wget detects the ? as a wildcard, and allows it to match any
character (including @). If you supplied \? instead (which matches a
literal question mark), I'm guessing it'd actually fail to match,
because it's checking against @.

My understanding is that, when you specify a URL directly at the
command-line, it will be downloaded and traversed (if it turns out to be
HTML), no matter what the accept/reject rules are (which can still cause
it to be removed afterwards). Therefore, I suspect that what Wget does
with your URL when it isn't matching the accept rules is:

  1. Downloads the named file
  2. Discovers that, regardless of the filename, it is indeed an HTML
file, so scans it for all links to be downloaded.
  3. After scanning for all the links, it doesn't find any that end in
.html, nor any that match the accept rules, so it doesn't do anything
else.

- --debug will definitely tell you whether it's bothering to scan that
first file or not, and what it decides to do with the links it finds.
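Something like the following (URL hypothetical) would capture those
decisions in wget.log:

wget --debug -o wget.log -r -A zip "http://site.com/view.php?id=16"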

  

I'm pretty sure  I can control traversal of php links with accept and
reject, but I often want to traverse looking for certain file types, but
don't want to save all the php files traversed.



We're looking for more fine-grained controls to allow this sort of
thing, but at the moment, my understanding is that there is no control
over whether Wget traverses-and-then-deletes a given file: it will
_always_ do that for files it knows or suspects are HTML (based on .htm,
.html suffixes, or if, like the above example, it will download the
filename first anyway because it's an explicit command-line argument);
it will _never_ download/traverse any other sorts of links that do not
match the accept rules.

If something _does_ match the accept rules, and turns out after download
to be an HTML file (determined by the server's headers), it will
traverse it further; but of course it won't delete them afterward
because they matched the accept list.
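Concretely (hypothetical URL), with

wget -r -A zip http://forum.site.com/index.php

my understanding is: index.php is downloaded because it's the
command-line argument, traversed once it turns out to be HTML, and then
deleted for failing the accept list; links ending in .htm or .html get
the same traverse-then-delete treatment; links matching -A zip are
downloaded and kept; and every other link is skipped outright.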

  

I'd have to look at the relevant code, but it's possible that
"directory"-looking names may also be automatically traversed in that way.
  
  

I don't want you to do work I can do myself.  I was just hoping for a
link or some pointers that might help.

Re: Content-Disposition UTF-8 and filename problems

2008-03-09 Thread Todd Pattist

I'll answer my own question for the record.

It's the "Content-Disposition: attachment; filename*=UTF-8''filename.zip" 
header that causes the problem.  I set wget to use the Privoxy proxy 
(wgetrc line):


http_proxy = localhost:8118/

and then set Privoxy to modify incoming server headers with a Privoxy 
filter:


SERVER-HEADER-FILTER: contentdisp Server Header filter to change content-disposition
s/content-disposition: attachment; filename\*=UTF-8''(.*)/content-disposition: attachment; filename=$1/ig


The Privoxy filter changes the server header to this form:
Content-Disposition: attachment; filename=filename.zip

which wget can read, and now all is well, with the file being saved 
under the correct name. 
BTW, when content_disposition=on, the file seems to be saved in the root 
directory, not the correct directory.  With content_disposition=off, the 
wrong name is used, but it's in the right place.  I believe someone else 
has seen this problem too (from the email archives IIRC).


Thanks for a great program!





Content-Disposition UTF-8 and filename problems

2008-03-08 Thread Todd Pattist
I'm having trouble with the filename after retrieving a php generated 
file download.  It is retrieved with:

http://site.com/download/file.php?id=62651
The content disposition header says:
Content-Disposition: attachment; filename*=UTF-8''filename.zip

I want it to end up as filename.zip, but it ends up as 
file.php@id=62651.  Unfortunately, I'm dealing with hundreds of files 
of varying types.


I'm using these wgetrc options:
recursive = on
content_disposition = on
verbose = on
dir_prefix = folder
server_response = on

Saving the headers in Firefox, I see:
content-disposition: attachment; filename*=UTF-8''filename.zip
Content-Type: application/octet-stream

I'm successfully saving other files from another site with the correct 
name; those files have a header as follows:

content-disposition: attachment; filename=flower.zip
Content-Type: zip

Is my problem due to the differences in the content-disposition: 
attachment; filename lines above?  Is it the UTF-8 encoding or something else?


Any help or hints would be appreciated.

Here's a logfile of the relevant request/response header exchange that 
fails:

HTTP request sent, awaiting response...
 HTTP/1.1 200 OK
 Date: Sun, 09 Mar 2008 02:31:23 GMT
 Server: Apache
 Pragma: public
 Content-Disposition: attachment; filename*=UTF-8''filename.zip
 Vary: Accept-Encoding,User-Agent
 Keep-Alive: timeout=5, max=1999
 Connection: Keep-Alive
 Content-Type: application/octet-stream
Length: unspecified [application/octet-stream]
--2008-03-08 21:31:25--  http://site.com/download/file.php?id=62651
Connecting to site.com|70.87.3.196|:80... connected.
HTTP request sent, awaiting response...
 HTTP/1.1 200 OK
 Date: Sun, 09 Mar 2008 02:31:24 GMT
 Server: Apache
 Pragma: public
 Content-Disposition: attachment; filename*=UTF-8''filename.zip
 Content-Length: 125127
 Vary: Accept-Encoding,User-Agent
 Keep-Alive: timeout=5, max=2000
 Connection: Keep-Alive
 Content-Type: application/octet-stream
Length: 125127 (122K) [application/octet-stream]
Saving to: `foldername/site.com/download/file.php@id=62651'