File name too long

2005-03-21 Thread Martin Trautmann
Hi all,

is there a fix when file names are too long?


Example:

URL=http://search.ebay.de/ws/search/SaleSearch?copagenum=3D1sosortproperty=3D2sojs=3D1version=3D2sosortorder=3D2dfts=3D-1catref=3DC6coaction=3Dcomparesoloctog=3D9dfs=3D20050024dfte=3D-1saendtime=3D396614from=3DR9dfe=3D20050024satitle=wgetcoentrypage=3DsearchssPageName=3DADME:B:SS:DE:21;



-

bash-2.04$ wget -kxE $URL
--15:16:37--  
http://search.ebay.de/ws/search/SaleSearch?copagenum=3D1sosortproperty=3D2sojs=3D1version=3D2sosortorder=3D2dfts=3D-1catref=3DC6coaction=3Dcomparesoloctog=3D9dfs=3D20050024dfte=3D-1saendtime=3D396614from=3DR9dfe=3D20050024satitle=wgetcoentrypage=3DsearchssPageName=3DADME:B:SS:DE:21
   => 
`search.ebay.de/ws/search/SaleSearch?copagenum=3D1sosortproperty=3D2sojs=3D1version=3D2sosortorder=3D2dfts=3D-1catref=3DC6coaction=3Dcomparesoloctog=3D9dfs=3D20050024dfte=3D-1saendtime=3D396614from=3DR9dfe=3D20050024satitle=wgetcoentrypage=3DsearchssPageName=3DADME:B:SS:DE:21'

Proxy request sent, awaiting response... 301 Moved Permanently
Location: 
/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2
 [following]
--15:16:37--  
http://search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2
   => 
`search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2'

Length: 46,310 [text/html]
search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html:
 File name too long

Cannot write to 
`search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html'
 (File name too long).
search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html:
 File name too long
Converting 
search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html...
 nothing to do.
Converted 1 files in 0.00 seconds.



... apart from that, the main thing I'm looking for is how to obtain
the search results. I still haven't managed to get the result from
search.ebay.de and then download the links to cgi.ebay.de in one go:

  wget -kxrE -l1 -D cgi.ebay.de -H $URL


Re: File name too long

2005-03-21 Thread gentoo


On Mon, 21 Mar 2005, Martin Trautmann wrote:

 is there a fix when file names are too long?
 
 bash-2.04$ wget -kxE $URL
 --15:16:37--  
 http://search.ebay.de/ws/search/SaleSearch?copagenum=3D1sosortproperty=3D2sojs=3D1version=3D2sosortorder=3D2dfts=3D-1catref=3DC6coaction=3Dcomparesoloctog=3D9dfs=3D20050024dfte=3D-1saendtime=3D396614from=3DR9dfe=3D20050024satitle=wgetcoentrypage=3DsearchssPageName=3DADME:B:SS:DE:21
 => 
 `search.ebay.de/ws/search/SaleSearch?copagenum=3D1sosortproperty=3D2sojs=3D1version=3D2sosortorder=3D2dfts=3D-1catref=3DC6coaction=3Dcomparesoloctog=3D9dfs=3D20050024dfte=3D-1saendtime=3D396614from=3DR9dfe=3D20050024satitle=wgetcoentrypage=3DsearchssPageName=3DADME:B:SS:DE:21'
 
 Proxy request sent, awaiting response... 301 Moved Permanently
 Location: 
 /wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2
  [following]
 --15:16:37--  
 http://search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2
 => 
 `search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2'
 
 Length: 46,310 [text/html]

 search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3
 a21QQversionZ3D2.html: File name too long


*** This is not a problem of wget, but of your filesystem. Try to do:

touch 
search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html

 ... apart from that, the main thing I'm looking for is how to obtain
 the search results. I still haven't managed to get the result from
 search.ebay.de and then download the links to cgi.ebay.de in one go:
 
   wget -kxrE -l1 -D cgi.ebay.de -H $URL

*** Maybe create a SHA1 sum of the request and store the result in a file
named after it (but you will not know what the original request was, unless
you keep some DB of requests). Or just do simple counting:

URL=...
sha1sum=$( echo -n "$URL" | sha1sum | cut -d' ' -f1 )
echo "$sha1sum $URL" >> SHA1-URL.db
wget -O "$sha1sum".html [other options] "$URL"

or

URL=...
i=0
echo "$i $URL" >> URL.db
wget -O search-$i.html "$URL"
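
The first variant could be folded into a small helper, roughly like this
(just a sketch; the fetch_by_hash name is only illustrative):

  fetch_by_hash () {
      # hash the URL and keep only the hex digest (sha1sum also prints "  -")
      local url=$1
      local hash=$( echo -n "$url" | sha1sum | cut -d' ' -f1 )
      # remember which hash belongs to which URL
      echo "$hash $url" >> SHA1-URL.db
      # save the page under the hash instead of the over-long name
      wget -O "$hash".html "$url"
  }

  fetch_by_hash "$URL"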

Could this be your solution?

Wolf.


Re: File name too long

2005-03-21 Thread Martin Trautmann
On 2005-03-21 15:32, [EMAIL PROTECTED] wrote:
 *** This is not a problem of wget, but of your filesystem. Try to do:
 
 touch 
 search.ebay.de/wget_W0QQcatrefZ3DC6QQcoactionZ3DcompareQQcoentrypageZ3DsearchQQcopagenumZ3D1QQdfeZ3D20050024QQdfsZ3D20050024QQdfteZ3DQ2d1QQdftsZ3DQ2d1QQfltZ3D9QQfromZ3DR9QQfsooZ3D2QQfsopZ3D2QQsaetmZ3D396614QQsojsZ3D1QQsspagenameZ3DADMEQ3aBQ3aSSQ3aDEQ3a21QQversionZ3D2.html

I'm quite sure that my file system has some limits somewhere - but I
suppose a web server may create virtual URLs that are too long or
contain illegal characters for almost any file system around.


The file name here might get repaired by some regex, e.g.
wget_?catref=C6coaction=comparecoentrypage=searchcopagenum=1dfte=Q2d1dfts=Q2d1flt=9from=R9fsoo=2fsop=2saetm=396614sojs=1sspagename=ADMEQ3aBQ3aSSQ3aDEQ3a21version=2.html

However, I'd be comfortable enough with some fixed length or char
limitation, such as a 'trim' extension:

  -tc, --trimcharacter=char  cut filename after character, such as _
  -tl, --trimlength=num      cut filename after num characters
  -ts, --trimsuffix=num      digits used for incremented cut filenames
  -tt, --trimtable=file      log trimmed file name and original to file
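
(If such options existed, in shell terms they would amount to roughly the
following, where NAME stands for the file name wget would otherwise use:)

  # roughly what -tc _ would do: cut the name at the first '_'
  echo "$NAME" | sed 's/_.*//'

  # roughly what -tl 80 would do: keep only the first 80 characters
  echo "$NAME" | cut -c1-80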


For the moment I'd be happy enough with saving to an md5-checksum.html
filename instead of a filename too long for my fs.
The output log could then tell me both the shortened and the original
filename.
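
Roughly what I have in mind, as a shell wrapper rather than a wget feature
(just a sketch; the 200-character cutoff and the URL-TABLE file name are
arbitrary):

  #!/bin/bash
  # Fetch $1, falling back to an md5-based name when the URL-derived
  # name would be too long for the file system.
  URL=$1
  NAME=$( echo -n "$URL" | sed 's,^[a-z]*://,,' | tr '/' '_' ).html  # crude, flattened stand-in for wget's name
  MAX=200                                                            # arbitrary cutoff

  if [ ${#NAME} -gt $MAX ]; then
      NAME=$( echo -n "$URL" | md5sum | cut -d' ' -f1 ).html
      echo "$NAME $URL" >> URL-TABLE        # log shortened name -> original URL
  fi

  wget -O "$NAME" "$URL"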

  search.ebay.de and then download the links to cgi.ebay.de in one go:
  
wget -kxrE -l1 -D cgi.ebay.de -H $URL
 
 *** Maybe create a SHA1 sum of the request and store the result in a file
 named after it (but you will not know what the original request was, unless
 you keep some DB of requests). Or just do simple counting:
 
 URL=...
 sha1sum=$( echo -n "$URL" | sha1sum | cut -d' ' -f1 )
 echo "$sha1sum $URL" >> SHA1-URL.db
 wget -O "$sha1sum".html [other options] "$URL"
 
 or
 
 URL=...
 i=0
 echo "$i $URL" >> URL.db
 wget -O search-$i.html "$URL"
 
 Could this be your solution?

Nice idea - I'll give it a try. However, it does not answer the -D problem
itself. I'm afraid this does require some further awk/sed processing of
the result?
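
What I picture is something along these lines (untested sketch; result.html,
items.txt and the link pattern are only guesses at what the saved search page
contains, and grep -o needs GNU grep):

  # 1. save the search result page under a short, fixed name
  wget -O result.html "$URL"

  # 2. pull the cgi.ebay.de links out of it and hand them back to wget
  grep -o 'http://cgi\.ebay\.de/[^"]*' result.html | sort -u > items.txt
  wget -kxE -i items.txt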

Thanks,
Martin


Re: File name too long

2005-03-21 Thread Hrvoje Niksic
Martin Trautmann [EMAIL PROTECTED] writes:

 is there a fix when file names are too long?

I'm afraid not.  The question here would be, how should Wget know the
maximum size of file name the file system supports?  I don't think
there's a portable way to determine that.

Maybe there should be a way for --restrict-file-names to handle this
too.
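
(On POSIX-style systems one can at least ask the file system itself, e.g. from
the shell; but that is per-filesystem and not available everywhere, which is
exactly the portability problem:)

  # maximum length of a single file name on the file system holding '.'
  getconf NAME_MAX .

  # maximum length of a whole path on that file system
  getconf PATH_MAX .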


Re: File name too long

2005-03-21 Thread Hrvoje Niksic
Martin Trautmann [EMAIL PROTECTED] writes:

 On 2005-03-21 17:13, Hrvoje Niksic wrote:
 Martin Trautmann [EMAIL PROTECTED] writes:
 
  is there a fix when file names are too long?
 
 I'm afraid not.  The question here would be, how should Wget know the
 maximum size of file name the file system supports?  I don't think
 there's a portable way to determine that.

 Where did the warning come from that stated 'File name too long'?

I don't think it's a warning; it's an error that came from trying to
open the file.  By the time this error occurs, it's pretty much too
late to change the file name.

 If the writing failed, you'll know for sure that either writing was
 not possible or that the file name was too long.

Exactly -- there can be a number of reasons why opening a file fails,
and a too-long file name is only one of them.

 Maybe there should be a way for --restrict-file-names to handle this
 too.

 I guess the problem is not so much how to identify too-long filenames,
 but rather how to handle them.

Identifying them is the harder problem.  Imposing an arbitrary limit
would hurt file systems with larger limits.

 It might be easier to use e.g. the suggested md5 checksum instead -

It might be useful to have an option that did that.  The problem is
that it's a very heavy-handed solution -- looking at the file name
would no longer provide a hint as to which URL the file came from.
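
(If a mapping file like the SHA1-URL.db suggested earlier in the thread is
kept alongside, the original URL is at least only a grep away; the hash below
is just an example value:)

  file=da39a3ee5e6b4b0d3255bfef95601890afd80709.html   # example hash-named file
  grep "^${file%.html} " SHA1-URL.db                   # prints: <hash> <original URL>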