Re: [gentoo-user] creating local copies of web pages

2005-12-05 Thread Billy Holmes

Robert Persson wrote:
The trouble is that I have a bookmark file with several hundred entries. wget 
is supposed to be fairly good at extracting urls from text files, but it 
couldn't handle this particular file.


Try this:

emerge HTML-Tree

then as a normal user, run this script like so (where $file is your 
bookmark file)


$ perl listhref.pl $file > list.txt

[snip]
#!/usr/bin/perl
# Print every non-empty href found in the <a> tags of the HTML file
# given on the command line, one URL per line.
use HTML::Tree;
print join("\n",
    map { $_->attr('href') }
    HTML::TreeBuilder->new()->parse_file(shift)
        ->look_down('_tag', 'a', sub { $_[0]->attr('href') ne '' })
) . "\n";

exit;
[snip]

Then you can process your urls like so:

xargs wget -m < list.txt
--
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] creating local copies of web pages

2005-12-05 Thread Billy Holmes

Robert Persson wrote:
The trouble is that I have a bookmark file with several hundred entries. wget 
is supposed to be fairly good at extracting urls from text files, but it 
couldn't handle this particular file.


My previous message assumes that your bookmark file is in reality an HTML 
file.

--
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] creating local copies of web pages

2005-12-03 Thread Martins Steinbergs
On Saturday 03 December 2005 09:04, Robert Persson wrote:
 I wasn't running it as root. The strange thing is that httrack did start
 creating a directory structure in ~/websites consisting of a couple of
 dozen directories or so (e.g.
 ~/websites/politics/www.fromthewilderness.com/free/ww3/), but it didn't
 actually store any html or other site content, despite the fact that it was
 taking a very long time to do this and was claiming to have downloaded
 hundreds of files.
 --
 Robert Persson

 Don't use nuclear weapons to troubleshoot faults.
 (US Air Force Instruction 91-111, 1 Oct 1997)

If there aren't any files or folders under ~/websites then it isn't a problem 
with httrack. If mirroring goes wrong, there should at least be a project 
folder containing an hts-cache folder plus hts-log.txt and index.html files. 
Sorry, not much help from here.
martins

-- 
Linux 2.6.15-rc2 AMD Athlon(tm) 64 Processor 3200+
 15:20:24 up  1:03,  3 users,  load average: 0.27, 0.13, 0.08




Re: [gentoo-user] creating local copies of web pages

2005-12-03 Thread Matthew Cline
On 12/3/05, Robert Persson [EMAIL PROTECTED] wrote:

 The trouble is that I have a bookmark file with several hundred entries. wget
 is supposed to be fairly good at extracting urls from text files, but it
 couldn't handle this particular file.


I don't know what the exact format of your particular text file is,
but why don't you just use sed and friends to convert the text file into a
format that wget can use?
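
For example, something along these lines might do it (an untested sketch:
bookmarks.html and list.txt are placeholder names, and it assumes the hrefs
are double-quoted, as in a Netscape bookmark file):

grep -o '[Hh][Rr][Ee][Ff]="[^"]*"' bookmarks.html | sed 's/^[^"]*"//; s/"$//' > list.txt
wget -i list.txt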


HTH,

Matt

-- 
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] creating local copies of web pages

2005-12-03 Thread Robert Persson
On December 3, 2005 05:40 am Martins Steinbergs was like:
 If there aren't any files or folders under ~/websites then it isn't a problem
 with httrack. If mirroring goes wrong, there should at least be a project
 folder containing an hts-cache folder plus hts-log.txt and index.html files.
 Sorry, not much help from here.
 martins

But that's not what I've been saying, Martins. httrack +does+ create 
directories in ~/websites, including hts-cache. It also creates hts-log.txt, 
index.html, a lock file and a couple of gifs. However hts-cache is the only 
one of those directories with anything in it (aside from subdirectories and 
sub-subdirectories), and index.html is an empty file. What there is in 
hts-cache is a file called new.dat which contains a lot of the html that 
ought to have been put into the folders, all rolled into one huge file.

-- 
Robert Persson

Don't use nuclear weapons to troubleshoot faults.
(US Air Force Instruction 91-111, 1 Oct 1997)

-- 
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] creating local copies of web pages

2005-12-02 Thread Neil Bothwick
On Thu, 1 Dec 2005 17:41:36 -0800, Robert Persson wrote:

 One option would be to feed wget a list of urls. The trouble is I don't
 know how to turn an html bookmark file into a simple list of urls. I
 imagine I could do it in sed if I spent enough time to learn sed, but
 my afternoon has gone now and I don't have the time.

wget will accept most files containing URLs; it doesn't have to be a
straight list. Try feeding it your bookmark file as is.
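
If wget chokes on the raw file, telling it to parse the input as HTML
sometimes helps (a sketch; bookmarks.html stands in for whatever your file
is called):

wget --force-html -i bookmarks.html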


-- 
Neil Bothwick

Excuse for the day: daemons did it




Re: [gentoo-user] creating local copies of web pages

2005-12-02 Thread Martins Steinbergs
On Friday 02 December 2005 07:25, Shawn Singh wrote:
 I guess I'm not exactly sure what you're trying to do, but when I want to
 get a local copy of a website I do this:

 nohup wget -m http://www.someUrL.org &

 Shawn

 On 12/2/05, Robert Persson [EMAIL PROTECTED] wrote:
  I have been trying all afternoon to make local copies of web pages from a
  netscape bookmark file. I have been wrestling with httrack (through
  khttrack), pavuk and wget, but none of them work. httrack and pavuk seem to
  claim they can do the job, but they can't, or at least not in any way an
  ordinary mortal could be expected to work out. They do things like
  pretending to download hundreds of files without actually saving them to
  disk, crashing suddenly and frequently, and popping up messages saying that
  I haven't contributed enough code to their project to expect the thing to
  work properly. I don't want to do anything hideously complicated. I just
  want to make local copies of some bookmarked pages. What tools should I be
  using?

  I would be happy to use a windows tool in wine if it worked. I would be
  happy to reboot into Windows if I could get this job done.

  One option would be to feed wget a list of urls. The trouble is I don't
  know how to turn an html bookmark file into a simple list of urls. I
  imagine I could do it in sed if I spent enough time to learn sed, but my
  afternoon has gone now and I don't have the time.

  Many thanks
  Robert
  --
  Robert Persson

  Don't use nuclear weapons to troubleshoot faults.
  (US Air Force Instruction 91-111, 1 Oct 1997)

  --
  gentoo-user@gentoo.org mailing list

 --
 Shawn Singh

I use the httrack Linux and Windows versions, generally without problems; 
it sometimes fails to parse dynamic-content websites, but man httrack has 
plenty of options described. At a previous job (Windows only) I ran a daily 
task with httrack to fetch fresh rar files with database updates.
If there are really no files and dirs created in the ~/websites folder, check 
write permissions and whether there is any disk space left.
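
For reference, a bare-bones command-line run that puts the mirror under
~/websites would look roughly like this (URL and project path are only
examples):

httrack http://www.example.com/ -O ~/websites/example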


-- 
Linux 2.6.15-rc2 AMD Athlon(tm) 64 Processor 3200+
 11:18:28 up 45 min,  7 users,  load average: 0.00, 0.00, 0.00




Re: [gentoo-user] creating local copies of web pages

2005-12-02 Thread Robert Persson
On December 2, 2005 01:05 am Neil Bothwick was like:
 wget will accept most files containing URLs, it doesn't have to be a
 straight list. Try feeding it your bookmark file as is.

Tried that. It borked.  :-(
-- 
Robert Persson

Don't use nuclear weapons to troubleshoot faults.
(US Air Force Instruction 91-111, 1 Oct 1997)

-- 
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] creating local copies of web pages

2005-12-02 Thread Robert Persson
On December 2, 2005 01:37 am Martins Steinbergs was like:
 If there are really no files and dirs created in the ~/websites folder, check
 write permissions and whether there is any disk space left.

Permissions are fine and there is quite a bit of space on the disk. httrack 
creates directories in ~/websites, but no other files, despite the fact that 
it claims to be downloading bucketloads of them.
-- 
Robert Persson

Don't use nuclear weapons to troubleshoot faults.
(US Air Force Instruction 91-111, 1 Oct 1997)

-- 
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] creating local copies of web pages

2005-12-02 Thread Billy Holmes

Robert Persson wrote:
I have been trying all afternoon to make local copies of web pages from a 
netscape bookmark file. I have been wrestling with httrack (through 


wget -r http://$site/

have you tried that, yet?
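
If you have already pulled the URLs out into a file, one per line, a simple
shell loop would do much the same for each of them (list.txt is just an
assumed name):

while read -r site; do wget -r "$site"; done < list.txt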
--
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] creating local copies of web pages

2005-12-02 Thread Robert Persson
On December 2, 2005 07:42 am Billy Holmes was like:
 Robert Persson wrote:
  I have been trying all afternoon to make local copies of web pages from a
  netscape bookmark file. I have been wrestling with httrack (through

 wget -r http://$site/

 have you tried that, yet?

The trouble is that I have a bookmark file with several hundred entries. wget 
is supposed to be fairly good at extracting urls from text files, but it 
couldn't handle this particular file.

Robert

-- 
Robert Persson

Don't use nuclear weapons to troubleshoot faults.
(US Air Force Instruction 91-111, 1 Oct 1997)

-- 
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] creating local copies of web pages

2005-12-02 Thread Robert Persson
On December 2, 2005 06:40 am Martins Steinbergs was like:
 if httrack is runing as root all stuff goes to /root/websites/ , explored
 there?

I wasn't running it as root. The strange thing is that httrack did start 
creating a directory structure in ~/websites consisting of a couple of dozen 
directories or so (e.g. 
~/websites/politics/www.fromthewilderness.com/free/ww3/), but it didn't 
actually store any html or other site content, despite the fact that it was 
taking a very long time to do this and was claiming to have downloaded 
hundreds of files.
-- 
Robert Persson

Don't use nuclear weapons to troubleshoot faults.
(US Air Force Instruction 91-111, 1 Oct 1997)

-- 
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] creating local copies of web pages

2005-12-01 Thread Shawn Singh
I guess I'm not exactly sure what you're trying to do, but when I want to get a local copy of a website I do this:

nohup wget -m http://www.someUrL.org &

Shawn

On 12/2/05, Robert Persson [EMAIL PROTECTED] wrote:
 I have been trying all afternoon to make local copies of web pages from a
 netscape bookmark file. I have been wrestling with httrack (through
 khttrack), pavuk and wget, but none of them work. httrack and pavuk seem to
 claim they can do the job, but they can't, or at least not in any way an
 ordinary mortal could be expected to work out. They do things like
 pretending to download hundreds of files without actually saving them to
 disk, crashing suddenly and frequently, and popping up messages saying that
 I haven't contributed enough code to their project to expect the thing to
 work properly. I don't want to do anything hideously complicated. I just
 want to make local copies of some bookmarked pages. What tools should I be
 using?

 I would be happy to use a windows tool in wine if it worked. I would be
 happy to reboot into Windows if I could get this job done.

 One option would be to feed wget a list of urls. The trouble is I don't know
 how to turn an html bookmark file into a simple list of urls. I imagine I
 could do it in sed if I spent enough time to learn sed, but my afternoon has
 gone now and I don't have the time.

 Many thanks
 Robert
 --
 Robert Persson

 Don't use nuclear weapons to troubleshoot faults.
 (US Air Force Instruction 91-111, 1 Oct 1997)

 --
 gentoo-user@gentoo.org mailing list

--
Shawn Singh