Jay, SUCCESS! Thank you for your time and expertise. The way you walked me through the loop step by step was a great learning experience... I was even able to figure out an error at the end... (my fault I'm sure... but nonetheless, I'm learning!)
Thanks again! Here is the working loop... I needed to print $lgdesc to the OUT filehandle:

while (@urls) {
    my $url = shift(@urls);
    chomp $url;
    my $file = shift(@items);
    chomp $file;

    my $page = get($url);

    # insert your code to parse whatever you want here,
    # or write a function to call here
    my $parser = HTML::TokeParser::Simple->new(\$page)
        or die "Could not parse page";

    # This will get the 10th table in the source code
    my ($tag, $attr);
    $tag = $parser->get_tag("table") foreach (1..10);

    # This will get the 11th instance of <tr><td
    $parser->get_tag("tr") foreach (1..11);
    $parser->get_tag("td");
    my $lgdesc = $parser->get_text();

    open(OUT, ">", $file) or die "can't open $file: $!";
    print OUT $lgdesc;
    close(OUT);
}

> -----Original Message-----
> From: Brian Volk
> Sent: Tuesday, November 30, 2004 1:47 PM
> To: 'daggerquill'
> Subject: RE: Create new "outfile" foreach line in "inputfile"
>
> Jay,
>
> Thank you so much for your help... I will get started right away!
> If I have any more questions, and I usually do.. :~) , I will post
> them to the mailing list.
>
> Thanks again!
>
> Brian
>
> > -----Original Message-----
> > From: daggerquill [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, November 30, 2004 1:14 PM
> > To: Brian Volk
> > Subject: Re: Create new "outfile" foreach line in "inputfile"
> >
> > On Tue, 30 Nov 2004 12:14:09 -0500, Brian Volk
> > <[EMAIL PROTECTED]> wrote:
> > > Great, thanks!  I'm reading Chapter 5 "Hashes" in the Llama book;
> > > I'm thinking that might be what I need to do, just not real sure
> > > how.. :~).  What I want to do is read the first line of urls.txt
> > > (100 urls) (right now I just have a single url in the script),
> > > then get the text with $parser->get_text, then name the file with
> > > the first item number in the item_numbers.txt file.  (For now I'm
> > > just printing the text with a filehandle.)
> > >
> > > ----- begin -----
> > >
> > > # This program will get the large description from the KC web site.
> > > #!/usr/bin/perl -w
> > >
> > > use strict;
> > > use HTML::TokeParser::Simple;
> > > use LWP::Simple;
> > >
> > > my $url =
> > >   "http://www.kcprofessional.com/us/product-details.asp?search=v1&searchtext=1804&x=0&y=0";
> > > my $page = get($url)
> > >     or die "Could not load URL\n";
> > >
> > > # Create file to store large description
> > > open LGDESC, "> largedecs.txt"
> > >     or die "Cannot open largedecs.txt for writing: $!";
> > >
> > > my $parser = HTML::TokeParser::Simple->new(\$page)
> > >     or die "Could not parse page";
> > >
> > > # This will get the 10th table in the source code
> > > my ($tag, $attr);
> > > $tag = $parser->get_tag("table") foreach (1..10);
> > >
> > > # This will get the 11th instance of <tr><td
> > > $parser->get_tag("tr") foreach (1..11);
> > > $parser->get_tag("td");
> > > my $lg_desc = $parser->get_text();
> > >
> > > print LGDESC "$lg_desc, \n";
> > >
> > > close LGDESC;
> > >
> > > ---- end --------------
> > >
> > > Thank you!
> > >
> > > Brian Volk
> > >
> > > > -----Original Message-----
> > > > From: daggerquill [mailto:[EMAIL PROTECTED]
> > > > Sent: Tuesday, November 30, 2004 12:04 PM
> > > > To: Brian Volk
> > > > Subject: Re: Create new "outfile" foreach line in "inputfile"
> > > >
> > > > On Mon, 29 Nov 2004 12:38:31 -0500, Brian Volk
> > > > <[EMAIL PROTECTED]> wrote:
> > > > > Hi All,
> > > > >
> > > > > I have a urls.txt file that contains a different url on each
> > > > > line (100 urls).  And I have an item_numbers.txt file (100
> > > > > items).  I want to create a new outfile.txt, named with the
> > > > > corresponding item_number, each time the urls.txt file passes
> > > > > through the loop.  Can someone please let me know where I can
> > > > > read about this...  Is this something I need to work into a
> > > > > hash?  I have a working script (screen scraping) but it is
> > > > > only for one url and one outfile.
> > > > > Any direction would be greatly appreciated.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Brian Volk
> > > > > HP Products
> > > > > 317.298.9950 x1245
> > > > > <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]
> > > >
> > > > Brian,
> > > >
> > > > Let us see the script you have, and we can help you work it
> > > > into a loop.
> > > >
> > > > --jay savage
> >
> > Brian,
> >
> > You're definitely on the right track.  You could use hashes, but
> > assuming both files are in the correct order -- the second line of
> > the numbers file goes with the second url -- you can just use
> > arrays, as I've done below.  This is a simple while loop that will
> > call LWP::Simple::get to read the page and save it.  You'll need to
> > go back and add anything you want to do with HTML::TokeParser, but
> > this should give you a pretty good idea of one way to loop through
> > the lists and files.
> >
> > #!/usr/bin/perl
> > use strict;
> > use warnings;
> > use LWP::Simple;
> >
> > my $urlfile = "urls.txt";
> > my $numfile = "item_numbers.txt";
> >
> > open(URL, "<", $urlfile) or die "couldn't read urls: $!";
> > open(NUM, "<", $numfile) or die "couldn't read numbers: $!";
> >
> > my @urls = <URL>;
> > my @numbers = <NUM>;
> >
> > close(URL);
> > close(NUM);
> >
> > while (@urls) {
> >     my $url = shift(@urls);
> >     chomp $url;
> >     my $file = shift(@numbers);
> >     chomp $file;
> >
> >     my $page = get($url);
> >
> >     # insert your code to parse whatever you want here,
> >     # or write a function to call here
> >
> >     open(OUT, ">", $file) or die "can't open $file: $!";
> >     print OUT $page;
> >     close(OUT);
> > }
> > __END__
> >
> > As you work with it, I'm sure you'll see some places to simplify it
> > (maybe "while (<URL>)"?), but the basic idea is to process the two
> > lists in parallel.
> > One thing that's important to remember in situations like this is
> > to use shift and unshift rather than pop and push: while you've
> > probably made sure that the beginnings of the files are in good
> > order, the ends may have whitespace and blank lines that might
> > cause a mismatch.
> >
> > HTH,
> >
> > --jay

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>
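For readers working from the archive, Jay's parallel-list pattern can be exercised without any live downloads. The sketch below is illustrative, not from the thread: the pair_lists helper and the sample data are hypothetical, and it adds the blank-line skipping Jay warns about, where the thread's loop simply shifts both arrays in step.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pair each url with the item number at the same position in the other
# list, skipping blank lines so trailing whitespace at the end of
# either file cannot shift the pairing out of step.
sub pair_lists {
    my ($urls, $numbers) = @_;
    my @pairs;
    while (@$urls && @$numbers) {
        my $url  = shift @$urls;
        my $file = shift @$numbers;
        chomp($url, $file);
        next if $url =~ /^\s*$/ || $file =~ /^\s*$/;   # skip blanks
        push @pairs, [ $url, $file ];
    }
    return @pairs;
}

# Sample data standing in for urls.txt / item_numbers.txt:
my @urls    = ( "http://example.com/a\n", "http://example.com/b\n", "\n" );
my @numbers = ( "1804\n", "1805\n", "\n" );

for my $pair ( pair_lists( \@urls, \@numbers ) ) {
    my ( $url, $file ) = @$pair;
    # This is where the thread's loop calls get($url) and writes
    # the page (or the parsed description) to $file.
    print "$file <= $url\n";
}
```

Because pair_lists consumes both arrays from the front with shift, a stray blank line at the end of one file drops out instead of producing a file named after an empty string.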