Jay,

SUCCESS!  Thank you for your time and expertise.  The way you walked me
through the loop step by step was a great learning experience... I was even
able to figure out an error at the end... (my fault I'm sure... but
nonetheless, I'm learning!)

Thanks again!  Here is the working loop..  I needed to print $lgdesc to OUT



while (@urls) {
   my $url = shift(@urls);
   chomp $url;
   my $file = shift(@items);
   chomp $file;

   my $page = get($url)
        or die "Could not load $url\n";

   my $parser = HTML::TokeParser::Simple->new(\$page)
        or die "Could not parse page";

   # Skip ahead to the 10th table in the source code
   $parser->get_tag("table") foreach (1..10);

   # Skip to the 11th <tr>, then take the first <td> after it
   $parser->get_tag("tr") foreach (1..11);
   $parser->get_tag("td");
   my $lgdesc = $parser->get_text();

   open(OUT, ">", $file) or die "can't open $file: $!";
   print OUT $lgdesc;
   close(OUT);
}
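
For the archives: the loop above assumes @urls and @items have already been
filled from the two input files, along the same lines as Jay's example
further down in the thread (the filenames are the ones used earlier; rename
to taste):

 #!/usr/bin/perl
 use strict;
 use warnings;
 use LWP::Simple;
 use HTML::TokeParser::Simple;

 # Read the two parallel lists into arrays, one line per entry
 open(URL, "<", "urls.txt")         or die "couldn't read urls: $!";
 open(NUM, "<", "item_numbers.txt") or die "couldn't read numbers: $!";
 my @urls  = <URL>;
 my @items = <NUM>;
 close(URL);
 close(NUM);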



> -----Original Message-----
> From: Brian Volk 
> Sent: Tuesday, November 30, 2004 1:47 PM
> To: 'daggerquill'
> Subject: RE: Create new "outfile" foreach line in "inputfile"
> 
> 
> Jay,
> 
> Thank you so much for your help...  I will get started right
> away!  If I have any more questions, and I usually do.. :~) , I
> will post them to the mailing list.
> 
> Thanks again!
> 
> Brian 
> 
> > -----Original Message-----
> > From: daggerquill [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, November 30, 2004 1:14 PM
> > To: Brian Volk
> > Subject: Re: Create new "outfile" foreach line in "inputfile"
> > 
> > 
> > On Tue, 30 Nov 2004 12:14:09 -0500, Brian Volk 
> > <[EMAIL PROTECTED]> wrote:
> > > Great, thanks!  I'm reading Chapter 5 "Hashes" in the Llama book; I'm
> > > thinking that might be what I need to do, just not real sure how.. :~).
> > > What I want to do is: read the first line of urls.txt (100 urls)
> > > (right now I just have a single url in the script), then get the text
> > > w/ $parser->get_text, then name the file w/ the first item number in
> > > the item_numbers.txt file.  (For now I'm just printing the text w/ a
> > > filehandle.)
> > > 
> > > -----  begin -----
> > > 
> > > #!/usr/bin/perl -w
> > > 
> > > # This program will get the large description from the KC web site.
> > > 
> > >  use strict;
> > >  use HTML::TokeParser::Simple;
> > >  use LWP::Simple;
> > > 
> > >  my $url =
> > > "http://www.kcprofessional.com/us/product-details.asp?search=v1&searchtext=1804&x=0&y=0";
> > >  my $page = get($url)
> > >         or die "Could not load URL\n";
> > > 
> > > # Create file to store large description
> > >  open LGDESC, "> largedecs.txt"
> > >         or die "Cannot open largedecs.txt for writing: $!";
> > > 
> > >  my $parser = HTML::TokeParser::Simple->new(\$page)
> > >         or die "Could not parse page";
> > > 
> > > # This will get the 10th table in the source code
> > >  my  ($tag, $attr);
> > >  $tag = $parser->get_tag("table") foreach (1..10);
> > > 
> > > # This will get the 11th instance of <tr><td
> > >  $parser->get_tag("tr") foreach (1..11);
> > >  $parser->get_tag("td");
> > >  my $lg_desc = $parser->get_text();
> > > 
> > >  print  LGDESC "$lg_desc, \n";
> > > 
> > >  close LGDESC;
> > > 
> > > ---- end --------------
> > > 
> > > Thank you!
> > > 
> > > Brian Volk
> > > 
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: daggerquill [mailto:[EMAIL PROTECTED]
> > > > Sent: Tuesday, November 30, 2004 12:04 PM
> > > > To: Brian Volk
> > > > Subject: Re: Create new "outfile" foreach line in "inputfile"
> > > >
> > > >
> > > > On Mon, 29 Nov 2004 12:38:31 -0500, Brian Volk
> > > > <[EMAIL PROTECTED]> wrote:
> > > > > Hi All,
> > > > >
> > > > > I have a urls.txt file that contains a different url on each
> > > > > line (100 urls).  And I have an item_numbers.txt file (100
> > > > > items).  I want to create a new outfile.txt, named w/ the
> > > > > corresponding item_number, each time the urls.txt file passes
> > > > > through the loop.  Can someone please let me know where I can
> > > > > read about this...  Is this something I need to work into a
> > > > > hash?  I have a working script (screen scraping) but it is
> > > > > only for one url and one outfile.
> > > > >
> > > > > Any direction would be greatly appreciated.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Brian Volk
> > > > > HP Products
> > > > > 317.298.9950 x1245
> > > > >  <mailto:[EMAIL PROTECTED]> [EMAIL PROTECTED]
> > > > >
> > > > >
> > > >
> > > > Brian,
> > > >
> > > > Let us see the script you have, and we can help you work it
> > > > into a loop.
> > > >
> > > > --jay savage
> > > >
> > > 
> > 
> > 
> > Brian,
> > 
> > You're definitely on the right track.  You could use hashes, but
> > assuming both files are in the correct order--the second line of the
> > number file goes with the second url--you can just use arrays, as I've
> > done below.  This is a simple while loop that calls
> > LWP::Simple::get to read each page and save it.  You'll need to go
> > back and add anything you want to do with HTML::TokeParser, but this
> > should give you a pretty good idea of one way to loop through the
> > lists and files.
> > 
> > #!/usr/bin/perl
> > use strict;
> > use warnings;
> > use LWP::Simple;
> > 
> > my $urlfile = "urls.txt" ;
> > my $numfile = "item_numbers.txt" ;
> > 
> > open(URL, "<", $urlfile) or die "couldn't read urls: $!";
> > open(NUM, "<", $numfile) or die "couldn't read numbers: $!";
> > 
> > my @urls = <URL> ;
> > my @numbers = <NUM>;
> > 
> > close(URL);
> > close(NUM);
> > 
> > while (@urls) {
> >    my $url = shift(@urls);
> >    chomp $url;
> >    my $file = shift(@numbers);
> >    chomp $file;
> > 
> >    my $page = get($url);
> > 
> >    # insert your code to parse whatever you want here,
> >    # or write a function to call here  
> > 
> >    open(OUT, ">", $file) or die "can't open $file:$!";
> >    print OUT $page;
> >    close (OUT);
> > }
> > __END__
> > 
> > As you work with it, I'm sure you'll see some places to simplify it
> > (maybe "while (<URL>)"?), but the basic idea is to process the two
> > lists in parallel.  One thing that's important to remember in
> > situations like this is to use shift rather than pop: while you've
> > probably made sure the beginnings of the files are in good order,
> > the ends may have whitespace and blank lines that might cause a
> > mismatch.
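> > 
> > For example, that "while (<URL>)" simplification might look something
> > like this (an untested sketch--reading both filehandles a line at a
> > time instead of slurping them into arrays first):
> > 
> > while (my $url = <URL>) {
> >     my $file = <NUM>;
> >     last unless defined $file;   # stop if the lists fall out of step
> >     chomp($url, $file);
> > 
> >     my $page = get($url);
> > 
> >     open(OUT, ">", $file) or die "can't open $file: $!";
> >     print OUT $page;
> >     close(OUT);
> > }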
> > 
> > HTH,
> > 
> > --jay
> > 
> 

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>

