Hello dear Perl experts,

I'm pretty new to programming, and to OO programming especially. Nonetheless, I'm trying to get a very simple spider for web crawling done. The script below is what I got to work, and it runs nicely. Now I want to modify the script a bit; tailoring and tinkering is the way to learn.

I want to fetch URLs that have a certain string in them, for example "http://www.foo.com/bar". In other words: I need to collect all the URLs that contain the term "/bar", and then I want to strip the "/bar" so that only the base URL remains: http://www.foo.com

Is this doable?

Love to hear from you,
Martin

#!C:\Perl\bin\perl
use strict;   # You always want to include both strict and warnings
use warnings;
use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use HTML::LinkExtor;

# There was no reason for this to be in a BEGIN block (and there
# are a few good reasons for it not to be)
open my $file1, "+>>", "links.txt"
    or die "Cannot open links.txt: $!";
# From here on, plain print() writes to links.txt
select($file1);

# The URL I want it to start at.
# Note that I've made this an array, @urls, rather than a scalar, $URL
my @urls = ('https://the url goes in here');
my %visited;   # The % sigil indicates it's a hash

my $browser = LWP::UserAgent->new();
$browser->timeout(5);

while (@urls) {
    my $url = shift @urls;

    # Skip this URL and go on to the next one if we've
    # seen it before
    next if $visited{$url};

    # Mark it as visited right away, so that a failing URL
    # is not requested again
    $visited{$url} = 1;

    my $request  = HTTP::Request->new(GET => $url);
    my $response = $browser->request($request);

    # No real need to invoke printf if we're not doing
    # any formatting. Report errors on STDERR (not in links.txt)
    # and skip pages that could not be fetched.
    if ($response->is_error()) {
        warn "$url: ", $response->status_line, "\n";
        next;
    }

    my $contents = $response->content();

    # Passing $url as the base makes the extracted links absolute
    my ($page_parser) = HTML::LinkExtor->new(undef, $url);
    $page_parser->parse($contents)->eof;
    my @links = $page_parser->links;

    # Each entry is [tag, attribute => url, ...], so element 2 is the URL
    foreach my $link (@links) {
        print "$$link[2]\n";
        push @urls, $$link[2];
    }
    sleep 60;
}

On Wed, Oct 4, 2017 at 10:49 PM, Dan Book <gri...@gmail.com> wrote:

> How can we proceed from here?
> -Dan
>
> On Mon, Sep 18, 2017 at 1:17 PM, Patrick M. Galbraith <p...@patg.net>
> wrote:
>
>> Pali,
>>
>> Great! Now we can start moving forward.
>>
>> Sorry if my responses have been intermittent - first week at new job.
>>
>> Regards,
>>
>> Patrick
>> On 9/16/17 4:35 AM, p...@cpan.org wrote:
>>
>> I prepared branch master-new, which is based on current DBD-mysql master
>> branch and revert state to pre-4.043 version, including all changes done
>> after 4.043 release to master branch. I have this master-new branch in
>> my fork. If you want you can use it...
>> https://github.com/pali/DBD-mysql/tree/master-new
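
The filtering step asked about above (keep only URLs containing "/bar", then strip it) could be sketched roughly as follows. This is only a minimal illustration, not a tested part of the spider: strip_term is a hypothetical helper name, the match is a plain substring test on the part after the host name, and everything from the first "/bar" onward is cut. All of these are assumptions about what is wanted.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script).
# Returns the URL cut off before the first occurrence of $term in the
# part after the host name, or undef if the URL does not contain $term.
sub strip_term {
    my ($url, $term) = @_;

    # Split into "scheme://host" and the rest, so that the term is
    # only looked for after the host name.
    my ($base, $rest) = $url =~ m{^([a-z][a-z0-9+.-]*://[^/?#]+)(.*)$}i
        or return;

    my $pos = index($rest, $term);
    return if $pos < 0;

    return $base . substr($rest, 0, $pos);
}

# Example input; these URLs are made up for illustration.
my @found = (
    'http://www.foo.com/bar',
    'http://www.foo.com/baz',
    'http://www.foo.com/x/bar/y',
);

for my $link (@found) {
    my $clean = strip_term($link, '/bar');
    next unless defined $clean;   # URL does not contain "/bar"
    print "$clean\n";
}
```

In the spider itself, the same call would presumably go inside the foreach loop over @links, applied to $$link[2], printing the stripped URL only when the result is defined. If "/bar" should match only as a whole path segment (so that "/barn" is not caught), the index() test would need to be replaced by a stricter pattern.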