This kind of stuff is trivial in Perl. You've chosen a good language.
$url =~ s|/bar$||;
...which means: find any occurrence of "/bar" at the very end of the URL
and replace it with nothing. This is called a "regex" (short for "regular
expression"). We usually write regexes with forward slashes as delimiters,
but you can use other characters (like "|") when the target string
contains forward slashes.
Do a web search for "Perl regex".
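To make the substitution above concrete, here is a minimal sketch. The URLs and the variable names ($with_pipes, $with_slashes, $other, $changed) are made up for illustration. It shows the same substitution with both delimiter styles, and that s/// returns a true value only when it actually changed something:

```perl
use strict;
use warnings;

# Example URL taken from the question; any string would do.
my $url = 'http://www.foo.com/bar';

# Copy $url, then strip a trailing "/bar" using "|" as the delimiter.
(my $with_pipes = $url) =~ s|/bar$||;

# The same thing with "/" delimiters: the slash in the pattern must be escaped.
(my $with_slashes = $url) =~ s/\/bar$//;

# s/// returns the number of substitutions made (false if none), so it can
# double as a test. Here "/bar" is not at the very end, so nothing changes.
my $other   = 'http://www.foo.com/bar/baz';
my $changed = ($other =~ s|/bar$||) ? 1 : 0;

print "$with_pipes\n";      # http://www.foo.com
print "$with_slashes\n";    # http://www.foo.com
print "$changed $other\n";  # 0 http://www.foo.com/bar/baz
```

If you want to match "/bar" anywhere in the URL rather than only at the end, drop the "$" anchor from the pattern.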
- Jerry Kaidor
On 11/10/2017 06:00, Martin Kaspar wrote:
Hello dear Perl experts,
I'm pretty new to programming, and to OO programming especially.
Nonetheless, I'm trying to get a very simple spider for web
crawling done.
The script below is what I got to work.
It runs nicely. Now I want to modify the script a bit; tailoring and
tinkering is the way to learn. I want to fetch URLs with a certain
string in the URL, for example
"http://www.foo.com/bar"
In other words, I need to fetch all the URLs that
contain the term "/bar"
- then I want to remove the "/bar" so that this URL remains:
http://www.foo.com
Is this doable?
Love to hear from you,
Martin
#!C:\Perl\bin\perl
use strict; # You always want to include both strict and warnings
use warnings;
use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use HTML::LinkExtor;
# There was no reason for this to be in a BEGIN block (and there
# are a few good reasons for it not to be)
open(my $file1, "+>>", "links.txt") or die "Cannot open links.txt: $!";
select($file1);
#The Url I want it to start at;
# Note that I've made this an array, @urls, rather than a scalar, $URL
my @urls = ('https://the url goes in here');
my %visited; # The % sigil indicates it's a hash
my $browser = LWP::UserAgent->new();
$browser->timeout(5);
while (@urls) {
    my $url = shift @urls;
    # Skip this URL and go on to the next one if we've
    # seen it before
    next if $visited{$url};
    my $request = HTTP::Request->new(GET => $url);
    my $response = $browser->request($request);
    # No real need to invoke printf if we're not doing
    # any formatting
    if ($response->is_error()) {print $response->status_line, "\n";}
    my $contents = $response->content();
    # Now that we've got the url's content, mark it as
    # visited
    $visited{$url} = 1;
    my ($page_parser) = HTML::LinkExtor->new(undef, $url);
    $page_parser->parse($contents)->eof;
    my @links = $page_parser->links;
    foreach my $link (@links) {
        print "$$link[2]\n";
        push @urls, $$link[2];
    }
    sleep 60;
}
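One possible way to combine the spider with the "/bar" filtering asked about above is sketched here. It is not part of the original script: strip_bar_links is a hypothetical helper, and the sample URLs are made up to stand in for the links the spider collects. It keeps only links that end in "/bar", removes that suffix, and drops duplicates:

```perl
use strict;
use warnings;

# Hypothetical helper: given a list of links, return those that end in
# "/bar" with the suffix removed, each one only once.
sub strip_bar_links {
    my @links = @_;
    my (%seen, @stripped);
    for my $link (@links) {
        my $copy = "$link";               # stringify, in case it is a URI object
        next unless $copy =~ s|/bar$||;   # skip links that do not end in "/bar"
        push @stripped, $copy unless $seen{$copy}++;
    }
    return @stripped;
}

# Made-up sample data standing in for the links found by the spider.
my @found = strip_bar_links(
    'http://www.foo.com/bar',
    'http://www.foo.com/about',
    'http://www.example.org/bar',
    'http://www.foo.com/bar',
);

print "$_\n" for @found;
```

In the script above, something like strip_bar_links(map { $$_[2] } @links) could be called inside the while loop, assuming the URL really is the third element of each link entry as the script already expects.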
On Wed, Oct 4, 2017 at 10:49 PM, Dan Book <gri...@gmail.com> wrote:
How can we proceed from here?
-Dan
On Mon, Sep 18, 2017 at 1:17 PM, Patrick M. Galbraith
<p...@patg.net> wrote:
Pali,
Great! Now we can start moving forward.
Sorry if my responses have been intermittent - first week at new job.
Regards,
Patrick
On 9/16/17 4:35 AM, p...@cpan.org wrote:
I prepared a branch, master-new, which is based on the current DBD-mysql
master branch and reverts the state to the pre-4.043 version, including all
changes made to the master branch after the 4.043 release. I have this
master-new branch in my fork. If you want, you can use it...
https://github.com/pali/DBD-mysql/tree/master-new [1]
--
[2] [3]
Links:
------
[1] https://github.com/pali/DBD-mysql/tree/master-new
[2] http://www.facebook.com/martin.kaspar.547
[3] https://plus.google.com/u/0/104428351748591530426