Hello dear Perl addicts,

to admit it up front: I am a Perl novice and do not have much experience with
Perl, but I am willing to learn. For now I have to solve some tasks for
college: I have to do some investigations on a board where I have no access to
the database.

First, some background. I have to grab data out of a phpBB forum in order to
do some field research. I need the data from a forum that is run by a user
community, so that I can analyze the discussions. To give an example, let us
take this forum here. How can I grab all the data out of this forum, store it
locally, and afterwards put it into a local database of a phpBB forum? Is this
possible?
[URL]=http://www.nukeforums.com/forums/viewforum.php?f=17[/URL] 

Nothing harmful, nothing bad, nothing serious or dangerous. The issue is
simply that I have to get the data. I need it in an almost complete form, so I
need all the fields, like

username
forum
thread
topic
text of the posting, and so on.

How can I do that?


[URL]=http://www.nukeforums.com/forums/viewforum.php?f=3[/URL] 
[URL]=http://www.nukeforums.com/forums/viewforum.php?f=17[/URL] 



[code]

#!/usr/bin/perl
use strict;
use warnings;

use LWP::RobotUA;
use HTML::LinkExtor;
use HTML::TokeParser;
use URI::URL;

use Data::Dumper; # for show and troubleshooting

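my $url = "http://www.nukeforums.com/forums/viewforum.php?f=17";
# LWP::RobotUA requires an agent name and a contact address; both are placeholders
my $ua = LWP::RobotUA->new('forum-research/0.1', 'you@example.com');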
my $lp = HTML::LinkExtor->new(\&wanted_links);

my @links;
get_threads($url);

foreach my $page (@links) { # this loops over each link collected from the index
        my $r = $ua->get($page);
        if ($r->is_success) {
                my $stream = HTML::TokeParser->new(\$r->content) or die "Parse error in $page: $!";
                # just printing what was collected
                print Dumper get_thread($stream);
                # would instead have database insert statement at this point
         } else {
                warn $r->status_line;
         }
}

sub get_thread {
        my $p = shift;
        my ($title, $name, @thread);
        while (my $tag = $p->get_tag('a','span')) {
                if (exists $tag->[1]{'class'}) {
                        if ($tag->[0] eq 'span') {
                                if ($tag->[1]{'class'} eq 'name') {
                                        $name = $p->get_trimmed_text('/span');
                                } elsif ($tag->[1]{'class'} eq 'postbody') {
                                        my $post = $p->get_trimmed_text('/span');
                                        push @thread, {'name'=>$name, 'post'=>$post};
                                }
                        } else {
                                if ($tag->[1]{'class'} eq 'maintitle') {
                                        $title = $p->get_trimmed_text('/a');
                                }
                        }
                }
        }
        return {'title'=>$title, 'thread'=>\@thread};
}

sub get_threads {
        my $page = shift;
        # fetch the index page that was passed in, not the global $url
        my $r = $ua->request(HTTP::Request->new(GET => $page), sub {$lp->parse($_[0])});
        # Expand URLs to absolute ones
        my $base = $r->base;
        return [map { $_ = url($_, $base)->abs; } @links];
}

sub wanted_links {
        my($tag, %attr) = @_;
        return unless exists $attr{'href'};
        return if $attr{'href'} !~ /^viewtopic\.php\?t=/;
        push @links, $attr{'href'}; # keep only the href, not every attribute value
}

[/code]
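
A forum index page can link to the same topic more than once (for example once for the topic itself and once per page of a long thread), and phpBB may append a session id to links. Below is a minimal sketch of reducing the collected hrefs to one link per topic. It assumes phpBB 2 style `viewtopic.php?t=<id>` links. `unique_topic_links` is a hypothetical helper, not part of the script above, and the sample hrefs are made up.

```perl
use strict;
use warnings;

# Reduce a list of hrefs to one entry per topic id.
# Assumption: links look like "viewtopic.php?t=123&start=15&sid=abc".
sub unique_topic_links {
    my @hrefs = @_;
    my (%seen, @unique);
    for my $href (@hrefs) {
        # Skip anything that is not a topic link with a numeric t= parameter.
        next unless $href =~ /^viewtopic\.php\?(?:.*&)?t=(\d+)/;
        my $topic_id = $1;
        next if $seen{$topic_id}++;    # already have this topic
        push @unique, "viewtopic.php?t=$topic_id";
    }
    return @unique;
}

# Made-up sample hrefs, only for illustration.
my @found = unique_topic_links(
    'viewtopic.php?t=101',
    'viewtopic.php?t=101&start=15',
    'viewtopic.php?t=102&sid=0123456789abcdef',
    'viewforum.php?f=17',
);
print "$_\n" for @found;
```

Such a filter could be applied to @links before the fetch loop, so that each thread is requested only once.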



If the necessary modules are installed and the script is run from the command
line, you'll see output such as the following:



[code]

$VAR1 = {
          'thread' => [
                        {
                          'post' => 'Hello, I\'m pretty new to PHPNuke. I\'ve got my site up and running great! I\'m now starting to make modifications, add modules etc. I\'m using the most recent RavenPHP76. I want to display the 5 most recent forum posts at the top of the forum page. I\'m not sure if this functionality is built in, if so, how to activate. Or if there is a module or block made to do this. I looked at Raven\'s Collapsing Forum block but wasn\'t crazy about the format, and I don\'t want it to be collapsable. Thanks! mopho',
                          'name' => 'mopho'
                        },
                        {
                          'post' => 'hi there',
                          'name' => 'sail'
                        },
                        {
                          'post' => 'thanks for asking this; :not very sure if i got you right; Do you want to have a feed of the last forumthreads? guess the easiest way is to go to raven and ask how he did it. hth sail.',
                          'name' => 'sail'
                        },
                        {
                          'post' => 'Thanks. i found what I was looking for. It wasn\'t so easy to find! It\'s called glance_mod. mopho',
                          'name' => 'mopho'
                        },
                        {
                          'post' => 'hi there thx',
                          'name' => 'sail'
                        },
                        {
                          'post' => 'it sound interesting - i will have also a look i google after it - and try to find out more regards sailor',
                          'name' => 'sail'
                        }
                      ],
          'title' => 'Recent Forum Posts Module'
        };

[/code]
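
Since the data should end up in a database, the nested structure shown above probably has to be flattened into one row per posting first. Here is a minimal sketch of that step. `thread_to_rows`, the column names and the `position` counter are all hypothetical, and the sample data is shortened from the output above.

```perl
use strict;
use warnings;

# Turn one get_thread() result into flat rows, one per posting.
# The forum id is supplied by the caller; "position" is an assumed column
# that records the order of the postings inside the thread.
sub thread_to_rows {
    my ($forum_id, $thread) = @_;
    my @rows;
    my $position = 0;
    for my $posting (@{ $thread->{thread} }) {
        push @rows, {
            forum    => $forum_id,
            title    => $thread->{title},
            position => ++$position,
            name     => $posting->{name},
            post     => $posting->{post},
        };
    }
    return @rows;
}

# Shortened sample data in the same shape as the Dumper output.
my $thread = {
    title  => 'Recent Forum Posts Module',
    thread => [
        { name => 'sail', post => 'hi there' },
        { name => 'sail', post => 'hi there thx' },
    ],
};

my @rows = thread_to_rows(17, $thread);
for my $row (@rows) {
    print join(' | ', @{$row}{qw(forum title position name post)}), "\n";
}
```

Each row could then be handed to a DBI statement prepared with placeholders (`prepare` once, `execute` per row); the table layout would have to be chosen to match the target database.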



To be honest, I think the problem is that the script only loops over the first
index page here: [URL]=http://www.nukeforums.com/forums/viewforum.php?f=17[/URL]
But I need it to loop over all of the more than 50 index pages, so I need a
routine for that.

This part has to become a subroutine so that the code is looped:

[code]

#!/usr/bin/perl
use strict;
use warnings;

use LWP::RobotUA;
use HTML::LinkExtor;
use HTML::TokeParser;
use URI::URL;

use Data::Dumper; # for show and troubleshooting

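my $url = "http://www.nukeforums.com/forums/viewforum.php?f=17";
# LWP::RobotUA requires an agent name and a contact address; both are placeholders
my $ua = LWP::RobotUA->new('forum-research/0.1', 'you@example.com');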
my $lp = HTML::LinkExtor->new(\&wanted_links);

my @links;
get_threads($url);

foreach my $page (@links) { # this loops over each link collected from the index
        my $r = $ua->get($page);
        if ($r->is_success) {
                my $stream = HTML::TokeParser->new(\$r->content) or die "Parse error in $page: $!";
                # just printing what was collected
                print Dumper get_thread($stream);
                # would instead have database insert statement at this point
         } else {
                warn $r->status_line;
         }
}


[/code]


This has to become a subroutine, doesn't it?


It has to become a subroutine so that the script can loop over all the index
pages in the forum [URL]=http://www.nukeforums.com/forums/viewforum.php?f=17[/URL]
The version above does not set up a loop to grab each of the index pages,
though someone may consider that trivial. The demonstration is very impressive
and makes me think that Perl is very, very powerful. I will try to harvest
this category of the forum (note: only these two categories are of interest to
me, nothing more):
[URL]=http://www.nukeforums.com/forums/viewforum.php?f=3[/URL]
[URL]=http://www.nukeforums.com/forums/viewforum.php?f=17[/URL]

Question: am I able to get the results of the above-mentioned forum
categories, and can I get the forum threads that are stored in these two
forums?
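
One way the loop over the index pages might look is sketched below. It rests on an assumption that should be checked against the real forum: that phpBB 2 paginates viewforum.php with a `start` offset and shows a fixed number of topics per index page (50 is used here as a guess). `index_page_urls` is a hypothetical helper and the page count of 3 is a placeholder.

```perl
use strict;
use warnings;

# Build the index page URLs for one forum.
# Assumption: pages are addressed as <forum url>&start=<offset>, where the
# offset grows by $per_page topics for each page.
sub index_page_urls {
    my ($forum_url, $page_count, $per_page) = @_;
    $per_page ||= 50;    # assumed topics per index page
    my @urls;
    for my $page (0 .. $page_count - 1) {
        push @urls, $forum_url . '&start=' . ($page * $per_page);
    }
    return @urls;
}

my @forums = (
    'http://www.nukeforums.com/forums/viewforum.php?f=3',
    'http://www.nukeforums.com/forums/viewforum.php?f=17',
);

for my $forum_url (@forums) {
    # 3 pages is a placeholder; the real count has to be read from the forum.
    print "$_\n" for index_page_urls($forum_url, 3);
}
```

Each generated URL could be passed to get_threads() in turn, so that @links collects the thread links from every index page of both forums before the fetch loop runs.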


I look forward to hearing from you.

fllobee



