Hello dear Perl addicts,

I have to admit that I am a Perl novice and do not have much experience in Perl yet, but I am willing to learn. For now I have to solve some tasks for college: I have to do some investigations on a board where I have no access to the database.
First of all, I have to explain something. I have to grab some data out of a phpBB forum in order to do some field research. I need the data from a forum that is run by a user community, so that I can analyze the discussions. To give an example, let us take this forum here. How can I grab all the data out of this forum, get it local, and afterwards put it into a local database of a phpBB forum? Is this possible?

[URL]=http://www.nukeforums.com/forums/viewforum.php?f=17[/URL]

Nothing harmful, nothing bad, nothing serious or dangerous. But the issue is that I have to get the data, and I need it in an almost full and complete format: username, forum, thread, topic, text of the posting, and so on. How do I do that?

[URL]=http://www.nukeforums.com/forums/viewforum.php?f=3[/URL]
[URL]=http://www.nukeforums.com/forums/viewforum.php?f=17[/URL]

[code]
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;
use HTTP::Request;
use HTML::LinkExtor;
use HTML::TokeParser;
use URI::URL;
use Data::Dumper; # for show and troubleshooting

my $url = "http://www.nukeforums.com/forums/viewforum.php?f=17";
# LWP::RobotUA requires an agent name and a contact address;
# both values here are placeholders
my $ua = LWP::RobotUA->new('forum-research/0.1', 'you@example.com');
my $lp = HTML::LinkExtor->new(\&wanted_links);
my @links;

get_threads($url);

foreach my $page (@links) { # this loops over each link collected from the index
    my $r = $ua->get($page);
    if ($r->is_success) {
        my $stream = HTML::TokeParser->new(\$r->content)
            or die "Parse error in $page: $!";
        # just printing what was collected
        print Dumper get_thread($stream);
        # would instead have database insert statement at this point
    } else {
        warn $r->status_line;
    }
}

# collects the title and every (name, post) pair of one topic page
sub get_thread {
    my $p = shift;
    my ($title, $name, @thread);
    while (my $tag = $p->get_tag('a', 'span')) {
        if (exists $tag->[1]{'class'}) {
            if ($tag->[0] eq 'span') {
                if ($tag->[1]{'class'} eq 'name') {
                    $name = $p->get_trimmed_text('/span');
                } elsif ($tag->[1]{'class'} eq 'postbody') {
                    my $post = $p->get_trimmed_text('/span');
                    push @thread, {'name' => $name, 'post' => $post};
                }
            } else {
                if ($tag->[1]{'class'} eq 'maintitle') {
                    $title = $p->get_trimmed_text('/a');
                }
            }
        }
    }
    return {'title' => $title, 'thread' => \@thread};
}

# fetches one index page and fills @links with absolute topic URLs
sub get_threads {
    my $page = shift;
    # request the page that was passed in (not the global $url)
    my $r = $ua->request(HTTP::Request->new(GET => $page),
                         sub { $lp->parse($_[0]) });
    # Expand URLs to absolute ones
    my $base = $r->base;
    @links = map { url($_, $base)->abs } @links;
    return \@links;
}

# callback for HTML::LinkExtor: keeps only links to topics
sub wanted_links {
    my ($tag, %attr) = @_;
    return unless exists $attr{'href'};
    return if $attr{'href'} !~ /^viewtopic\.php\?t=/;
    # push only the href, not every attribute value of the tag
    push @links, $attr{'href'};
}
[/code]

If we have the necessary modules installed and run it from the command line, we see output such as the following:

[code]
$VAR1 = {
  'thread' => [
    {
      'post' => 'Hello, I\'m pretty new to PHPNuke. I\'ve got my site up and running great! I\'m now starting to make modifications, add modules etc. I\'m using the most recent RavenPHP76. I want to display the 5 most recent forum posts at the top of the forum page. I\'m not sure if this functionality is built in, if so, how to activate. Or if there is a module or block made to do this. I looked at Raven\'s Collapsing Forum block but wasn\'t crazy about the format, and I don\'t want it to be collapsable. Thanks! mopho',
      'name' => 'mopho'
    },
    {
      'post' => 'hi there',
      'name' => 'sail'
    },
    {
      'post' => 'thanks for asking this; :not very sure if i got you right; Do you want to have a feed of the last forumthreads? guess the easiest way is to go to raven and ask how he did it. hth sail.',
      'name' => 'sail'
    },
    {
      'post' => 'Thanks. i found what I was looking for. It wasn\'t so easy to find! It\'s called glance_mod. mopho',
      'name' => 'mopho'
    },
    {
      'post' => 'hi there thx',
      'name' => 'sail'
    },
    {
      'post' => 'it sound interesting - i will have also a look i google after it - and try to find out more regards sailor',
      'name' => 'sail'
    }
  ],
  'title' => 'Recent Forum Posts Module'
};
[/code]

To be honest, I think the script only loops over the threads linked from the first index page here:

[URL]=http://www.nukeforums.com/forums/viewforum.php?f=17[/URL]

But I need it to loop over all of the more than 50 index pages. Therefore I need a routine here: this part of the code must become a subroutine so that it can be run in a loop.

[code]
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;
use HTML::LinkExtor;
use HTML::TokeParser;
use URI::URL;
use Data::Dumper; # for show and troubleshooting

my $url = "http://www.nukeforums.com/forums/viewforum.php?f=17";
# agent name and contact address are placeholders
my $ua = LWP::RobotUA->new('forum-research/0.1', 'you@example.com');
my $lp = HTML::LinkExtor->new(\&wanted_links);
my @links;

get_threads($url);

foreach my $page (@links) { # this loops over each link collected from the index
    my $r = $ua->get($page);
    if ($r->is_success) {
        my $stream = HTML::TokeParser->new(\$r->content)
            or die "Parse error in $page: $!";
        # just printing what was collected
        print Dumper get_thread($stream);
        # would instead have database insert statement at this point
    } else {
        warn $r->status_line;
    }
}
[/code]

This must become a subroutine, mustn't it? It has to be a subroutine so that the script can loop over all the pages in the forum:

[URL]=http://www.nukeforums.com/forums/viewforum.php?f=17[/URL]

The version above does not set up a loop to grab each of the index pages, though someone may consider that trivial. The demonstration is very impressive and makes me think that Perl is very, very powerful.
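The missing loop over the index pages could be sketched roughly as follows. This is only a minimal sketch under assumptions: it assumes the board pages its topic list with a "start" offset in the query string (as phpBB 2 boards commonly do), and the helper name index_page_urls, the topic count of 120 and the page size of 50 are hypothetical values chosen for illustration, not figures taken from this forum.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the list of index-page URLs for one forum.
# ASSUMPTION: the topic list is paged with "&start=N" and each index
# page shows $per_page topics; check the real links on the board first.
sub index_page_urls {
    my ($forum_url, $total_topics, $per_page) = @_;
    die "per_page must be positive\n" unless $per_page && $per_page > 0;
    my @urls;
    for (my $start = 0; $start < $total_topics; $start += $per_page) {
        # the first page is the plain forum URL, later pages carry an offset
        push @urls, $start == 0 ? $forum_url : "$forum_url&start=$start";
    }
    return @urls;
}

# hypothetical numbers: 120 topics, 50 per index page
my @pages = index_page_urls(
    'http://www.nukeforums.com/forums/viewforum.php?f=17', 120, 50);
print "$_\n" for @pages;
```

Each URL in @pages could then be handed to get_threads in turn, provided get_threads requests the page it is given rather than the global $url. If the total number of topics is not known in advance, an alternative is to keep increasing the offset until an index page yields no new viewtopic links.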
I will try to harvest these categories of the forum (note that only these two categories are of interest to me, nothing more):

[URL]=http://www.nukeforums.com/forums/viewforum.php?f=3[/URL]
[URL]=http://www.nukeforums.com/forums/viewforum.php?f=17[/URL]

Question: am I able to get the results for the forum categories mentioned above, and can I get the forum threads that are stored in those two forums?

I look forward to hearing from you.

fllobee