Hello. First of all: I am new to the list.
I work in field research. I have my data in a bunch of plain text files on the local disk, and I need to collect some more data out of a site. Here is an example: http://www.bamaclubgp.org/forum/sitemap.php

The problem is described in these threads, together with a first code snippet to solve it:
http://forums.devshed.com/perl-programming-6/data-grabbing-and-mining-need-scripthelp-370550.html
http://forums.devshed.com/perl-programming-6/minor-change-in-lwp-need-ideas-how-to-accomplish-388061.html

As I see it, the problem is two-fold:
1. grabbing the data out of the site and then parsing it;
2. storing the data in the new (local) database.

A guy helped me with a script that is described here:
http://forums.devshed.com/perl-programming-6/minor-change-in-lwp-need-ideas-how-to-accomplish-388061.html

The question of restoring is not too hard if I can pull an almost complete thread data set out of the site. The tables are shown on this site: http://www.phpbbdoctor.com/doc_columns.php?id=24

So if we do the first job (grabbing and parsing) well, the second job should not be too hard: as a result I would have a large CSV file, wouldn't I? The final question is: how can the job of restoring be done, so that I end up with a full data set? I guess it can be done with some help from the phpBB.com team (http://www.phpbb.com/community/viewforum.php?f=65). With a good converter, or at least part of a converter, I could restore the whole CSV dump with ease.

What do you think? If we solve the first part, I think the second part can be done as well.

I look forward to hearing from you.

Best regards,
floobee

Here is the script:
#!e:/Server/xampp/perl/bin/perl.exe -w
use strict;

use CGI;
use CGI::Carp qw(fatalsToBrowser warningsToBrowser);
use LWP::RobotUA;
use HTTP::Request;
use HTML::LinkExtor;
use HTML::TokeParser;
use URI::URL;
use Data::Dumper;    # for show and troubleshooting

my $cgi = CGI->new();
print $cgi->header();    # sends the Content-type header exactly once
warningsToBrowser(1);

my $url = "http://www.mysite.com/forums/";
my $ua  = LWP::RobotUA->new('my-robot/0.1', '[EMAIL PROTECTED]');
my $lp  = HTML::LinkExtor->new(\&wanted_links);

my @links;
get_threads($url);

# Loop over each thread link collected from the index page.
foreach my $page (@links) {
    my $r = $ua->get($page);
    if ($r->is_success) {
        my $stream = HTML::TokeParser->new(\$r->content)
            or die "Parse error in $page: $!";
        # Just printing what was collected.
        print Dumper get_thread($stream);
    }
    else {
        warn $r->status_line;
    }
}

# Parse one thread page: the topic title, plus poster name and
# post body for every post on the page.
sub get_thread {
    my $p = shift;
    my ($title, $name, @thread);
    while (my $tag = $p->get_tag('a', 'span')) {
        if (exists $tag->[1]{'class'}) {
            if ($tag->[0] eq 'span') {
                if ($tag->[1]{'class'} eq 'name') {
                    $name = $p->get_trimmed_text('/span');
                }
                elsif ($tag->[1]{'class'} eq 'postbody') {
                    my $post = $p->get_trimmed_text('/span');
                    push @thread, { 'name' => $name, 'post' => $post };
                }
            }
            else {
                if ($tag->[1]{'class'} eq 'maintitle') {
                    $title = $p->get_trimmed_text('/a');
                }
            }
        }
    }
    return { 'title' => $title, 'thread' => \@thread };
}

# Fetch the index page and let HTML::LinkExtor collect the links,
# then expand the collected URLs to absolute ones.
sub get_threads {
    my $page = shift;
    my $r = $ua->request(HTTP::Request->new(GET => $page),
                         sub { $lp->parse($_[0]) });
    my $base = $r->base;
    @links = map { url($_, $base)->abs } @links;
    return \@links;
}

# Callback for HTML::LinkExtor: keep only viewtopic.php?t=... links.
sub wanted_links {
    my ($tag, %attr) = @_;
    return unless exists $attr{'href'};
    return if $attr{'href'} !~ /^viewtopic\.php\?t=/;
    push @links, $attr{'href'};
}
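P.S. For the second step (getting from the parsed threads to a CSV dump), here is a minimal sketch of how the data structure that get_thread() returns could be flattened into one CSV row per post. The sample thread data and the column order (title, name, post) are my own assumptions, not anything prescribed by phpBB; for real data, the Text::CSV module from CPAN would be more robust than this hand-rolled quoting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Minimal CSV quoting: wrap in double quotes, double any embedded quotes.
# (Text::CSV from CPAN handles this and more; this is just a sketch.)
sub csv_field {
    my $f = shift // '';
    $f =~ s/"/""/g;
    return qq{"$f"};
}

# Hypothetical sample data, shaped like the hashref get_thread() returns.
my $thread = {
    title  => 'Example topic',
    thread => [
        { name => 'alice', post => 'First post' },
        { name => 'bob',   post => 'A "quoted" reply' },
    ],
};

# One CSV row per post: topic title, poster name, post body.
open my $fh, '>', 'threads.csv' or die "threads.csv: $!";
for my $post (@{ $thread->{thread} }) {
    print {$fh} join(',', map { csv_field($_) }
        $thread->{title}, $post->{name}, $post->{post}), "\n";
}
close $fh;
```

Importing that file into the phpbb_posts / phpbb_posts_text tables would then be a job for a converter (or a small DBI script), which is exactly the part where I hope the phpBB people can help.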