Hello to all,

I'm very new to Perl, and every day I understand better how powerful this 
language is, but I have a problem:

I'm using Perl with Selenium to scrape data (for a job). The code looks like 
this:

[code]
use strict;
use warnings;
use Time::HiRes qw(sleep);
use Test::WWW::Selenium;
use Test::More "no_plan";
use Test::Exception;


open (INFO, '>>database.csv') or die "$!";      
print INFO ("titolo\;descrizione\;schedaTecnica\;URLFoto\n");                   
                                                
my $sel = Test::WWW::Selenium->new( host => "localhost", 
                                    port => 4444, 
                                    browser => "*chrome", 
                                    browser_url => 
"http://www.example.com/it/page.html" );

sub estrai{
        $sel->wait_for_page_to_load_ok("30000");
        my $titolo = $sel->get_text("//h1");
        my $schedaTecnica = $sel->get_text("//td[3]/ul");
        my $img = $sel->get_attribute("//div/img\@src");
        my $descrizione = $sel->get_text("//td[2]");
        print INFO ("$titolo\;$descrizione\;$schedaTecnica\;$img\n");
        $sel->go_back_ok();
        $sel->wait_for_page_to_load_ok("30000");
}
                                                                        
$sel->open_ok("/it/page.html");
$sel->click_ok("//div[2]/div/div/div[2]/h3/a");
$sel->wait_for_page_to_load_ok("30000");
$sel->click_ok("//div[2]/div/div/div[2]/h3/a");
$sel->wait_for_page_to_load_ok("30000");
estrai($sel);
...
close (INFO);
[/code]

Unfortunately my CSV comes out badly formed, because (sometimes) when I extract 
data from "//ul" my file looks like this:

[code]
Art. S500 Set Yoga "Siddhartha";Idea regalo ?SET YOGA Siddhartha? Elegante 
scatola in cartone lucido contenente:
 2 mattoni in legno naturale mis. cm 20 x 12,5 x 7
 
 1 cinghia in cotone mis. cm 4 x 235
 
 1 stuoia in cotone mis. cm 70 x 170
 
 1 manuale di introduzione allo yoga stampato
 
 
 
 Tutto rigorosamente realizzato con materiali naturali;€ 
82,50;../images/S500%20(Custom).jpg
[/code]
So when I extract the data I need to write it out as UTF-8 and remove the 
blank lines and line breaks inside each field. How can I do this?
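
To make the goal concrete, here is a minimal sketch of one possible approach, 
not a tested fix for the script above. It assumes that get_text returns decoded 
character strings (if it returns raw bytes, they would need decoding first, for 
example with the Encode module). The helpers clean_field and csv_row are 
hypothetical names introduced only for this illustration, and the sample values 
are made up from the output shown above.

```perl
use strict;
use warnings;
use utf8;    # this example's own source contains a UTF-8 literal (the euro sign)

# Hypothetical helper (not part of Selenium): squeeze every run of
# whitespace, including line breaks, into one space and trim both ends.
sub clean_field {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/\s+/ /g;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Hypothetical helper: build one ";"-separated row from cleaned fields.
sub csv_row {
    return join(';', map { clean_field($_) } @_) . "\n";
}

# The :encoding(UTF-8) layer makes Perl encode characters as UTF-8 on output.
open(my $out, '>>:encoding(UTF-8)', 'database.csv')
    or die "Cannot open database.csv: $!";
print {$out} csv_row(
    'Art. S500 Set Yoga',
    "2 mattoni in legno naturale\n \n 1 cinghia in cotone",
    '€ 82,50',
);
close($out) or die "Cannot close database.csv: $!";
```

In estrai, the same idea would mean passing each value from get_text through 
clean_field before printing. Note that this sketch does not quote fields that 
themselves contain ";"; a CSV module such as Text::CSV would be the safer 
choice for that.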

Thanks in advance

