Hello everyone, I'm very new to Perl, and every day I understand better how powerful this language is. However, I have a problem:
I'm using Perl with Selenium to scrape data (for a job). The code looks like this:
[code]
use strict;
use warnings;
use Time::HiRes qw(sleep);
use Test::WWW::Selenium;
use Test::More "no_plan";
use Test::Exception;

open (INFO, '>>database.csv') or die "$!";
print INFO ("titolo\;descrizione\;schedaTecnica\;URLFoto\n");

my $sel = Test::WWW::Selenium->new(
    host        => "localhost",
    port        => 4444,
    browser     => "*chrome",
    browser_url => "http://www.example.com/it/page.html"
);

sub estrai{
    $sel->wait_for_page_to_load_ok("30000");
    my $titolo = $sel->get_text("//h1");
    my $schedaTecnica = $sel->get_text("//td[3]/ul");
    my $img = $sel->get_attribute("//div/img\@src");
    my $descrizione = $sel->get_text("//td[2]");
    print INFO ("$titolo\;$descrizione\;$schedaTecnica\;$img\n");
    $sel->go_back_ok();
    $sel->wait_for_page_to_load_ok("30000");
}

$sel->open_ok("/it/page.html");
$sel->click_ok("//div[2]/div/div/div[2]/h3/a");
$sel->wait_for_page_to_load_ok("30000");
$sel->click_ok("//div[2]/div/div/div[2]/h3/a");
$sel->wait_for_page_to_load_ok("30000");
estrai($sel);
...
close (INFO);
[/code]
Unfortunately my CSV comes out badly, because sometimes, when I extract data from "//ul", the file looks like this:
[code]
Art. S500 Set Yoga "Siddhartha";Idea regalo ?SET YOGA Siddhartha? Elegante scatola in cartone lucido contenente: 2 mattoni in legno naturale mis. cm 20 x 12,5 x 7 1 cinghia in cotone mis. cm 4 x 235 1 stuoia in cotone mis. cm 70 x 170 1 manuale di introduzione allo yoga stampato Tutto rigorosamente realizzato con materiali naturali;€ 82,50;../images/S500%20(Custom).jpg
[/code]
So when I extract the data I need to write it out with UTF-8 encoding and remove the blank space between lines. How can I do that?

Thanks in advance
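For reference, here is a minimal sketch of one way the two requirements (UTF-8 output and no line breaks inside a field) might be handled. It is not tested against Selenium; it assumes the strings returned by get_text are already decoded character strings. The helpers clean_field, csv_line and open_csv are hypothetical names introduced only for this illustration.

```perl
use strict;
use warnings;
use utf8;    # the sample data below contains a literal UTF-8 character

# Hypothetical helper: collapse every run of whitespace (including line
# breaks) into a single space, then trim the ends.
sub clean_field {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/\s+/ /g;
    $text =~ s/^ //;
    $text =~ s/ $//;
    return $text;
}

# Hypothetical helper: build one ';'-separated line from cleaned fields.
# Note: this does no quoting, so a ';' inside a field would still break
# the row.
sub csv_line {
    return join(';', map { clean_field($_) } @_) . "\n";
}

# Hypothetical helper: open a file for appending with a UTF-8 encoding
# layer, so character strings are encoded as UTF-8 on output.
sub open_csv {
    my ($path) = @_;
    open(my $fh, '>>:encoding(UTF-8)', $path)
        or die "Cannot open $path: $!";
    return $fh;
}

# Small demonstration on STDOUT, using made-up multi-line input.
binmode(STDOUT, ':encoding(UTF-8)');
print csv_line(
    "Set Yoga\n  \"Siddhartha\"",
    "1 cinghia in cotone\n\n1 stuoia in cotone",
    "€ 82,50",
);
```

In the script above, this would presumably mean replacing the bareword INFO handle with something like my $out = open_csv('database.csv'); and printing with print {$out} csv_line($titolo, $descrizione, $schedaTecnica, $img); inside estrai. If fields can themselves contain ';' or quotes, the Text::CSV module from CPAN (with sep_char set to ';') is probably a safer choice than joining by hand.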