Hello there,
I am trying to fetch the content of the page
http://www.pluendermeister.de/sec/300300/?curServer=Aegwynn&gname=&submit=Daten+absenden
with LWP and parse it. But when I print the result of my parsing,
the non-ASCII characters are not readable.
Here is the result I get:
"teh D\x{f6}nertiere",
'Tempest',
'Templer',
'Templer der Allianz',
'Tempus Irae',
'Tenacious D',
'Tetragammatron Syndikat',
'Thanks for Dying',
'Tharaka',
'The Blood Ruby Dream',
'The Blues Brothers',
'The Brownies',
'The cake is a lie',
'The Eyes of Azeroth',
'The Fear',
'The Game',
'The Last Warriors',
'The Scarlet Crusade',
'The Sentinel',
'Theatre of Pain',
'THEY CALL IT MADNESS',
'THIS RLY COOL OK',
'this space is for rent',
'ThoRs GaRde',
'Thunderstruck',
'TooL',
'TopiC',
'TopiC Reloaded',
'Treasure',
'trilium',
'trillennium',
'Trinitas',
'try again',
'Tuatha de Dannan',
'UC Elevator Victims',
'Ugly',
'Ultimate Sacrifice',
'ultionis sanguinis',
'Umbra Et Imago',
'Underworld Inc',
"Unforg\x{ed}ven",
'unisex',
'United Twinks',
'Unsterblich',
'Vendetta',
'Venom',
'Vergebung',
"Verm\x{e4}chtnis des Blutes",
'Verteidiger des Lichtes',
'Vespo',
"Vielfra\x{df}e",
'Viribus Coniunctis',
'Viribus Unitis',
'Viribus-Unitis',
'Virtus Guards',
'Vis Natura',
'Vision of Escaflown',
'volcomenstoned',
"Volksfront von Jud\x{e4}a",
'Voodoo Lounge',
'VORSICHT BISSIG',
"W\x{e4}chter der Innbr\x{fc}cke",
"W\x{e4}chter des Syndikats",
"W\x{e4}chterWotans",
"Waffenbr\x{fc}der",
'Waidmanns Heil',
'Walhalla Inc',
'Wambo',
"w\x{e8}\x{e9}d is my cheat",
"Wei\x{df}e Brigade",
'Werwolf',
'Werwolf reloaded',
'Wilddragon',
'Will nur Kuscheln',
'Wir halten zusammen',
'WiR RoxXoRn Du BaNaNe',
'Woipatinga Jaga',
'Wow Error',
'You Never Know',
'Zerg',
'Zirkel der Verdammten',
'Zocken gern beim Stefan',
''
];
(just a short excerpt as an example)
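The escapes can be reproduced without the network, so they may simply be how Data::Dumper displays a decoded character string rather than damaged data. A minimal sketch, assuming the page really is UTF-8 as its Content-Type header claims; the sample bytes and the variable names are made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Data::Dumper;

# Hypothetical sample: the UTF-8 bytes for "teh Dönertiere"
my $bytes = "teh D\xc3\xb6nertiere";

# Decoding turns the two bytes 0xC3 0xB6 into the single character U+00F6
my $chars = decode('UTF-8', $bytes);

# Dumper may show non-ASCII characters as \x{..} escapes; that is a
# display choice of the dumper, the string itself is intact
print Dumper($chars);

# Encoding to UTF-8 on the way out gives back the original bytes,
# which a UTF-8 terminal shows as readable text
my $out = encode('UTF-8', $chars);
print $out, "\n";
```

If this is right, the data is fine after parsing and only needs to be encoded once at the point where it leaves the program.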
Here is the code I have:
package DragonNet::Meine::Gute::Gilde::Script::UpdateGuilds::Driver::Pluendermeister;
use strict;
use warnings;
use Moose;
use LWP::UserAgent;
use HTTP::Request;
use Data::Dumper;
use Encode;
use namespace::autoclean;

# Strip leading and trailing whitespace from a string.
sub _trim {
    my ($string) = @_;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

# Fetch the guild list page for a realm and return the guild names.
sub get_guilds {
    my ($class, $realm, $script) = @_;
    my $realm_name = $realm->name;
    my $ua = LWP::UserAgent->new();
    $ua->agent('Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 (.NET CLR 3.5.30729)');
    my $url =
        "http://www.pluendermeister.de/sec/300300/?curServer=$realm_name&gname=&submit=Daten+absenden";
    $script->log( message => "Fetching from: $url\n" );
    my $request = HTTP::Request->new( GET => $url );
    my $response = $ua->request($request);
    if ( !($response->is_success) ) {
        $script->log( code => $response->code, message => "Failed retrieving guild list for realm $realm_name: " . $response->message );
        return;
    }
    # decoded_content() decodes the body using the charset from the response headers
    my $page_content = $response->decoded_content();
    if ($page_content =~ m/ist auf keinen WoW Server gefunden worden\./im) {
        $script->log( message => "Failed retrieving guild list for realm $realm_name: realm not found!" );
        return;
    }
    # Capture the link text (the guild name) from every row of the table
    my @guilds = $page_content =~ m@<td>\d*?</td><td><a href="http://www\.youloot\.de/sec/300000/module/itemstrength/index\.cfm\?g=.*?&r=.*?">(.*?)</a></td><td></td><td></td>@gi;
    @guilds = map { _trim($_) } @guilds;
    print Dumper(\@guilds);
    die();    # debugging only: stop after dumping the list
    return @guilds;
}

__PACKAGE__->meta->make_immutable;
1;
I have tried everything I found on the web: decoding and encoding, with the
Encode module and with the utf8 module. Nothing works. The goal is to save the
data into a database (in a varchar column). But in the database it is not
readable either (although there the non-printable characters are not shown as
hex codes).
Does anybody have an idea what is wrong with my code?
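One thing I cannot rule out is that the text gets encoded twice somewhere on the way to the database. A small sketch of what that would do to a name, assuming the string coming out of decoded_content is already a character string (the sample value is hypothetical):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample value: "Vielfraße" as a character string
my $chars = "Vielfra\x{df}e";

# Encoding once gives valid UTF-8 bytes (the sharp s becomes 0xC3 0x9F)
my $once = encode('UTF-8', $chars);

# Encoding those bytes again treats each byte as its own character,
# so every non-ASCII character ends up as two garbage characters
my $twice = encode('UTF-8', $once);

printf "chars: %d, encoded once: %d bytes, encoded twice: %d bytes\n",
    length($chars), length($once), length($twice);
```

If the database column shows two odd characters where one umlaut should be, this kind of double encoding would be my first suspect.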
Here are the headers of the request and the response (dumped with LiveHTTPHeaders in Firefox):
http://www.pluendermeister.de/sec/300300/?curServer=Aegwynn&gname=&submit=Daten+absenden
GET /sec/300300/?curServer=Aegwynn&gname=&submit=Daten+absenden HTTP/1.1
Host: www.pluendermeister.de
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.pluendermeister.de/sec/300300/?curServer=Proudmoore&gname=&submit=Daten+absenden
Cookie: CFID=44934450;
CFTOKEN=29b26d56f86173b6-18348C33-1D92-F5A7-CE76FBDF5616EEA3;
CFCLIENT_300300=j%5Fusername%3DGast%23j%5Fpassword%3Dx%23loginmode%3DSecuritySiteUser%23;
CFGLOBALS=urltoken%3DCFID%23%3D44934450%26CFTOKEN%23%3D29b26d56f86173b6%2D18348C33%2D1D92%2DF5A7%2DCE76FBDF5616EEA3%26jsessionid%23%3D002341C2D5C3E79D9DE086914E0426A3%23lastvisit%3D%7Bts%20%272010%2D01%2D14%2000%3A54%3A42%27%7D%23timecreated%3D%7Bts%20%272009%2D11%2D21%2020%3A25%3A14%27%7D%23hitcount%3D17%23cftoken%3D29b26d56f86173b6%2D18348C33%2D1D92%2DF5A7%2DCE76FBDF5616EEA3%23cfid%3D44934450%23;
__utma=7194099.1561756611.1258831516.1263420397.1263426413.3;
__utmz=7194099.1258831516.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=%22moonlight%20elite%22%20proudmoore;
JSESSIONID=002341C2D5C3E79D9DE086914E0426A3;
CFAUTHORIZATION_300300="R2FzdDp4OjMwMDMwMA=="; __utmc=7194099;
__utmb=7194099.3.10.1263426413
HTTP/1.x 200 OK
Date: Thu, 14 Jan 2010 00:04:45 GMT
Server: Apache
Set-Cookie:
CFGLOBALS=urltoken%3DCFID%23%3D44934450%26CFTOKEN%23%3D29b26d56f86173b6%2D18348C33%2D1D92%2DF5A7%2DCE76FBDF5616EEA3%26jsessionid%23%3D002341C2D5C3E79D9DE086914E0426A3%23lastvisit%3D%7Bts%20%272010%2D01%2D14%2001%3A04%3A46%27%7D%23timecreated%3D%7Bts%20%272009%2D11%2D21%2020%3A25%3A14%27%7D%23hitcount%3D18%23cftoken%3D29b26d56f86173b6%2D18348C33%2D1D92%2DF5A7%2DCE76FBDF5616EEA3%23cfid%3D44934450%23;
Domain=.pluendermeister.de; Expires=Sat, 07-Jan-2040 00:04:46 GMT; Path=/
Keep-Alive: timeout=10, max=200
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html;charset=UTF-8
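Given that Content-Type, decoded_content should already return characters rather than bytes. A sketch that checks this offline, assuming HTTP::Response from libwww-perl is installed; the response object here is built by hand as a stand-in for the real one:

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical stand-in for the real response: same Content-Type as in the
# header dump, body is the UTF-8 bytes for "Wächter"
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html;charset=UTF-8' ],
    "W\xc3\xa4chter",
);

my $raw     = $response->content;            # bytes as sent by the server
my $decoded = $response->decoded_content;    # characters, decoded per charset

printf "raw: %d bytes, decoded: %d characters\n",
    length($raw), length($decoded);
```

If the decoded length is one less than the raw length, the umlaut has become a single character and no further decode call should be needed in get_guilds.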
Please help me.
Greets
Christoph