Dave,
We had a client site we were trying to scrape, and because of its complex
use of JavaScript we could not get it to work with LWP or WWW::Mechanize. A
lot of these sites try to thwart scraping, and JavaScript is the main way
they do it. We got it working by driving the IE classes from VB, which
worked pretty well; it was just a pain to have a dedicated Windows machine
doing the scraping.
Before going down this road, look at the page source and see whether JS is
being used. If it is, try changing your browser's User-Agent to make it
look like Lynx or Googlebot and see whether the page is rendered differently
or without JavaScript. Two Firefox add-ons that can do this are
PrefBar and User Agent Switcher.
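
The User-Agent check described above can also be scripted rather than done by hand in the browser. What follows is only a rough Python sketch using the standard library: the User-Agent strings are illustrative guesses, and build_request, uses_script and compare are hypothetical helper names, not part of any existing tool.

```python
import urllib.request

# Illustrative User-Agent strings; the exact values are assumptions.
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) Firefox/2.0",
    "lynx": "Lynx/2.8.6rel.4 libwww-FM/2.14",
    "googlebot": "Googlebot/2.1 (+http://www.google.com/bot.html)",
}


def build_request(url, user_agent):
    """Return a Request that presents the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def uses_script(html):
    """Crude check: does the page source contain a <script> tag?"""
    return "<script" in html.lower()


def compare(url):
    """Fetch url once per User-Agent; report page size and script usage."""
    report = {}
    for name, agent in USER_AGENTS.items():
        request = build_request(url, agent)
        with urllib.request.urlopen(request, timeout=30) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        report[name] = (len(html), uses_script(html))
    return report
```

If compare() shows a much smaller page, or one with no script tags, for the Lynx or Googlebot entry, the site is probably serving a plain version that LWP or WWW::Mechanize could handle.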
Mike
Dave Bour wrote:
Turns out that Virgin Mobile doesn't use https... silly of me to assume. That said,
JavaScript causes a "fail to login" with a wget routine when the same URL
works in IE (for once, IE does something expected). So now it's back to Perl or something
that reads JavaScript and doesn't get snagged.
D.
Dave Bour
Desktop Solution Center
905.381.0077 X501
[EMAIL PROTECTED]
For people who just want IT to work
Business http://www.desktopsolutioncenter.ca
Personal http://www.davebour.com
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Simon
P. Ditner
Sent: Friday, April 04, 2008 9:35 AM
To: Dave Bour
Cc: TAUG Asterisk Mailing List
Subject: Re: [on-asterisk] Web Scraper Routines
If you're familiar with Python, the urllib2 and BeautifulSoup libraries
will get you what you need. For Perl there is WWW::Mechanize. Both
support SSL.
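
A rough sketch of the Python route, under several assumptions: it uses urllib.request (the Python 3 name for urllib2) and the standard-library html.parser instead of BeautifulSoup, so it runs without extra installs. The form field names ("username", "password") and the element id ("balance") are made up for illustration; the real site's login form and page layout would have to be checked first.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextById(HTMLParser):
    """Collect the text inside the first element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0        # nesting depth inside the target element
        self.done = False     # set once the target element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_text(html, element_id):
    """Return the stripped text of the element with element_id, or ''."""
    parser = TextById(element_id)
    parser.feed(html)
    return "".join(parser.chunks).strip()


def make_opener():
    """Build an opener that keeps session cookies between requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def fetch_balance(opener, login_url, balance_url, username, password):
    """POST the (hypothetical) login form, then scrape the balance page."""
    form = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("ascii")
    opener.open(login_url, data=form, timeout=30).close()
    with opener.open(balance_url, timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return extract_text(html, "balance")  # "balance" id is an assumption
```

The cookie-aware opener is the part that matters for login-protected pages: the session cookie set by the login POST is sent automatically with the second request. None of this executes JavaScript, so it only helps when the login is a plain form post.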
On Thu, Apr 3, 2008 at 6:52 PM, Dave Bour
<[EMAIL PROTECTED]> wrote:
I'm curious if anyone has done any web site scrapers to put data on
the sets. I've got a few routines displaying some data on my Aastra
sets now. That's the easy part.
One I'd like to get is my Virgin Mobile prepaid account balance.
Since it's https, it's not as simple as passing the user/password as a
variable. A little more work would be required. I could think of a
dozen things like this to use...
- Bank and credit card balance monitoring for rapid fraud detection
- Any VoIP or cellular prepaid account balance, to avoid those
  embarrassing zero-balance call termination events
- Anything else where you'd like to monitor balances, etc. I see use for
  this.
LesNet has provided a simple URL based lookup that will do it for
you:
http://les.net/api/balance.php?id_account=xxxxx&password=yyyyy
where xxxxx and yyyyy are your account and password respectively. A
simple curl call makes that data useful.
You could program your system to update the data daily or when triggered
by some activity.
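
For anyone who would rather script the lookup than shell out to curl, here is a small Python sketch of the same request. The endpoint and parameter names come from the URL above; the helper names are hypothetical, and since the response format isn't described here, the reply is returned as plain text without any parsing.

```python
import urllib.parse
import urllib.request

# Endpoint and parameter names are taken from the URL quoted above.
BALANCE_ENDPOINT = "http://les.net/api/balance.php"


def balance_url(account, password):
    """Build the lookup URL, escaping the credentials for the query string."""
    query = urllib.parse.urlencode({"id_account": account, "password": password})
    return BALANCE_ENDPOINT + "?" + query


def fetch_balance_text(account, password):
    """Fetch the raw response body; its format is not assumed here."""
    url = balance_url(account, password)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace").strip()
```

A daily cron job, or whatever event triggers the refresh, could call fetch_balance_text() and hand the result to the routine that pushes data to the sets.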
I remember a couple years ago a similar discussion across the TLUG
group but can't find it any longer. Any ideas?
--
| It ain't what you don't know that gets you into trouble. It's what
| you know for sure that just ain't so. -- Mark Twain
|
| The Toronto Asterisk Users Group
| Join the discussion group by visiting http://taug.ca
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Mike Ashton
Quality Track Intl
Ph: 647-722-2092 x 301
Cell: 416-527-4995
Fax: 416-352-6043