Dave,

We had a client site we were trying to scrape, and because of its complex use of JavaScript we could not get it to work using LWP or WWW::Mechanize. A lot of these sites try to thwart scraping, and JavaScript is the main way they do it. We got it working by using VB with hooks into the IE classes, and it worked pretty well; it is just a pain to have a dedicated Windows machine doing the scraping.

Before going down this road, look at the page source and see if JS is being used. If it is, try changing your browser's User-Agent to make it look like Lynx or Googlebot and see if the page is rendered differently or without JavaScript. Two Firefox add-ons that do this are PrefBar and User Agent Switcher.
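
The same check can be scripted rather than done by hand in the browser. Here is a minimal Python sketch using only the standard library. The User-Agent strings are illustrative placeholders, and the helper names (count_script_tags, build_request, fetch) are made up for this example, not part of any library.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Illustrative User-Agent strings (assumptions); real clients send longer ones.
USER_AGENTS = {
    "lynx": "Lynx/2.8.6rel.5 libwww-FM/2.14",
    "googlebot": "Googlebot/2.1 (+http://www.google.com/bot.html)",
}


class ScriptCounter(HTMLParser):
    """Count <script> start tags as a rough measure of JavaScript use."""

    def __init__(self):
        super().__init__()
        self.scripts = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1


def count_script_tags(html):
    """Return the number of <script> tags found in an HTML string."""
    parser = ScriptCounter()
    parser.feed(html)
    return parser.scripts


def build_request(url, agent_name):
    """Build a request that identifies itself with the chosen User-Agent."""
    return Request(url, headers={"User-Agent": USER_AGENTS[agent_name]})


def fetch(url, agent_name):
    """Fetch a page as the given agent and return its body as text."""
    with urlopen(build_request(url, agent_name), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Example use (needs network access, so it is left commented out):
# for name in USER_AGENTS:
#     print(name, count_script_tags(fetch("http://example.com/", name)))
```

If the script count or the page size changes noticeably between agents, the site is probably serving a simpler page to text browsers or crawlers, and that page may be scrapable without a JavaScript engine.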

Mike

Dave Bour wrote:
Turns out that Virgin Mobile doesn't use https... silly of me to assume. That said,
JavaScript is causing a "fail to login" with a wget routine when the same URL
works in IE (for once, IE does something expected). So now it's back to Perl or something
that reads JavaScript and doesn't get snagged.
D.

Dave Bour
Desktop Solution Center
905.381.0077 X501
[EMAIL PROTECTED]

For people who just want IT to work

Business http://www.desktopsolutioncenter.ca
Personal http://www.davebour.com


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Simon P. Ditner
Sent: Friday, April 04, 2008 9:35 AM
To: Dave Bour
Cc: TAUG Asterisk Mailing List
Subject: Re: [on-asterisk] Web Scraper Routines

If you're familiar with Python, the urllib2 and BeautifulSoup libraries
will get you what you need. For Perl there is WWW::Mechanize. Both
approaches support SSL.
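
As a rough illustration of the Python route, the sketch below pulls the text out of one element of a page. It deliberately swaps in the standard library's html.parser for BeautifulSoup so that it runs without extra packages (urllib2 became urllib.request in Python 3). The element id "balance" and the helper names are hypothetical; a real page would need its own selector.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextById(HTMLParser):
    """Collect the text inside the first element with a given id attribute."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0        # nesting depth inside the target element
        self.done = False     # set once the target element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.done or tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.chunks.append(data)


def extract_text(html, target_id):
    """Return the stripped text of the element with id=target_id, or ''."""
    parser = TextById(target_id)
    parser.feed(html)
    return "".join(parser.chunks).strip()


def scrape(url, target_id):
    """Fetch a page (http or https) and extract one element's text."""
    with urlopen(url, timeout=30) as resp:
        return extract_text(resp.read().decode("utf-8", errors="replace"), target_id)
```

This only helps when the value is present in the HTML the server sends; if the page fills it in with JavaScript after loading, the text will not be there to extract.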

On Thu, Apr 3, 2008 at 6:52 PM, Dave Bour
<[EMAIL PROTECTED]> wrote:
I'm curious if anyone has done any web site scrapers to put data on
the sets.  I've got a few routines displaying some data on my Aastra
sets now.  That's the easy part.
 One I'd like to get is my Virgin Mobile prepaid account balance.
Since it's https, it's not as simple as passing the user/password as a
variable.  A little more work would be required.  I could think of a
dozen things like this to use...
 - Bank & credit card balance monitoring for rapid fraud detection
 - Any VoIP or cellular account prepaid balance - avoid those
   embarrassing 0 balance call termination events
 - Anything where you'd like to monitor balances, etc.... I see use for
   this.
 LesNet has provided a simple URL based lookup that will do it for
you:
 http://les.net/api/balance.php?id_account=xxxxx&password=yyyyy

 where xxxxx and yyyyy are your account and password respectively.  A
simple curl command makes that data useful.
 You could program your system to update the data daily or triggered
on an activity.
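
For illustration, the LesNet lookup above could be wrapped in a few lines of Python instead of curl. This is only a sketch: the URL and the id_account/password parameter names come from the example above, the function names are made up, and the response is returned as raw text because its format is not described here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the example URL above.
BALANCE_URL = "http://les.net/api/balance.php"


def build_balance_url(account, password):
    """Build the lookup URL, escaping any special characters in the values."""
    query = urlencode({"id_account": account, "password": password})
    return BALANCE_URL + "?" + query


def fetch_balance(account, password):
    """Return the raw response text; its format is an unknown here."""
    with urlopen(build_balance_url(account, password), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace").strip()


# Example use (needs network access and real credentials):
# print(fetch_balance("xxxxx", "yyyyy"))
```

A script like this could be run from cron for the daily update, or called from whatever routine pushes data to the sets. Note that the credentials travel in the URL over plain http, so they may end up in logs along the way.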
 I remember a couple years ago a similar discussion across the TLUG
group but can't find it any longer.    Any ideas?





--
| It ain't what you don't know that gets you into trouble. It's what
| you know for sure that just ain't so. -- Mark Twain
|
| The Toronto Asterisk Users Group
| Join the discussion group by visiting http://taug.ca

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



--
Mike Ashton

Quality Track Intl

Ph:     647-722-2092 x 301
Cell:   416-527-4995
Fax:    416-352-6043

QTI CONFIDENTIAL AND PROPRIETARY INFORMATION

The contents of this material are confidential and proprietary to Quality Track
International, Inc. and may not be reproduced, disclosed, distributed or used
without the express permission of an authorized representative of QTI.
Use for any purpose or in any manner other than that expressly authorized is
prohibited.
If you have received this communication in error, please immediately delete it
and all copies, and promptly notify the sender.


