The lynx idea is a good one.  Let me give that a try.  I already tried spoofing 
the IE agent; however, the JavaScript does catch it.
D.

Dave Bour
Desktop Solution Center
905.381.0077 X501
[EMAIL PROTECTED]

For people who just want IT to work

Business http://www.desktopsolutioncenter.ca
Personal http://www.davebour.com

From: Mike Ashton [mailto:[EMAIL PROTECTED]
Sent: Friday, April 04, 2008 11:16 AM
To: TAUG Asterisk Mailing List
Subject: Re: [on-asterisk] Web Scraper Routines

Dave,

We had a client site we were trying to scrape, and due to its complex use of 
JavaScript we could not get it to work using LWP or WWW::Mechanize. A lot of 
these sites are trying to thwart scraping, and the main way they do it is with 
JavaScript. We got it to work by using VB with hooks into the IE classes, and 
it worked pretty well; it's just a pain to have a dedicated Windows machine 
doing the scraping.

Before going down this road, look at the page source and see if JS is being 
used. If it is, try changing your browser's User-Agent to make it look like 
lynx or Googlebot and see if the page is rendered differently or without the 
use of JavaScript. Two Firefox add-ons that can do this are PrefBar and 
User Agent Switcher.
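
The same check can be scripted instead of done by hand in the browser. What 
follows is only a rough sketch in Python 3, assuming the standard urllib.request 
module (the successor to urllib2). The User-Agent strings and the function names 
are illustrative assumptions, and counting script tags is a crude heuristic, not 
a reliable test of whether a page needs JavaScript.

```python
import re
import urllib.request

# Illustrative User-Agent strings; the exact values are assumptions.
USER_AGENTS = {
    "lynx": "Lynx/2.8.6rel.4 libwww-FM/2.14",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                 "+http://www.google.com/bot.html)",
}

# Matches opening <script> tags only, in any letter case.
SCRIPT_TAG = re.compile(r"<script\b", re.IGNORECASE)


def build_request(url, agent_name):
    """Return a Request that presents the chosen User-Agent header."""
    headers = {"User-Agent": USER_AGENTS[agent_name]}
    return urllib.request.Request(url, headers=headers)


def count_script_tags(html):
    """Count opening <script> tags as a rough measure of JavaScript use."""
    return len(SCRIPT_TAG.findall(html))


def compare_agents(url):
    """Fetch url once per User-Agent and return the script-tag count for each."""
    results = {}
    for name in USER_AGENTS:
        request = build_request(url, name)
        with urllib.request.urlopen(request, timeout=10) as response:
            html = response.read().decode("utf-8", errors="replace")
        results[name] = count_script_tags(html)
    return results
```

If compare_agents() reports noticeably fewer script tags for one agent, that 
agent may be getting a simpler page that is easier to scrape.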

Mike

Dave Bour wrote:

Turns out that Virgin Mobile doesn't use https... silly of me to assume. That 
said, JavaScript is causing a "fail to login" with a wget routine when the same 
URL works in IE (for once, IE does something expected).  So now... back to Perl 
or something that reads JavaScript and doesn't get snagged.

D.



-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Simon P. Ditner
Sent: Friday, April 04, 2008 9:35 AM
To: Dave Bour
Cc: TAUG Asterisk Mailing List
Subject: Re: [on-asterisk] Web Scraper Routines

If you're familiar with Python, using the libraries urllib2 and
beautifulsoup will get you what you need. For Perl there is
WWW::Mechanize -- both support SSL
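
To give a feel for the parsing half of that suggestion, here is a minimal 
sketch. It swaps in Python 3's standard html.parser in place of BeautifulSoup 
so that it runs without third-party packages. The element id "balance" and the 
sample markup are hypothetical and not taken from any real carrier page, and 
the handling of void tags covers only a few common cases.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextById(HTMLParser):
    """Collect the text inside the first element whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the target element
        self.done = False   # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_text(html, target_id):
    """Return the stripped text of the element with the given id, or ''."""
    parser = TextById(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# Hypothetical markup, for illustration only.
SAMPLE = '<div id="balance">Balance: <b>$12.34</b><br> remaining</div>'
print(extract_text(SAMPLE, "balance"))
```

Fetching the page (and logging in first, if the site requires it) would still 
have to happen before this step; the sketch only covers pulling one value out 
of HTML that has already been retrieved.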



On Thu, Apr 3, 2008 at 6:52 PM, Dave Bour <[EMAIL PROTECTED]> wrote:



I'm curious if anyone has done any web site scrapers to put data on
the sets.  I've got a few routines displaying some data on my Aastra
sets now.  That's the easy part.

One I'd like to get is my Virgin Mobile prepaid account balance.
Since it's https, it's not as simple as passing the user/password as a
variable.  A little more work would be required.  I could think of a
dozen things like this to use...



 - Bank & Credit card balance monitoring for rapid fraud detection
 - Any Voip or cellular account prepaid balance - avoid those
   embarrassing 0 balance call termination events
 - Anything where you'd like to monitor balances, etc....I see use for
   this.



LesNet has provided a simple URL based lookup that will do it for you:

 http://les.net/api/balance.php?id_account=xxxxx&password=yyyyy

where xxxxx and yyyyy are your account/password respectively.  A
simple CURL statement makes that data useful.

You could program your system to update the data daily or triggered
on an activity.
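
For anyone who would rather call that lookup from a script than from curl, 
here is a rough Python 3 sketch. The URL and parameter names come from the 
message above; the function names are made up for illustration. The thread does 
not describe the response format, so the sketch just returns the raw body 
without parsing it.

```python
import urllib.request
from urllib.parse import urlencode

# Endpoint as given in the message above.
BASE_URL = "http://les.net/api/balance.php"


def balance_url(account, password):
    """Build the lookup URL, escaping any special characters in the values."""
    query = urlencode({"id_account": account, "password": password})
    return BASE_URL + "?" + query


def fetch_balance_text(account, password):
    """Return the raw response body; its format is not described in the thread."""
    url = balance_url(account, password)
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace").strip()
```

A daily cron job or a dialplan-triggered script could call fetch_balance_text() 
and hand the result to whatever pushes data to the sets. Note that the 
password travels in the URL over plain http in this scheme.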



I remember a couple years ago a similar discussion across the TLUG
group but can't find it any longer.  Any ideas?










--
| It ain't what you don't know that gets you into trouble. It's what
| you know for sure that just ain't so. -- Mark Twain
|
| The Toronto Asterisk Users Group
| Join the discussion group by visiting http://taug.ca














--
Mike Ashton
Quality Track Intl

Ph:    647-722-2092 x 301
Cell:  416-527-4995
Fax:   416-352-6043






