Ryan Perry wrote:

On Jun 16, 2006, at 2:38 PM, Peter Stevens wrote:

Yes and no. AJAX leans heavily on JavaScript, so you'll be spending a lot of time with Tamper Data to figure out what data is being sent back and forth. You can also read and reverse-engineer the JavaScript.

Can it be done? Certainly. Is it easy? No. Is it practical? Well, that depends on the complexity of the JavaScript environment and how much time & energy you have for the problem.

We need a JavaScript + DOM implementation on top of Mechanize!

I strongly agree. It's the future and we need to be there. How would one go about this? I'd be happy to contribute. Where do I start?

Thanks!
Well, this may get me fried, but I do feel compelled to comment because my day-to-day job is to write web scrapers for some extremely complex web sites, many of which use a lot of AJAX and DHTML (JavaScript and *shudder* VBS).

My first web scraper was written in Perl/Mechanize, and we managed to get past the DHTML portions by decoding the JavaScript and returning responses that took into account the JavaScript that *would* have run. It ran, but took over 3 months to write, and THAT site was well written, with nice IDs and no AJAX. I still had to assume a few page locations and do blind posts because of complex JavaScript that did page redirects to pages whose JavaScript did MORE page redirects.
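To give a rough idea of what "taking into account the JavaScript that would have run" can mean for the redirect case, here is a minimal Ruby sketch. It is not the Perl code described above, just an illustration. The names js_redirect_target and follow_js_redirects are hypothetical, the pattern only catches redirects written as a literal string assignment to location, and fetch stands in for whatever HTTP client you actually use.

```ruby
require 'uri'

# Hypothetical pattern: matches only literal assignments such as
#   window.location = '/next.html';
#   location.href = "next.html";
# Redirects computed at run time will not be found.
JS_REDIRECT = /\b(?:window\.|document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']/

# Return the redirect target found in a page body, or nil if there is none.
def js_redirect_target(html)
  match = JS_REDIRECT.match(html)
  match && match[1]
end

# Follow a chain of script redirects and return the URL of the first page
# that does not redirect. `fetch` is any callable mapping a URL to a page
# body. Raises if more than max_hops redirects are seen.
def follow_js_redirects(start_url, fetch, max_hops = 5)
  url = start_url
  (max_hops + 1).times do
    target = js_redirect_target(fetch.call(url))
    return url if target.nil?
    # Resolve relative targets against the page that contained the script.
    url = URI.join(url, target).to_s
  end
  raise "more than #{max_hops} script redirects starting at #{start_url}"
end

# Canned pages instead of a live site, so the sketch runs anywhere.
pages = {
  'http://example.com/start.html' => "<script>window.location = '/step2.html';</script>",
  'http://example.com/step2.html' => '<script>location.href = "final.html";</script>',
  'http://example.com/final.html' => '<html><body>Account summary</body></html>'
}
fetch = lambda { |url| pages.fetch(url) }
puts follow_js_redirects('http://example.com/start.html', fetch)
```

Anything beyond this trivial case (targets built from variables, redirects fired from event handlers) is where the hand-decoding gets expensive.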

After doing some research for upcoming projects scraping sites that use AJAX and more complex JavaScript, I found a tool called Watir (http://openqa.org/watir/). Watir is a library written in Ruby (which we were already starting to use), and it gets around the whole JavaScript/AJAX issue by automating Internet Explorer, so all scripts actually run. The downside is that it has to run under Windows and only supports IE. The upside is that our scrapers have been much easier to write: the last one I did took about two weeks, and it does a lot more than our first one.

One thing that still gave us problems until recently was the issue of pop-up windows. While Watir had a rather crude way of clicking past a pop-up window if you knew it was coming, modal dialogs were still hard to automate because they can have any valid HTML, but the IE "click" would block until the dialog closed, and there was no way to attach to the modal dialog and get access to the DOM if you did the click in another thread/process. Finally someone put together an intricate method to attach to a modal dialog window by using the current IE's HWND, and then link the pointers together to get access to the modal dialog's DOM.

Now, I can automate a modal dialog window as easily as a normal browser window. Here's the code to fire up IE to a page, click a button which brings up a modal dialog, attach to that dialog, fill a text box on the dialog, click on the dialog close box, and retrieve the entered value from the original window. All in the following few lines of Ruby/Watir code:

   require 'watir'
   include Watir

   ie = IE.new
   ie.goto('http://SITE/modal_dialog_launcher.html')
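   # click_no_wait returns right away; a plain click would block until the dialog closes
   ie.button(:value, 'Launch Dialog').click_no_wait
   # Work with the modal dialog's DOM
   ie.modal_dialog.text_field(:name, 'modal_text').set('hello')
   ie.modal_dialog.button(:value, 'Close').click
   # Back in the original window: read the value entered in the dialog
   modal_text = ie.text_field(:name, 'modaloutput').value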
   ie.close

That is code I just executed in Ruby's interactive shell (IRB) on one of the HTML files in the Watir unit test suite.

Now, to get that functionality you need to check out the latest development version from SVN, since I only recently added the modal_dialog support, but it does work. (I just put together pieces I found in a number of places, so I can't take much credit, but I did submit it to the Watir project.)

There is a project called FireWatir which is aimed at using Firefox (under any O/S that Firefox runs under), but it's still lagging a bit behind, and performance is still very poor from what I've heard. But there is hope.

For now, I'd recommend checking out Watir for your web automation projects, if you can get away with using IE under Windows.

David Schmidt
[EMAIL PROTECTED]
