Hi, guys. I want to scrape an HTML site that uses JavaScript to generate its content, so I can't use the mechanize gem or similar ones. I've tried rdom and taka with johnson, but I still ran into some problems (I can give more details if needed). The best and easiest option I have at the moment is to use Watir (or Selenium, or Celerity for JRuby). I've settled on Watir because it's simple, with either the watir gem or the watir-webdriver gem; I like both. But I have two problems:
- I want to deploy the app on Heroku, but I get the error: "Could not find Firefox binary (os=linux)".
- I don't know whether it's possible to use the Watir logic without the browser binary (and without opening a browser in the background).

I currently have an answer here: http://stackoverflow.com/questions/3597118/can-you-deploy-watir-on-heroku-to-generate-html-snapshots-if-so-how, but I just wanted to confirm the options I have.

I wrote a watir-webdriver example, which works well locally, to illustrate the simple process (in this case the HTML is not dynamically generated, of course; it's only an example):

    require "rubygems"
    require "watir-webdriver"
    require "watir-webdriver/extensions/wait"

    browser = Watir::Browser.new :firefox
    browser.goto "http://google.com"
    browser.text_field(:name, 'q').set "watir-webdriver"
    browser.button(:name, 'btnG').click

Maybe the only option I have is to use EC2, but that's a pity because I only need to scrape JavaScript-generated HTML and I want to keep using Heroku. I love it!

What do you think is the best gem for doing this on Heroku? Or is there no option, so that I have to use EC2 just to open a browser and lose the Heroku goodness?

Thanks in advance
