I use PhantomJS to log in to a site and scrape data from it, then I put that data in a database.
PhantomJS runs a full browser, with JavaScript events, Ajax, etc. My pipeline has three steps:

1. Log in to a secure site and navigate to a list of orders.
2. Pull down the data for each order and write it to a file.
3. Process each file and write to the database.

I use PhantomJS for steps 1 and 2, with multiple workers executing at the same time (because of I/O and network delays). A separate Ruby process picks up the new files and does the processing, and I can fire up multiple workers of that type if I need to.

CasperJS <http://casperjs.org/> is a nice addition that can make writing your scripts even easier.

On Tue, Oct 15, 2013 at 10:52 AM, Jack R-G <[email protected]> wrote:

> PhantomJS looks interesting. Can it 1) Read/write a Postgres database,
> and 2) Fill in form fields and submit forms, with the javascript associated
> with the various events firing automatically. I looked at the PhantomJS
> website but didn't find any examples of these uses; it would be helpful if
> you could point me at examples.
>
> If PhantomJS cannot access Postgres, is there a typical usage pattern for
> PhantomJS code that uses it as a helper to retrieve pages and return them
> to the caller for further processing?
>
> On Monday, October 14, 2013 11:38:07 AM UTC-7, j_McCaffrey wrote:
>
>> I can't address your specific situation, but can recommend phantomjs
>> instead
>>
>> https://github.com/stomita/heroku-buildpack-phantomjs
>>
>> On Mon, Oct 14, 2013 at 1:12 PM, Jack Royal-Gordon <[email protected]> wrote:
>>
>>> I'm trying to scrape some websites that rely on Javascript, so I found this
>>> article <http://stackoverflow.com/questions/11494994/is-it-possible-to-plug-a-javascript-engine-with-ruby-and-nokogiri>
>>> discussing Watir and headless processing. I ran into the following
>>> exception: Headless::Exception: Xvfb not found on your system. So I
>>> started researching Xvfb and discovered that it is a stand-alone display
>>> server. So I started looking into how to get it installed on Heroku.
>>> I found a gist <https://gist.github.com/atduskgreg/5100799> where the
>>> author is discussing building a static linked binary, but he doesn't really
>>> come to a successful conclusion. I've also seen mention of using a custom
>>> build-pack, but nothing definite there either.
>>>
>>> Does anyone have experience with this and can offer some advice on how
>>> to proceed?
>>>
>>> --
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Heroku" group.
>>>
>>> To unsubscribe from this group, send email to
>>> heroku+un...@googlegroups.com
>>>
>>> For more options, visit this group at
>>> http://groups.google.com/group/heroku?hl=en_US?hl=en
>>>
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "Heroku Community" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to heroku+un...@googlegroups.com.
>>>
>>> For more options, visit
>>> https://groups.google.com/groups/opt_out.
>>
>> --
>> Thanks,
>> -John

--
Thanks,
-John
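To make step 3 of the pipeline above a little more concrete, here is a minimal sketch of a Ruby worker that picks up files written by the scraping step. Everything specific in it is an assumption for illustration, not a detail from the setup described in this thread: the `OrderFileWorker` name, the three directories, and the idea that each order is stored as one `.json` file. The database write is left to a caller-supplied block, since the thread does not say which database client is in use. Each file is claimed with a rename before it is read, so that several workers can share one inbox without processing the same file twice.

```ruby
require "json"
require "fileutils"

# Hypothetical worker for step 3. Directory layout, class name, and the
# one-JSON-file-per-order format are assumptions made for this sketch.
class OrderFileWorker
  def initialize(inbox_dir, work_dir, done_dir)
    @inbox_dir = inbox_dir
    @work_dir = work_dir
    @done_dir = done_dir
    FileUtils.mkdir_p([@work_dir, @done_dir])
  end

  # Processes every *.json file currently in the inbox, in name order.
  # Yields each parsed record to the caller's block, which is where the
  # database write would go. Returns the number of files processed.
  def run_once
    processed = 0
    Dir.glob(File.join(@inbox_dir, "*.json")).sort.each do |path|
      claimed = claim(path)
      next unless claimed # another worker took this file first

      yield JSON.parse(File.read(claimed))

      # Move to done only after the block returns without raising, so a
      # failed write leaves the file in the work directory for inspection.
      FileUtils.mv(claimed, File.join(@done_dir, File.basename(claimed)))
      processed += 1
    end
    processed
  end

  private

  # Claims a file by renaming it into the work directory. This assumes the
  # inbox and work directories are on the same filesystem, where a rename
  # is atomic; a worker that loses the race sees Errno::ENOENT.
  def claim(path)
    target = File.join(@work_dir, File.basename(path))
    File.rename(path, target)
    target
  rescue Errno::ENOENT
    nil
  end
end
```

Calling `run_once` in a loop with a short sleep, and passing a block that performs the insert, would give a simple polling worker. For this to be safe, the scraping side would need to avoid exposing half-written files, for example by writing under a temporary name and renaming to `.json` when done; that is also an assumption, not something stated above.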
