On Wed, 5 Apr 2006, Peter Stevens wrote:

> ... and until someone writes a Javascript interpreter for
> Perl or a Mechanize clone to control Firefox, there will be no
> general solution.

That's not quite accurate. There *is* a JavaScript interpreter for Perl
(JavaScript::SpiderMonkey) on CPAN, so the interpreter is not the
problem. JavaScript::SpiderMonkey lets you run arbitrary JavaScript code
in Perl, pass parameters from Perl to JavaScript, and call back into Perl
from JavaScript.

The problem is the browser DOM, which a browser's JavaScript interpreter
has pre-loaded: the parts of the HTML page are exposed as ready-made
JavaScript objects, and methods like OnClick() are predefined. As soon as
someone comes up with a reference implementation (every browser naturally
has its own DOM implementation, which is why IE and Firefox behave
differently at times), WWW::Mechanize is in business. How cool would that
be!

-- Mike

Mike Schilli
[EMAIL PROTECTED]

> But if you want to scrape specific pages, then a
> solution is always possible.
>
> One typical use of Javascript is to perform argument checking before
> posting to the server. The URL you want is probably just buried in
> the Javascript function. Do a regular expression match on
> |$mech->content()| to find the link that you want and |$mech->get|
> it directly (this assumes that you know what you are looking for in
> advance).
>
> In more difficult cases, the Javascript is used for URL mangling to
> satisfy the needs of some middleware. In this case you need to
> figure out what the Javascript is doing (why are these URLs always
> really long?). There is probably some function with one or more
> arguments which calculates the new URL. Step one: using your
> favorite browser, get the before and after URLs and save them to
> files. Edit each file, converting the argument separators ('?',
> '&' or ';') into newlines. Now it is easy to use diff or comm to
> find out what Javascript did to the URL. Step two: find the function
> call which created the URL - you will need to parse and interpret
> its argument list. Using the Javascript Debugger Extension for
> Firefox may help with the analysis.
> At this point, it is fairly
> trivial to write your own function which emulates the Javascript for
> the pages you want to process.
>
> Please append to it:
>
> An Alternative Approach (this is also an answer to the question, "It
> works in Firefox, why not in $mech?")
>
> Everything the web server knows about the client is present in the
> HTTP request. If two requests are identical, the results should be
> identical. So the real question is "What is different between the
> mech request and the Firefox request?"
>
> I would suggest using the Firefox extension "Tamper Data" to look at
> the headers of the requests you send to the server. Compare that
> with what LWP is sending. Once the two are identical, the action of
> the server should be the same as well.
>
> I say "should", because this is an oversimplification - some values
> are naturally unique, e.g. a SessionID, but if a SessionID is
> present, that is probably sufficient, even though the value will be
> different between the LWP request and the Firefox request. The
> server could use the session to store information which is
> troublesome, but that's not the first place to look (and highly
> unlikely to be relevant when you are requesting the login page of your
> site).
>
> Generally the problem is to be found in missing or incorrect
> POSTDATA arguments, Cookies, User-Agents, Accepts, etc. If you are
> using mech, then redirects and cookies should not be a problem, but
> are listed here for completeness. If you are missing headers,
> $mech->add_header can be used to add the headers that you need.
>
> Is there a preferred way to get the request which mech is going to send?
> I was able to get it by following the code into the innards of
> HTTP::Request, but that seems like the kind of stuff a $mechanize user
> won't want to do.
>
> Cheers,
>
> Peter
>
> Cahoon, Forrest wrote:
> > If you're specifically looking at Yahoo!
> > Mail, there's at least one CPAN
> > module for that:
> > http://search.cpan.org/~johnsca/MailClientYahoo-1.0/lib/Mail/Client/Yahoo.pm
> >
> > If it's just something similar to Yahoo!, perhaps that code will give you
> > some clues.
> > (I haven't used that module myself, just happened to notice its existence.)
> >
> > Forrest
> > not speaking for merrill corporation
> >
> >> -----Original Message-----
> >> From: Roy Lor [mailto:[EMAIL PROTECTED]
> >> Sent: Tuesday, April 04, 2006 8:21 AM
> >> To: libwww@perl.org
> >> Subject: WWW::Mechanize
> >>
> >> can u give me a code/script that records the information in a
> >> log-in form with javascript..like that of mail.yahoo.com? i
> >> badly need this..thanks
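
P.S. The before/after URL comparison Peter describes (split on '?', '&'
or ';', then diff) can also be done in a few lines of plain Perl instead
of with files and diff/comm. What follows is only a rough sketch under
assumptions: the two URLs are made up, and the names url_parts and
url_diff are hypothetical helpers, not part of WWW::Mechanize or LWP.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a URL into its base and its arguments, using the separators
# named in the quoted text ('?', '&' or ';').
sub url_parts {
    my ($url) = @_;
    return split /[?&;]/, $url;
}

# Return the parts that occur in only one of the two URLs: first the
# ones that disappeared, then the ones the Javascript added.
sub url_diff {
    my ($before, $after) = @_;
    my %in_before = map { $_ => 1 } url_parts($before);
    my %in_after  = map { $_ => 1 } url_parts($after);
    my @removed = grep { !$in_after{$_} }  url_parts($before);
    my @added   = grep { !$in_before{$_} } url_parts($after);
    return (\@removed, \@added);
}

# Hypothetical before/after URLs, for illustration only.
my $before = 'http://www.example.com/app?page=inbox&lang=en';
my $after  = 'http://www.example.com/app?page=inbox&lang=en&sid=42;tok=abc';

my ($removed, $added) = url_diff($before, $after);
print "- $_\n" for @$removed;
print "+ $_\n" for @$added;
```

Whatever shows up with a "+" is what the page's Javascript contributed,
and those are the arguments your own emulating function has to compute.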