Re: [R] URL Scan
On Sun, Apr 17, 2011 at 11:56 PM, jmsc michaelfp...@gmail.com wrote:

> The site does not require a login/password. Another way to access the first site would be to go to the second site, click Connecticut, click Canterbury, CT, enter the online database, click search under Query by Location with nothing in the search fields, and click the first property. Viewing the frame source on this page redirects to the second site.

It doesn't require a login/password, but it uses session cookies to simulate a logged-in user (there's even a log out button that clears the session).

> Also, could you direct me to or give me some instructions on scanning from sites that do require a login/password? Thanks.

I had a quick look for R-help posts on this (RSiteSearch("cookies"), RSiteSearch("session"), etc.) but didn't find much. You probably want to install RCurl and look at the examples.

Generally what happens is that a successful login, or in this case just visiting the database front page, causes the web server to send back a 'cookie' with a long ID number in it. For every further access to that web site your browser includes the cookie. The server then looks up the ID, goes 'yup, this is a valid session', and sends you the page you want. If the cookie isn't there, or the ID isn't valid (and the ID numbers are big enough to make guessing impractical), then you get the default page.

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
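For anyone following the thread, the cookie round trip described above can be sketched with RCurl roughly as follows. This is a minimal sketch, not a tested recipe for this particular site: the function name fetch_with_session and its two arguments are made up for illustration, and which page actually hands out the session cookie is an assumption you would have to check in your browser.

```r
library(RCurl)

# Hypothetical helper: fetch a page that is only served inside a cookie session.
#   entry_url  - page whose visit makes the server issue the session cookie
#                (assumed; find the real one by browsing the site)
#   target_url - page that is only served when that cookie is sent back
fetch_with_session <- function(entry_url, target_url) {
  # cookiefile = "" turns on libcurl's in-memory cookie handling for this
  # handle; followlocation = TRUE follows any HTTP redirects.
  handle <- getCurlHandle(cookiefile = "", followlocation = TRUE)

  # First request: the response stores the session cookie on the handle.
  getURL(entry_url, curl = handle)

  # Second request: reusing the same handle sends the cookie back.
  getURL(target_url, curl = handle)
}
```

The key point is that both requests share one curl handle; a plain scan() or a fresh handle starts with no cookie and would presumably get the default page again. The result is a single character string, which can be split into lines with strsplit(page, "\n")[[1]] if line-oriented input is wanted.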
Re: [R] URL Scan
Ok, thanks for the suggestion. I will look into that.

On Mon, Apr 18, 2011 at 5:27 AM, Barry Rowlingson wrote:

> You probably want to install RCurl and look at the examples.
Re: [R] URL Scan
On Sun, Apr 17, 2011 at 9:40 PM, jmsc michaelfp...@gmail.com wrote:

> I am wondering why when I try to input data from the first site listed below into R using the scan() function, a different page is read in instead (the second site listed):
>
> http://data.visionappraisal.com/CanterburyCT/parcel.asp?pid=1242
> http://www.visionappraisal.com/databases/
>
> I am wondering if this is an issue with R or something in the source code of the web page that I am not familiar with. Since I can access the first site directly, I assume it is not within the source code. Any help would be appreciated.

I can't access the first URL directly, even from my web browser without R being involved at all. Is that pid a parcel ID that you need to be logged in to see? Or not a valid parcel ID anymore?

If you want to access a web site from R that needs a login/password then you need to send the appropriate login form info from R and keep the cookie session info that gets returned. Web sessions from R and from a web browser are independent.

Barry
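As a rough illustration of "send the login form info and keep the cookie", something along these lines should work with RCurl. It is only a sketch under assumptions: login_and_fetch is a made-up name, and the login URL and the form field names depend entirely on the site, so they have to be read off the login page's HTML form.

```r
library(RCurl)

# Hypothetical helper: log in through an HTML form, then fetch a page
# using the session cookie the login returned.
#   login_url   - URL the login form submits to (site-specific, assumed)
#   target_url  - page that requires the logged-in session
#   form_fields - named list whose names match the form's input names
login_and_fetch <- function(login_url, target_url, form_fields) {
  # One handle for the whole session, with in-memory cookies enabled.
  handle <- getCurlHandle(cookiefile = "", followlocation = TRUE)

  # Submit the form as a URL-encoded POST; the session cookie in the
  # response is kept on the handle. .checkParams = FALSE skips RCurl's
  # warning for field names that coincide with curl option names.
  postForm(login_url, .params = form_fields, curl = handle,
           style = "POST", .checkParams = FALSE)

  # Later requests on the same handle carry the cookie.
  getURL(target_url, curl = handle)
}

# Usage (not run; URLs and field names are placeholders):
# page <- login_and_fetch("https://example.org/login",
#                         "https://example.org/members/data",
#                         list(user = "me", pass = "secret"))
```

Some sites also put hidden fields (tokens and the like) in the login form; those would need to be included in form_fields as well.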
Re: [R] URL Scan
The site does not require a login/password. Another way to access the first site would be to go to the second site, click Connecticut, click Canterbury, CT, enter the online database, click search under Query by Location with nothing in the search fields, and click the first property. Viewing the frame source on this page redirects to the second site.

Also, could you direct me to or give me some instructions on scanning from sites that do require a login/password? Thanks.

On Sun, Apr 17, 2011 at 6:33 PM, Barry Rowlingson wrote:

> Is that pid a parcel ID that you need to be logged in to see?