Re: [R] URL Scan

2011-04-18 Thread Barry Rowlingson
On Sun, Apr 17, 2011 at 11:56 PM, jmsc michaelfp...@gmail.com wrote:
> The site does not require a login/password. Another way to access the first
> site would be to go to the second site, click Connecticut, click Canterbury,
> CT, enter the online database, click search under Query by Location with
> nothing in the search fields, and click the first property. Viewing the
> frame source on this page redirects to the second site.

 It doesn't require a login/password, but it uses session cookies to
simulate a logged-in user (there's even a log out button that clears
the session).

> Also, could you direct me to or give me some instructions on scanning from
> sites that do require a login/password? Thanks.

 I had a quick look for R-help posts on this ( RSiteSearch("cookies"),
RSiteSearch("session") etc) but didn't find much. You probably want to
install RCurl and look at the examples.

 Generally what happens is that a successful login, or in this case
just visiting the database front page, causes the web server to send
back a 'cookie' with a long ID number in it. For every further access
to that web site your browser includes the cookie. The server then
looks up the ID, goes 'yup, this is a valid session', and sends you
the page you want. If the cookie isn't there, or the ID isn't valid
(and the ID numbers are big enough to make guessing impractical), then
you get the default page.
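
 As a rough illustration of that handshake, here is a minimal RCurl
sketch. It assumes the session cookie is set by a plain GET of the
database front page; it is untested against this site, which may need
further steps (choosing the state and town) before a parcel page is
served. The helper name fetch_with_session is made up for this example.

```r
# Sketch only: fetch a page that depends on a session cookie.
# `fetch_with_session` is a hypothetical helper, not part of RCurl.
fetch_with_session <- function(front_url, target_url) {
  # One handle for the whole session. cookiefile = "" switches on
  # libcurl's in-memory cookie engine without reading a cookie file.
  h <- RCurl::getCurlHandle(cookiefile = "", followlocation = TRUE)

  # First request: the server answers with a Set-Cookie header and
  # libcurl stores the session ID on the handle.
  RCurl::getURL(front_url, curl = h)

  # Second request on the same handle: the stored cookie is sent back,
  # so the server can match the request to the session.
  RCurl::getURL(target_url, curl = h)
}

# Usage (needs network access and the RCurl package installed):
# html <- fetch_with_session(
#   "http://www.visionappraisal.com/databases/",
#   "http://data.visionappraisal.com/CanterburyCT/parcel.asp?pid=1242")
```

 The point of the design is the shared handle: a bare scan() or
readLines() on the URL opens a fresh connection with no cookie each time.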

Barry

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] URL Scan

2011-04-18 Thread jmsc
Ok thanks for the suggestion. I will look into that.



Re: [R] URL Scan

2011-04-17 Thread Barry Rowlingson
On Sun, Apr 17, 2011 at 9:40 PM, jmsc michaelfp...@gmail.com wrote:
> I am wondering why when I try to input data from the first site listed below
> into R using the scan() function, a different page is read in instead (the
> second site listed):
>
> http://data.visionappraisal.com/CanterburyCT/parcel.asp?pid=1242
>
> http://www.visionappraisal.com/databases/
>
> I am wondering if this is an issue with R or something in the source code of
> the web page that I am not familiar with. Since I can access the first site
> directly, I assume it is not within the source code. Any help would be
> appreciated.

 I can't access the first URL directly - even from my web browser
without R being involved at all. Is that pid a parcel ID that you
need to be logged in to see? Or is it no longer a valid parcel ID?

 If you want to access a web site from R that needs a login/password
then you need to send the appropriate login form info from R and keep
the cookie session info that gets returned. Web sessions from R and
from a web browser are independent.
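
 A minimal sketch of that login-then-fetch pattern with RCurl follows.
The helper name login_and_fetch and the form field names login_name and
login_pass are placeholders, not taken from any real site; the actual
field names have to be read from the login form's HTML source.

```r
# Sketch only: log in through a form, then reuse the session.
# `login_and_fetch`, "login_name" and "login_pass" are hypothetical;
# replace the field names with those used by the site's <form>.
login_and_fetch <- function(login_url, page_url, user, pass) {
  # Handle with the cookie engine enabled, shared by both requests.
  h <- RCurl::getCurlHandle(cookiefile = "", followlocation = TRUE)

  # Submit the login form as a URL-encoded POST. The session cookie
  # returned by the server is kept on handle `h`.
  RCurl::postForm(login_url,
                  .params = list(login_name = user, login_pass = pass),
                  style = "POST",
                  curl = h)

  # Later requests must reuse `h`, otherwise the cookie is not sent
  # and the server treats the request as a new, logged-out visitor.
  RCurl::getURL(page_url, curl = h)
}

# Usage (hypothetical URLs, needs network access and RCurl):
# html <- login_and_fetch("http://example.com/login.asp",
#                         "http://example.com/data.asp",
#                         "myuser", "mypassword")
```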

Barry



Re: [R] URL Scan

2011-04-17 Thread jmsc
The site does not require a login/password. Another way to access the first
site would be to go to the second site, click Connecticut, click Canterbury,
CT, enter the online database, click search under Query by Location with
nothing in the search fields, and click the first property. Viewing the
frame source on this page redirects to the second site.

Also, could you direct me to or give me some instructions on scanning from
sites that do require a login/password? Thanks.
