Setting the user agent did the trick, at least in my case.

(ns google-search
  (:import [java.net URL URLEncoder]))
 
(def google-search-url "http://www.google.com/search?q=")
(def user-agent "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.172")
 
(defn open-connection [url]
  (doto (.openConnection url)
        (.setRequestProperty "User-Agent" user-agent)))
 
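(defn get-response [url]
  (let [conn (open-connection url)
        sb   (StringBuilder.)]
    ;; with-open closes the stream once the body has been read
    (with-open [in (.getInputStream conn)]
      ;; read one byte at a time; each byte becomes one char, so
      ;; multi-byte encodings such as UTF-8 are not decoded
      (loop [c (.read in)]
        (if (neg? c)
          (str sb)
          (do
            (.append sb (char c))
            (recur (.read in))))))))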
 
(defn search [query]
  ;; the one-argument URLEncoder/encode is deprecated; name the charset
  (let [url (URL. (str google-search-url (URLEncoder/encode query "UTF-8")))]
    (get-response url)))

(spit "response.html" (search "URLEncoder java 7"))
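
If the 403 still shows up, it may help to look at the status code before reading the body, since getInputStream throws on an error response. A rough sketch only: build-search-url and fetch-status-and-body are names made up for this example, and decoding the body as UTF-8 is an assumption about what the server sends back.

```clojure
(import '[java.net URL URLEncoder HttpURLConnection])

(def google-search-url "http://www.google.com/search?q=")
(def user-agent "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.172")

(defn build-search-url
  "Hypothetical helper: URL-encodes the query as UTF-8 and appends it
  to the search URL."
  [query]
  (str google-search-url (URLEncoder/encode query "UTF-8")))

(defn fetch-status-and-body
  "Hypothetical helper: returns {:status n :body s}. For 4xx/5xx
  responses it reads the error stream instead of throwing."
  [url-string]
  (let [^HttpURLConnection conn (.openConnection (URL. url-string))]
    (.setRequestProperty conn "User-Agent" user-agent)
    (let [status (.getResponseCode conn)
          stream (if (< status 400)
                   (.getInputStream conn)
                   (.getErrorStream conn))]   ; may be nil if there is no body
      {:status status
       :body   (if stream
                 ;; assumption: the body is UTF-8
                 (with-open [in stream] (slurp in :encoding "UTF-8"))
                 "")})))
```

Calling (:status (fetch-status-and-body (build-search-url "URLEncoder java 7"))) should then show whether the request got through or was refused, without an exception hiding the response.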


HIH,


Juan

On Friday, March 22, 2013 4:32:33 AM UTC-3, Cedric Greevey wrote:
>
> Change your code so it spoofs a common browser user-agent, change your 
> DHCP-assigned IP address, and try again. They're probably trying to 
> obstruct bots from making overwhelming numbers of requests or something. As 
> long as you don't flood them with requests at a higher rate than a human 
> would generate by clicking, I don't see any ethical issue with 
> circumventing their countermeasures, especially not if the search will be 
> triggered by a user input to your application anyway.
>
>
> On Fri, Mar 22, 2013 at 3:09 AM, Rich Morin <r...@cfcl.com> wrote:
>
>> I've been successfully using slurp and laser to harvest and pull
>> apart some web pages.  However, I can't figure out how to use
>> Google Search from my code.
>>
>> My first thought was to use the Google Search API, but after
>> a lot of frustration in trying to get and use an API key, I
>> gave up on that.
>>
>> My next thought was to slurp in a page from the interactive
>> Google Search facility, using the URL from Advanced Search:
>>
>>   "http://www.google.com/search?hl=en&as_q=..."
>>
>> However, this gives me a 403 nastygram:
>>
>>   IOException Server returned HTTP response code: 403 for URL:
>>   https://www.google.com/search?hl=en&as_q=&as_epq=...
>>   sun.net.www.protocol.http.HttpURLConnection.getInputStream
>>   (HttpURLConnection.java:1436)
>>
>> Has anyone here, by chance, been able to do this sort of thing?
>>
>> -r
>>
>>  --
>> http://www.cfcl.com/rdm            Rich Morin
>> http://www.cfcl.com/rdm/resume     r...@cfcl.com
>> http://www.cfcl.com/rdm/weblog     +1 650-873-7841
>>
>> Software system design, development, and documentation
>>
>>
>> --
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clo...@googlegroups.com
>> Note that posts from new members are moderated - please be patient with 
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+u...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>> ---
>> You received this message because you are subscribed to the Google Groups 
>> "Clojure" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to clojure+u...@googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>>
>>
>
