I want to scrape some HTML pages and am thinking of using AsyncHttpClient
for it. Below is what I wrote:
    import httpclient, streams, asyncdispatch, htmlparser, xmltree

    const totalPage = 1

    let theurl =
      "http://learn.shayhowe.com/html-css/organizing-data-with-tables/"

    var
      asynclient = newAsyncHttpClient()
      client = newHttpClient()
      pages = newSeq[Future[AsyncResponse]](totalPage)

    # synchronous get for comparison
    var
      contentHtml = client.get(theurl).body.parseHtml
      tbody = contentHtml.findAll("tbody")

    # To override ctrl+c for stopping the loop
    proc toquit() {.noconv.} =
      echo "tbody length is ", tbody.len
      echo "tbody is ", $tbody
      asynclient.close
      client.close
      quit QuitSuccess

    setControlCHook toquit

    # this part is for the asynchronous get
    for page in 1 .. totalPage:
      pages[page-1] = asynclient.get theurl
      pages[page-1].callback = proc(fres: Future[AsyncResponse]) {.thread.} =
        var
          asyncres = fres.read
          content = waitFor asyncres.body
          html = content.newStringStream.parseHtml # there's a warning that
                                                   # parseHtml is not GC-safe
          trbody = html.findAll("trbody") # same as above, to get the tbody tag
        echo "trbody is ", $trbody

    runForever()
What I got was:

    trbody is @[]         # this is the result from async
    # then I press ctrl+c to stop
    tbody length is 29    # this is the result from sync
The question is: why can't I get the html/xml tag after parsing it to an `XmlNode`,
while I can get the result from the synchronous get?
Did I do something wrong with the asynchronous call?
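For reference, this is roughly how I imagine a callback-free version would look, using an `{.async.}` proc with `await` instead of setting a callback on the future (just a sketch; the proc name `fetchTables` is made up, and I haven't confirmed it behaves any differently):

```nim
import std/[asyncdispatch, httpclient, htmlparser, xmltree]

proc fetchTables(url: string): Future[seq[XmlNode]] {.async.} =
  ## Fetch one page and return all <tbody> nodes found in it.
  let client = newAsyncHttpClient()
  try:
    let resp = await client.get(url)      # await the response header
    let content = await resp.body         # await the body string
    result = parseHtml(content).findAll("tbody")
  finally:
    client.close()

proc main() {.async.} =
  let url = "http://learn.shayhowe.com/html-css/organizing-data-with-tables/"
  let tbodies = await fetchTables(url)
  echo "tbody length is ", tbodies.len

waitFor main()
```

This avoids the `fres.read`/`waitFor` mix inside a callback, since `await` suspends the proc until each future completes.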