I want to scrape some HTML pages and am thinking of using AsyncHttpClient
for it. Below is what I wrote:
    import httpclient, streams, asyncdispatch, htmlparser, xmltree

    const totalPage = 1

    let theurl =
      "http://learn.shayhowe.com/html-css/organizing-data-with-tables/"

    var
      asynclient = newAsyncHttpClient()
      client = newHttpClient()
      pages = newSeq[Future[AsyncResponse]](totalPage)

    # synchronous get for comparison
    var
      contentHtml = client.get(theurl).body.parseHtml
      tbody = contentHtml.findAll("tbody")

    # To override ctrl+c for stopping the loop
    proc toquit() {.noconv.} =
      echo "tbody length is ", tbody.len
      echo "tbody is ", $tbody
      asynclient.close
      client.close
      quit QuitSuccess

    setControlCHook toquit

    # this part is for the asynchronous get
    for page in 1 .. totalPage:
      pages[page-1] = asynclient.get theurl
      pages[page-1].callback = proc(fres: Future[AsyncResponse]) {.thread.} =
        var
          asyncres = fres.read
          content = waitFor asyncres.body
          html = content.newStringStream.parseHtml # there's a warning that
                                                   # parseHtml is not GC-safe
          trbody = html.findAll("trbody") # same as above, to get the tbody tag
        echo "trbody is ", $trbody

    runForever()
What I got was:

    trbody is @[]         # this is the result from async
    # then I press ctrl+c to stop
    tbody length is 29    # this is the result from sync
The question is: why can't I get the html/xml tag after parsing it to an `XmlNode`,
while I can get the result from the synchronous get?
Did I do something wrong with the asynchronous call?
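For reference, this is roughly how I imagine a callback-free version would look, using an `{.async.}` proc with `await` instead of setting a callback on the future (just a sketch; the proc name `fetchTables` is made up, and I haven't confirmed it behaves any differently):

```nim
import std/[asyncdispatch, httpclient, htmlparser, xmltree]

proc fetchTables(url: string): Future[seq[XmlNode]] {.async.} =
  ## Fetch one page and return all <tbody> nodes found in it.
  let client = newAsyncHttpClient()
  try:
    let resp = await client.get(url)      # await the response header
    let content = await resp.body         # await the body string
    result = parseHtml(content).findAll("tbody")
  finally:
    client.close()

proc main() {.async.} =
  let url = "http://learn.shayhowe.com/html-css/organizing-data-with-tables/"
  let tbodies = await fetchTables(url)
  echo "tbody length is ", tbodies.len

waitFor main()
```

This avoids the `fres.read`/`waitFor` mix inside a callback, since `await` suspends the proc until each future completes.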