This may be a good opportunity to introduce our CSS selector library for
Julia, Cascadia.jl: https://github.com/Algocircle/Cascadia.jl

The code is based on the Cascadia Go library by Andy Balholm.

Cascadia.jl uses the Gumbo.jl HTML parser and lets you query the
resulting parse tree with CSS selectors. That makes it relatively easy to
extract data from HTML documents. There is an example in the package that
scrapes one page on StackOverflow.
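
As a rough illustration of the idea (a minimal sketch, not the example
shipped with the package), the snippet below parses a small inline HTML
string and collects the href of every link inside a div with class
"post". The HTML and the "post" class name are made up for the
illustration. It assumes a recent Cascadia.jl, where matching is done
with eachmatch; older releases used matchall instead.

```julia
using Gumbo, Cascadia

# Hypothetical sample document, invented for this illustration.
html = """
<html><body>
  <div class="post"><a href="https://julialang.org">Julia</a></div>
  <div class="post"><a href="/local">Local</a><a name="top">Anchor</a></div>
</body></html>
"""

# Gumbo builds the parse tree; doc.root is the <html> element.
doc = parsehtml(html)

# "a[href]" matches only anchors that carry an href attribute, so no
# try/catch is needed around getattr.
# Assumption: eachmatch is the matching function (matchall in older versions).
links = eachmatch(Selector("div.post a[href]"), doc.root)

hrefs = [getattr(a, "href") for a in links]
```

The selector does the filtering that would otherwise need a manual
tree walk and a type check on each node.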

Regards
-
Avik

On Monday, 1 August 2016 09:46:43 UTC+1, STAR0SS wrote:
>
> I used HTTPClient to get the page and Gumbo to parse it some time ago 
> (near v0.3)
>
> https://github.com/porterjamesj/Gumbo.jl
>
> I was doing things like that, it's probably not the most elegant
> way of doing it, but it was working fine:
>
> function get_hrefs(body::HTMLElement)
>     links = String[]
>     for elem in preorder(body)
>         if typeof(elem) == HTMLElement{:a}
>             try
>                 push!(links,getattr(elem, "href"))
>             catch
>             end
>         end
>     end
>     return links
> end
>
