This may be a good opportunity to introduce our CSS selector library for Julia, Cascadia.jl: https://github.com/Algocircle/Cascadia.jl
The code is based on the Cascadia Go library by Andy Balholm. Cascadia.jl uses the Gumbo.jl HTML parser and allows querying the resulting parse tree with CSS selectors. That allows you to extract data out of HTML documents with relative ease. There is an example in the package that scrapes one page on StackOverflow.

Regards
-
Avik

On Monday, 1 August 2016 09:46:43 UTC+1, STAR0SS wrote:
>
> I used HTTPClient to get the page and Gumbo to parse it some time ago
> (near v0.3)
>
> https://github.com/porterjamesj/Gumbo.jl
>
> I was doing things like that, it's probably not the most elegant
> way of doing it, but it was working fine:
>
> function get_hrefs(body::HTMLElement)
>     links = String[]
>     for elem in preorder(body)
>         if typeof(elem) == HTMLElement{:a}
>             try
>                 push!(links, getattr(elem, "href"))
>             catch
>             end
>         end
>     end
>     return links
> end
>
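
For comparison with the quoted get_hrefs function, here is a minimal sketch of the same link extraction done with a CSS selector. It assumes a recent Cascadia.jl, where matches are obtained with eachmatch (older releases, around the time of this thread, exposed matchall instead). The helper name get_hrefs_css and the inline HTML snippet are made up for illustration and are not part of either package.

```julia
using Gumbo      # HTML parser that produces the parse tree
using Cascadia   # CSS selectors over Gumbo's parse tree

# Hypothetical helper (not part of Cascadia.jl): collect the href of
# every <a> element under `root` that actually carries an href attribute.
function get_hrefs_css(root::HTMLElement)
    # The attribute selector "a[href]" skips anchors without an href,
    # so no try/catch is needed around getattr.
    matches = eachmatch(Selector("a[href]"), root)
    return String[getattr(a, "href") for a in matches]
end

# Illustrative document; in practice the string would come from an HTTP response.
html = """
<html><body>
  <a href="https://julialang.org">Julia</a>
  <a name="anchor-without-href">skipped</a>
  <p><a href="/docs">Docs</a></p>
</body></html>
"""

doc = parsehtml(html)
links = get_hrefs_css(doc.root)
println(links)
```

The selector does the filtering that the quoted version does by checking the element type and catching the missing-attribute error, so the traversal code reduces to a single comprehension.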
