Hi All,

A while back I was working on a web crawler and realized we didn't have a Julia HTML parser. I also wanted to learn how to wrap C libraries, so I started working on a wrapper around Google's gumbo <https://github.com/google/gumbo-parser> library for parsing HTML. The result, Gumbo.jl, can be found here <https://github.com/porterjamesj/Gumbo.jl>. It's by no means production ready, but I am reasonably happy with the API and would love for others to do some tire kicking and send feedback, bug reports, etc.
Major thanks to Tony Kelman for helping me whip the build script into shape on IRC last night. It *should* build correctly on a Unix system with autotools; please file a bug if the build doesn't work for you.

Some things that still need doing, if anyone wants to help:

- Support Windows. If someone wants to build and test binaries of the gumbo DLL, I'm happy to host them and add them to the build script.
- Support CDATA. I just haven't gotten around to it yet.
- Performance improvements. I am certainly being very wasteful with memory when translating gumbo's output to Julia types.

I would also love to get general code review from others who write these sorts of packages, as well as feedback on the API, so please try it out and let me know what you think!

Cheers,
James
